This is a walkthrough of the Data Science Pipeline, using Python as the language of choice. Before we begin, let's first talk about our dependencies:
This tutorial was created on a system running macOS; however, any operating system should work.
Depending on your operating system, you may already have a version of Python installed. For more information, visit https://www.python.org/downloads/operating-systems/
If you don't already know, this GitHub page is actually a Jupyter (IPython) notebook, which allows you to document both the input and output of code with boxes of text (known as "Markdown cells" - http://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html).
These cells support text styling via a lightweight, easy-to-use markup language (https://daringfireball.net/projects/markdown/syntax).
Although not technically required, an application that supports interactive literate programming makes the entire pipeline far easier to document, understand, and benefit from; using one is highly recommended, especially for beginners and newcomers.
You may download Jupyter through Anaconda: http://jupyter.readthedocs.io/en/latest/install.html
I am using Python version 3.6.2 and IPython version 6.1.0. You may verify your versions via
Help ---> About
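If you'd rather verify from code, a quick sketch like the following, run in a notebook cell, prints both versions:
import sys
import IPython

# Print the Python and IPython versions of the running kernel
print('Python:', sys.version)
print('IPython:', IPython.__version__)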
Once you have your environment set up, you can retrieve the data. For this walkthrough we will use:
IGN's game review data for the last 20 years (up to 2016), found here: https://www.kaggle.com/egrinstein/20-years-of-games/
Video games with sales greater than 100,000 (from vgchartz.com), found here: https://www.kaggle.com/gregorut/videogamesales/
We start by importing the necessary packages, reading the data in, and parsing it into a dataframe. Now is a good time to perform any other sanitization tasks, before we begin to form perceptions and make assumptions about the data set. For example, we will search for missing entries (usually recorded as NaN) and impute values where necessary.
import pandas as pd
import seaborn as sns
from datetime import datetime
import matplotlib.pyplot as plt
import statsmodels.formula.api as sm
ign = pd.read_csv('/home/jovyan/notebooks/Final Project/dkanney.github.io/data/ign.csv')
vgsales = pd.read_csv('/home/jovyan/notebooks/Final Project/dkanney.github.io/data/vgsales_ratings.csv')
# Restrict to games made from 2006-2016 inclusive
ign = ign[ign['release_year'] > 2005]
# Join the two dataframes on video game title/Name; the inner join drops any game that appears in only one dataset
raw_data = pd.merge(ign, vgsales, left_on='title', right_on='Name', how='inner')
# Combine the release year/month/day columns into a single datetime column
raw_data['release_date'] = pd.to_datetime(
    raw_data[['release_year', 'release_month', 'release_day']]
    .rename(columns={'release_year': 'year',
                     'release_month': 'month',
                     'release_day': 'day'}))
# Drop unused/duplicate columns
raw_data.drop(['Year_of_Release', 'release_month', 'release_year',
               'Critic_Count', 'Critic_Score', 'release_day',
               'Unnamed: 0', 'User_Count', 'User_Score',
               'Developer', 'genre', 'Name', 'url'],
              axis=1, inplace=True)
# Strip whitespace from every column that isn't numerical/quantitative/datetime
text_cols = set(raw_data.columns) - set(raw_data.describe().columns) - {'release_date'}
for col in text_cols:
    raw_data[col] = raw_data[col].str.strip()
display('raw_data: ', raw_data.sample(20), '# of rows: ', len(raw_data))
Let's create a helper function to find missing (NaN) values in our dataframe:
# Check for NaNs in the dataframe, printing each column's NaN count,
# then return the rows that contain at least one NaN value.
def get_nan(df):
    print(df.isnull().sum())
    return df[df.isnull().any(axis=1)]
Running the next cell returns the remaining records with NaN entries:
get_nan(raw_data)
print('\nUnique Ratings: ', raw_data['Rating'].unique())
We now see that many of these rows contain missing/'NaN' values under 'Rating'.
According to our function, our dataset contains more than 1400 missing game ratings. Obviously, we can't simply impute each one with 100% accuracy without examining each individual row.
Therefore, I decided to replace the missing game ratings with the ESRB's "Rating Pending" (RP) category, based on this rating guide from the ESRB: https://www.esrb.org/ratings/ratings_guide.aspx
raw_data['Rating'].fillna('RP', inplace=True)
# Record the index labels of the rows that still contain NaNs
missing_set = list(get_nan(raw_data).index)
missing_set
Let's start tidying the data by looking at the missing Publishers and filling in the missing data points (see specific entries above). Since there doesn't seem to be a clear pattern or reason for why this data is missing, we will assume for now that it is Missing Completely at Random (MCAR). Read more about the three types of missing data at
https://en.wikipedia.org/wiki/Missing_data#Techniques_of_dealing_with_missing_data
Because only a small number of Publisher names are missing, we can use this moment to learn a bit more about our data set while imputing the missing data.
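Before filling anything in, we can peek at exactly which games are affected. A small illustrative check using the get_nan helper defined earlier (the 'title' and 'Publisher' columns both survive our earlier column pruning):
# Inspect the games whose Publisher is still missing before imputing
get_nan(raw_data)[['title', 'Publisher']]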
# Publisher --> 'Sega'
# (SOURCE: https://en.wikipedia.org/wiki/Sonic_the_Hedgehog_(2006_video_game))
raw_data.loc[missing_set[0], 'Publisher'] = 'Sega'
raw_data.loc[missing_set[1], 'Publisher'] = 'Sega'
# Publisher --> 'Nintendo'
# (SOURCE: https://en.wikipedia.org/wiki/Mario_Tennis)
raw_data.loc[missing_set[2], 'Publisher'] = 'Nintendo'
# Publisher --> 'Wargaming'
# (SOURCE: https://en.wikipedia.org/wiki/World_of_Tanks)
raw_data.loc[missing_set[3], 'Publisher'] = 'Wargaming' # Initial release
raw_data.loc[missing_set[4], 'Publisher'] = 'Wargaming' # Console edition
# Publisher --> '7Sixty'
# (SOURCE: https://en.wikipedia.org/wiki/Stronghold_3)
raw_data.loc[missing_set[5], 'Publisher'] = '7Sixty'
# Publisher --> 'Gearbox'
# (SOURCE: http://store.steampowered.com/app/244160/Homeworld_Remastered_Collection/)
raw_data.loc[missing_set[6], 'Publisher'] = 'Gearbox'
# Some publishers appear under similar names (e.g. 'Valve' and 'Valve Software').
# We can group these together to simplify analysis.
raw_data['Publisher'].replace('Valve Software', 'Valve', inplace=True)
Running the next cell verifies that there are no more remaining records with 'NaN' entries.
get_nan(raw_data);
Let's look at some visualizations of our data set. We can try several different plot types to gain a better understanding of the data.
display(ign['genre'].value_counts()[:15])
ign['genre'].value_counts()[:15].plot(kind='pie',autopct='%1.1f%%',shadow=True,explode=[0.15,0.12,0.15,0.18,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.18,0.15])
plt.title('Genres IGN Reviewed the Most')
fig = plt.gcf().set_size_inches(9,9)
plt.show()
Although this pie chart is helpful, it's certainly not the only way we could visualize this data. Let's try a horizontal bar plot.
ign_rev = ign['genre'].value_counts()[:15].to_frame()
ax = sns.barplot(x='genre',y=ign_rev.index,data=ign_rev)
ax.set(xlabel='Total Reviews on IGN', ylabel='Genre')
plt.show()
This bar plot shows each genre's individual ranking much more clearly.
Based on these plots, the three most-reviewed genres (by IGN) appear to be Action, Sports, and Shooter.
Let's continue exploring this data by plotting other potentially important features, such as sales.
NA_Sales = raw_data.groupby('Genre').sum()['NA_Sales'].copy().sort_values()[::-1]
display(NA_Sales)
NA_Sales.plot(kind='pie',shadow=True,autopct='%1.1f%%',explode=[0.15,0.12,0.15,0.18,0.2,0.2,0.2,0.2,0.2,0.2,0.18,0.15])
plt.title('Distribution of Sales in North America')
fig = plt.gcf().set_size_inches(9,9)
plt.show()
NA_Sales = NA_Sales.to_frame()
ax = sns.stripplot(x=NA_Sales.index,y='NA_Sales',data=NA_Sales,size=15)
ax.set(xlabel='Genre', ylabel='North American Sales (million)')
plt.show()
ax = sns.barplot(x='NA_Sales',y=NA_Sales.index,data=NA_Sales)
ax.set(xlabel='North American Sales (million)', ylabel='Genre')
plt.show()
Let's take a look at how these genres performed globally.
world_sales = vgsales.groupby('Genre').sum()['Global_Sales'].copy().sort_values()[::-1]
display(world_sales)
world_sales.plot(kind='pie',autopct='%1.1f%%',shadow=True,explode=[0.15,0.12,0.15,0.18,0.2,0.2,0.2,0.2,0.2,0.2,0.25,0.3])
plt.title('Distribution of Global Video Game Sales')
fig = plt.gcf().set_size_inches(9,9)
plt.show()
world_sales = world_sales.to_frame().reset_index()
ax = sns.barplot(x='Global_Sales',y='Genre',data=world_sales)
ax.set(xlabel='Worldwide Sales (million)', ylabel='Genre')
plt.show()
In addition to being the three most-reviewed genres, Action, Sports, and Shooter games also appear to outsell the other genres worldwide.
Is it possible that sales and scores are related? Or are they independent of each other?
We can create a visual guide to this question by plotting the relationship between global video game sales and review scores.
We can model this relationship with a line of best fit, obtained via linear regression (http://www.statsmodels.org/stable/regression.html).
s = raw_data.sample(6000)
ax = sns.stripplot(x='score',y='Global_Sales',hue='score_phrase',data=s, alpha=.5, size=5.5)
sns.regplot(x='score',y='Global_Sales',data=s,scatter=False)
ax.set(xlabel='IGN Review Score (0 - 10)', ylabel='Global Sales (Millions)')
plt.show()
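Note that regplot fits this trend line internally; if you want its intercept and slope explicitly, a minimal sketch using the same statsmodels formula API (fit on the sample s drawn above) could look like this:
# Fit the plotted trend line explicitly to inspect its coefficients
line = sm.ols(formula='Global_Sales ~ score', data=s).fit()
print(line.params)  # intercept and slope of the line of best fit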
Interesting! The score distribution is skewed left, with much of the data residing toward the right-hand (higher) end of the scale. The fitted trend line has a positive slope, implying a positive linear relationship between review scores and global sales.
Our earlier observations suggested that a genre's sales tend to do better when that genre receives more reviews. However, each of these games carries a rating from the ESRB that effectively sets a lower bound on the age of the buyer/consumer. This rating may well affect sales (and possibly reviews), since retailers generally won't sell Rated-M games to children and teens below the age threshold.
How well do these genres sell when instead considering ESRB ratings?
Let's take a look at how genres performed in each ESRB rating category.
# Games grouped by their rating (E, E10+, T, M, AO, RP)
ratings = list(raw_data.Rating.unique())
world_sales = raw_data[raw_data['Rating'] == ratings[0]].groupby('Genre').sum()['Global_Sales'].copy().sort_values()[::-1]
world_sales.plot(kind='pie',autopct='%1.1f%%',shadow=True,explode=[0.15,0.12,0.15,0.18,0.2,0.2,0.2,0.2,0.2,0.25,0.3])
plt.title('Distribution of Global Video Game Sales (Rated '+str(ratings[0])+')')
fig = plt.gcf().set_size_inches(9,9)
plt.show()
world_sales = raw_data[raw_data['Rating'] == ratings[1]].groupby('Genre').sum()['Global_Sales'].copy().sort_values()[::-1]
world_sales.plot(kind='pie',autopct='%1.1f%%',shadow=True,explode=[0.1,0.1,0.12,0.15,0.18,0.18,0.18,0.18,0.15,0.15,0.12,0.3])
plt.title('Distribution of Global Video Game Sales (Rated '+ratings[1]+')')
fig = plt.gcf().set_size_inches(9,9)
plt.show()
world_sales = raw_data[raw_data['Rating'] == ratings[2]].groupby('Genre').sum()['Global_Sales'].copy().sort_values()[::-1]
world_sales.plot(kind='pie',autopct='%1.1f%%',shadow=True,explode=[0.15,0.12,0.15,0.18,0.2,0.2,0.2,0.2,0.2,0.2,0.25,0.3])
plt.title('Distribution of Global Video Game Sales (Rated '+ratings[2]+')')
fig = plt.gcf().set_size_inches(9,9)
plt.show()
world_sales = raw_data[raw_data['Rating'] == ratings[3]].groupby('Genre').sum()['Global_Sales'].copy().sort_values()[::-1]
world_sales.plot(kind='pie',autopct='%1.1f%%',shadow=True,explode=[0.1,0.12,0.15,0.18,0.15,0.15,0.15,0.15,0.15,0.15,0.15,0.15])
plt.title('Distribution of Global Video Game Sales (Rated '+ratings[3]+')')
fig = plt.gcf().set_size_inches(9,9)
plt.show()
world_sales = raw_data[raw_data['Rating'] == ratings[4]].groupby('Genre').sum()['Global_Sales'].copy().sort_values()[::-1]
world_sales.plot(kind='pie',autopct='%1.1f%%',shadow=True,explode=[0.1,0.12,0.15,0.18,0.15,0.15,0.15,0.15,0.15,0.15,0.15])
plt.title('Distribution of Global Video Game Sales (Rated '+ratings[4]+')')
fig = plt.gcf().set_size_inches(9,9)
plt.show()
world_sales = raw_data[raw_data['Rating'] == ratings[5]].groupby('Genre').sum()['Global_Sales'].copy().sort_values()[::-1]
world_sales.plot(kind='pie',autopct='%1.1f%%',explode=[0])
plt.title('Distribution of Global Video Game Sales (Rated '+ratings[5]+')')
fig = plt.gcf().set_size_inches(9,9)
plt.show()
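As an aside: the six cells above differ only in the rating they select, so a loop-based sketch like the one below (using a uniform explode offset rather than the hand-tuned values above) could generate the same set of charts more compactly:
# Illustrative alternative: one pie chart per ESRB rating in a loop
for rating in ratings:
    sales = (raw_data[raw_data['Rating'] == rating]
             .groupby('Genre')['Global_Sales'].sum()
             .sort_values(ascending=False))
    if sales.empty:
        continue  # skip ratings with no matching games
    sales.plot(kind='pie', autopct='%1.1f%%', shadow=True,
               explode=[0.1] * len(sales))
    plt.title('Distribution of Global Video Game Sales (Rated ' + rating + ')')
    plt.gcf().set_size_inches(9, 9)
    plt.show()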
These charts show us that, unlike the consistent positive trend between sales and review scores, the mix of genres that sells well differs noticeably from one ESRB rating category to another.
ax = sns.swarmplot(x='Rating',y='Global_Sales',hue='Genre',data=raw_data.sample(1000),size=6, order=['E','E10+','T','M','RP'])
ax.set(xlabel='ESRB Rating', ylabel='Global Sales (Millions)')
plt.show()
A swarmplot was used here because we have a categorical x-axis and many y-axis entries; this kind of plot keeps points from overlapping too much (as they naturally would in an ordinary scatter plot). Since the individual data points remain visible, we color them by genre to assist our analysis.
This plot shows the rough distribution of game genres across their respective ESRB ratings, along with each game's global sales. We can clearly see the Sports/Racing/Platform games that are rated E. As the ESRB rating (and with it the minimum buyer age) increases, we can see a significant shift in the genres players prefer to buy.
The next factorplot corroborates the swarmplot above. Each ESRB rating gets its own panel, making it much easier to analyze each individual rating's genres.
sns.factorplot(x="Global_Sales", y="Genre",col="Rating",data=raw_data[raw_data['Rating'].isin(['E','E10+','T','M'])], kind="bar",size=10);
plt.show()
After analyzing a dataset, a natural next step is to test whether our findings generalize, using them to predict outcomes for future data that fits the same model. To accomplish this, we can create a linear regression model using the OLS function in statsmodels.api (http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html)
After creating this model, we'll be able to predict (with a reasonable amount of error) how video game sales change worldwide based solely on sales from North America (NA_Sales), Europe (EU_Sales), and Japan (JP_Sales). Furthermore, we'd like to determine whether IGN's scores are statistically significant in the context of our model. The steps to build a simple linear regression model for Mature (Rated M) Action games are shown below:
data = raw_data[raw_data['Rating'] == 'M']
data = data[data['Genre'] == 'Action']
data.head()
result = sm.ols(formula="Global_Sales ~ score + NA_Sales + EU_Sales + JP_Sales", data=data).fit()
result.summary()
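With the model fitted, predictions come straight from the results object. As a quick illustrative check, we can compare the model's fitted values to the observed Global_Sales on the same data used to fit it:
# Compare the model's fitted predictions to the observed values
comparison = pd.DataFrame({'observed': data['Global_Sales'],
                           'predicted': result.predict(data)})
comparison.head()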
Based on the OLS model we created above, you can see the relationship between score, NA_Sales, EU_Sales, JP_Sales, and Global_Sales. Determining whether the model fits our data well starts with the "R-squared" value being somewhat close to 1 (this is not the only way to assess a model's fit, so a low R-squared does not necessarily mean something is wrong).
In our case, the R-squared value is high, implying that our model explains most of the variance in Global_Sales. This means we can reject the null hypothesis of no correlation; there is clearly a correlation between score, NA_Sales, EU_Sales, JP_Sales, and Global_Sales.
As we examine the rest of the OLS Regression Results, we'll look at the coef and P>|t|, also known as the "coefficient" and the "p-value", respectively.
The sign (+/-) of each coefficient tells you whether the predictor's correlation with global sales is positive or negative.
To see whether the overall result is statistically significant, we must also examine the "F-statistic".
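These quantities can also be read directly off the fitted results object; for example:
# Extract coefficients, p-values, and the overall F-test p-value
print(result.params)     # coefficient (coef) for each predictor
print(result.pvalues)    # p-value (P>|t|) for each predictor
print(result.f_pvalue)   # p-value of the model's F-statistic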
In general, increased Action video game sales in the individual regions (North America, Europe, Japan) boost overall worldwide sales for the Action genre.
Game sales for teens and younger (ESRB ratings E through T) are influenced more significantly by other genres; sales of ESRB-rated M games rely much more heavily on Action titles.
The score feature appears to have only a small effect on global sales. Since its p-value is greater than 0.05, the added score feature was not statistically significant in our model. Thus, you may conclude that score has minimal or no significant effect on global sales.
In conclusion, we can assume that for ESRB-rated M Action games, most global sales can be attributed to just three regions: North America, Europe, and Japan. We can also conclude that, although incorporating score into our model increased its R-squared value, its p-value shows it carries much less statistical significance in our model than our three main predictors.
By calculating these statistics for the regression model, we get a better understanding of the quality of the model's fit. Previously, our plotted regression model was just that: the line of best fit. However, much more happens "behind the scenes" that makes analysis and prediction far more nuanced than simply plugging in data points and getting a perfect fit every time. Thus, an important part of the Data Science pipeline involves finding out which techniques work best with your dataset.
Thanks for reading!