FIFA 17 ANALYSIS
WITH MICROSOFT
EXCEL
- HANSEL NICHOLAS D’SOUZA
INDEX
1. INTRODUCTION TO THE DATA
2. SORTING PLAYERS BY COUNTRY & CLUB
3. ANALYSIS BY PLAYER AGE
4. ANALYSIS BY PLAYER RATINGS
5. REFERENCES
Section 1 : Introduction to the Data
The data set used for this analysis is based on the popular video game FIFA. The
following extract describes the product:
“FIFA, also known as FIFA Football or FIFA Soccer, is a series of association
football video games or football simulator, released annually by Electronic Arts under the EA
Sports label. Football video games such as Sensible Soccer, Kick Off and Match Day had been
developed since the late 1980s and already competitive in the games market when EA Sports
announced a football game as the next addition to their EA Sports label.” – Wikipedia
FIFA 17 is the 2017 edition of the global FIFA franchise, which simulates the
experience of playing for and managing a professional soccer team. The game uses the
attributes, likenesses and statistics of real soccer players.
Each column in the table represents an individual attribute of the players, and each
record represents one of the 17,588 stock soccer players in the game. Each player has a
preferred position, which can be classified into one of four primary positions:
Forward, Midfielder, Defender or Goalkeeper.
Below are each of the columns of the data set and their type of measurement scale:
Name (Categorical)
Nationality (Categorical)
National_Position (Categorical)
National_Kit (Interval)
Club (Categorical)
Club_Position (Categorical)
Club_Kit (Interval)
Club_Joining (Interval)
Contract_Expiry (Interval)
Rating (Ratio)
Height (Ratio)
Weight (Ratio)
Prefered_Foot (Categorical)
Birth_Date (Interval)
Age (Ratio)
Prefered_Position (Categorical)
Work_Rate (Ordinal)
Weak_foot (Ratio)
Skill_Moves (Ratio)
Ball_Control (Ratio)
Dribbling (Ratio)
Marking (Ratio)
Sliding_Tackle (Ratio)
Standing_Tackle (Ratio)
Aggression (Ratio)
Reactions (Ratio)
Attacking_Position (Ratio)
Interceptions (Ratio)
Vision (Ratio)
Composure (Ratio)
Crossing (Ratio)
Short_Pass (Ratio)
Long_Pass (Ratio)
Acceleration (Ratio)
Speed (Ratio)
Stamina (Ratio)
Strength (Ratio)
Balance (Ratio)
Agility (Ratio)
Jumping (Ratio)
Heading (Ratio)
Shot_Power (Ratio)
Finishing (Ratio)
Long_Shot (Ratio)
Curve (Ratio)
Freekick_Accuracy (Ratio)
Penalties (Ratio)
Volleys (Ratio)
GK_Position (Ratio)
GK_Diving (Ratio)
GK_Kicking (Ratio)
GK_Handling (Ratio)
GK_Reflexes (Ratio)
Section 2: Sorting Players by Club & Country
The first, and perhaps most obvious, way to group the players is by the club that they
play for. To obtain a general “ranking” of the clubs, I counted the number of players at
each club, summed their player ratings, and then calculated the average player rating
per club. The results are as below:
As observed above, Juventus lie in first place, closely followed by FC Bayern in second
place, and then the other eight clubs. Below is a column chart of the above data.
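For readers who want to reproduce the club ranking outside Excel, the three steps described above (count, cumulative rating, average rating per club) can be sketched in Python. This is only an illustration: the sample rows, club names and the function name rank_clubs are hypothetical, and the report does not state which Excel feature was actually used (a PivotTable or COUNTIF/SUMIF/AVERAGEIF would give equivalent results).

```python
from collections import defaultdict

# Hypothetical sample rows: (player name, club, overall rating).
players = [
    ("Player A", "Club X", 90),
    ("Player B", "Club X", 84),
    ("Player C", "Club Y", 88),
    ("Player D", "Club Y", 80),
    ("Player E", "Club Z", 79),
]

def rank_clubs(rows):
    """Return (club, player_count, total_rating, average_rating), best average first."""
    totals = defaultdict(lambda: [0, 0])  # club -> [player count, rating sum]
    for _name, club, rating in rows:
        totals[club][0] += 1
        totals[club][1] += rating
    table = [(club, n, total, total / n) for club, (n, total) in totals.items()]
    return sorted(table, key=lambda row: row[3], reverse=True)

club_ranking = rank_clubs(players)
for club, count, total, average in club_ranking:
    print(f"{club}: {count} players, total {total}, average {average:.1f}")
```

Sorting on the average rather than the total keeps clubs with large squads from dominating the ranking.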
Next, I ranked the top 100 players in the game by overall player rating and grouped
them by country. To view which countries have the most players in the top 100, I
created a stacked column chart, shown below, of the top 10 countries. Spain leads the
pack with 16 players, or 16% of the top 100.
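The "top N by rating, then count by country" step can be sketched as follows. The rows and the function name top_n_by_country are hypothetical and use a top 4 instead of a top 100 so the example stays small; ties at the cut-off are broken by input order here, which is an assumption rather than something the report specifies.

```python
from collections import Counter

# Hypothetical sample rows: (player name, nationality, overall rating).
players = [
    ("Player A", "Spain", 92),
    ("Player B", "Germany", 90),
    ("Player C", "Spain", 89),
    ("Player D", "Brazil", 88),
    ("Player E", "Spain", 85),
    ("Player F", "Germany", 70),
]

def top_n_by_country(rows, n):
    """Count nationalities among the n highest-rated players."""
    top = sorted(rows, key=lambda row: row[2], reverse=True)[:n]
    return Counter(nationality for _name, nationality, _rating in top)

counts = top_n_by_country(players, n=4)
for country, count in counts.most_common():
    print(f"{country}: {count} ({count / 4:.0%} of the top 4)")
```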
Another way to segment the data is to represent the total number of players per
country on a world map, as below:
This map, an Excel heat map, shows the distribution of players around the world. The
regions in green, such as Brazil, Argentina and Spain, have the most players, and the
regions in red have the fewest.
Section 3: Analysis by Age
The next step in my analysis involves sorting and analyzing the player data by age.
The graph below shows the distribution of player ages against the share of players at
each age. As the scatter chart shows, the most common age is 25, which accounts for
approximately 8% of the players.
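The share of players at each age, and the most common age, can be computed along these lines. The ages below are hypothetical and far fewer than the real data set; the point of the sketch is that the modal age is the one with the largest share, which need not be anywhere near a majority.

```python
from collections import Counter

# Hypothetical ages for a handful of players.
ages = [19, 22, 25, 25, 25, 27, 27, 31, 33, 25]

def age_shares(values):
    """Map each age to the fraction of players who have it."""
    counts = Counter(values)
    return {age: count / len(values) for age, count in sorted(counts.items())}

shares = age_shares(ages)
modal_age = max(shares, key=shares.get)  # age with the largest share
print(modal_age, f"{shares[modal_age]:.0%}")
```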
Another question to consider is the age at which players ‘peak’ or, in FIFA terms, the
age at which players tend to have the highest overall rating.
The data in the clustered chart and the line chart above show a higher concentration of
highly rated players between the ages of 24 and 32, with a noticeable drop in overall
player rating below and above that range. This implies that most of the higher-rated
players are between 24 and 32 years old.
Section 4: Analysis by Ratings
Let's begin the analysis in this section by using the heights of players as a parameter.
Taking all the heights and using a box and whisker plot, we get the following chart:
The very first thing we notice is the outliers. The box plot has Q1, Q2 and Q3 values of
176, 181 and 186 cm respectively, giving an interquartile range (IQR) of 10 cm. Values
lying more than 1.5 × IQR below Q1 or above Q3, i.e. below 161 cm or above 201 cm, such
as 155 cm and 207 cm, constitute extreme values or outliers.
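As a worked check on the outliers, the sketch below applies the conventional 1.5 × IQR rule to the quartiles reported above. The multiplier of 1.5 and the function names are assumptions for illustration; the report does not state which rule the chart used to mark outliers.

```python
# Quartiles of player height in cm, as reported in the text.
q1, q3 = 176, 186

def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits beyond which a value is flagged as an outlier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = tukey_fences(q1, q3)

def is_outlier(height_cm):
    """True if the height falls outside the fences computed above."""
    return height_cm < lower or height_cm > upper

for height in (155, 181, 207):
    print(height, is_outlier(height))
```

Under this rule a height is an outlier only if it falls outside the fences, so most values below Q1 or above Q3 are still ordinary observations.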
Another useful way of viewing the general statistics of the player data is to run
descriptive statistics on the overall rating of the players, as shown to the left.
We see that the mean of the 17,588 player ratings (which sum to 1,163,731) is
approximately 66, and so is the median. The most frequent rating is 67, and the ratings
run from 45 to 94, a range of 49. The data has slightly negative skewness, which means
it tails off a little to the left while remaining fairly symmetrical. The excess
kurtosis of -0.02 is close to zero, which signifies tails only marginally lighter than
those of a normal distribution (very slightly platykurtic).
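For reference, the skewness and kurtosis figures can be reproduced with the sample formulas documented for Excel's SKEW and KURT functions. The sketch assumes the descriptive statistics output uses those same formulas; the ratings list is a hypothetical sample, not the real data.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Adjusted sample skewness, the formula documented for Excel's SKEW."""
    n, m, s = len(values), mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def sample_excess_kurtosis(values):
    """Sample excess kurtosis, the formula documented for Excel's KURT."""
    n, m, s = len(values), mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    return scale * fourth - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

ratings = [45, 58, 62, 66, 66, 67, 67, 70, 74, 81]  # hypothetical sample
print(sample_skewness(ratings), sample_excess_kurtosis(ratings))
```

A value near zero from either function indicates a shape close to the normal distribution on that measure.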
To verify the empirical rule, we first calculate the mean ± 3 S.D., which gives 87.4 and
44.92 respectively. Filtering the data to this range yields 17,555 of the 17,588
records, or 99.81% of the data. This is consistent with the empirical rule, which states
that approximately 99.7% of normally distributed data lies within three standard
deviations of the mean.
The coefficient of variation (CV) is calculated to be 0.107.
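The figures in this check can be cross-verified with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the mean and standard deviation are recovered from the ±3 S.D. bounds rather than from the raw data, so they are accurate only to the rounding of those bounds.

```python
# Figures reported in the text.
upper, lower = 87.4, 44.92      # mean + 3 SD and mean - 3 SD
within, total = 17_555, 17_588  # records inside the band, all records

mean_rating = (upper + lower) / 2  # midpoint of the band
std_dev = (upper - lower) / 6      # the band is 6 SD wide
coverage = within / total          # share of records inside the band
cv = std_dev / mean_rating         # coefficient of variation

print(f"mean={mean_rating:.2f} sd={std_dev:.2f} coverage={coverage:.2%} cv={cv:.3f}")
```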
Next up, I wanted to get a general view of the key stats of the top 10 players in the game.
This could be achieved easily by viewing a radar chart as shown below.
The chart above gives a multivariate view of each player's key stats on a scale of
1-100. It is evident from the data that goalkeepers such as Neuer, Courtois and De Gea
have a much higher goalkeeper reflexes stat and consequently lower numbers in the other
departments, while speedsters such as Bale and Neymar excel in the Dribbling and Speed
stats. There are no defenders in the top 10, hence the lower Standing Tackle numbers.
Another useful way of viewing data is with color scales, which shade the cells based on
their numerical value using a color palette. As shown above, I have used a
red-yellow-green palette to visually represent how high a certain stat of a player is,
with red showing considerably low numbers and green showing higher stats. Players show
higher numbers in the stats that their playing position generally requires them to
specialize in. For example, defender Jerome Boateng displays a lot of green shading in
defensive skills such as Sliding Tackle and Standing Tackle, and the same applies to
other players in their respective positions.
Speaking of position, another important way of grouping players is by the primary
position they play in, i.e. Goalkeeper, Defense, Midfield or Forward. Since each player
has a specialized position such as CAM (Central Attacking Midfield), LW (Left Wing) or
RB (Right Back), a lookup table had to be created to map each specialized position to
one of the four primary positions.
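A lookup table of this kind might look like the sketch below, which plays the role a VLOOKUP against a two-column table would play in Excel. The exact mapping used in the report is not given, so every entry here is an assumption (wingers, for instance, could equally be grouped with midfielders), as is the handling of multi-position values such as 'CAM/LW'.

```python
# Assumed mapping from specialized position codes to the four primary groups.
PRIMARY_POSITION = {
    "GK": "Goalkeeper",
    "CB": "Defender", "LB": "Defender", "RB": "Defender",
    "LWB": "Defender", "RWB": "Defender",
    "CDM": "Midfielder", "CM": "Midfielder", "CAM": "Midfielder",
    "LM": "Midfielder", "RM": "Midfielder",
    "LW": "Forward", "RW": "Forward", "CF": "Forward", "ST": "Forward",
}

def primary_position(preferred):
    """Classify a player by the first code in a value such as 'CAM/LW'."""
    first_code = preferred.split("/")[0].strip().upper()
    return PRIMARY_POSITION.get(first_code, "Unknown")

for code in ("CAM", "LW/ST", "RB", "GK"):
    print(code, "->", primary_position(code))
```

Returning "Unknown" for unmapped codes makes gaps in the lookup table visible instead of silently dropping players.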
The pie chart above shows the overwhelming number of defenders present in the game, who
account for 58% of the records. As expected, goalkeepers make up only 6% of the data, as
every team has an average of 2-3 goalkeepers in a squad of 32.
Another factor that soccer players care about, and one that provides another interesting
dimension of insight into the data set, is the jersey number. Most players seek
prestigious jersey numbers, and the area chart below shows the overall distribution of
jersey numbers.
As observed above, the graph peaks at certain numbers, such as the ‘1’, ‘7’ and ‘8’ jerseys,
which suggests that these jersey numbers are worn more often than the others.
Finally, we can take a look at the correlation between individual attributes of the
players. A table of these correlations is displayed below:
As seen, there are some rather unsurprising findings here. For example, we see negative
correlations between Age and both Speed and Acceleration, showing that older players
tend to have lower overall pace. Also, Strength is negatively correlated with Speed,
showing that the stronger players are generally slower.
The stronger positive correlations are between Ball Control and Dribbling, showing that
players with higher Ball Control tend to have higher Dribbling stats, and between Short
Pass and Long Pass, showing that players who excel at the Long Pass tend to be good at
the Short Pass as well.
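Each entry in a correlation table like this is a Pearson correlation coefficient, which is what Excel's CORREL function computes. A minimal sketch of the calculation follows; the attribute values are hypothetical and chosen only to show one negative and one positive relationship.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical values for five players.
age = [19, 23, 27, 31, 35]
speed = [88, 84, 80, 73, 65]
ball_control = [60, 70, 75, 82, 90]
dribbling = [58, 72, 74, 85, 88]

print(f"Age vs Speed: {pearson(age, speed):.2f}")
print(f"Ball Control vs Dribbling: {pearson(ball_control, dribbling):.2f}")
```

A coefficient describes how two attributes move together across players; it does not by itself show that changing one attribute causes the other to change.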
Section 5: References
Source of Data : kaggle.com
URL of Dataset : https://www.kaggle.com/artimous/complete-fifa-2017-player-dataset-global