A Graph Theoretic Analysis of the
National Basketball Association
By
Cherlinca Boyd
McNair Research Program
Summer 2012
Table of Contents
I. Abstract
II. Introduction
III. Literature Review
IV. Methodology
A. Limitations
B. Methods Used
V. Results
VI. Conclusion
VII. References
I. Abstract
The web search engine Google is a relatively new tool for extracting information
from the web. Google returns results for keyword searches and must rank the pages it
displays in order of relevance. In this research, I used the same page-ranking algorithm
that Google uses, PageRank, to analyze the results of the 2011-2012 National Basketball
Association (NBA) season. I then seeded the teams by these rankings into NBA playoff trees
to determine whether the PageRanks of the teams could predict success in the playoffs,
and I compared these projections with the actual playoff results. As hoped, the PageRanks
closely followed the actual playoff results and playoff seeding, with just a couple of
discrepancies.
II. Introduction
The world is increasingly interconnected in many different ways. In his influential book Six
Degrees, Duncan J. Watts writes, “in order for this connected age to be understood, we must first
understand how to describe it scientifically; that is, we need a science of networks” (14). A
network in its most basic form models the relationship or non-relationship between a set of
objects. The objects are called vertices in the network; a relationship is indicated by an edge
between the two vertices. In Figure 1, I give a directed network. Not only are there edges between
vertices such as 18 and 20, but each edge also has a direction, here from vertex 18 to vertex 20.
This graph was obtained by asking the McNair Scholars to list five acquaintances among the other
Scholars. Although this network seems very random, I will show that such networks contain a
wealth of information. Indeed, this is why companies seek to mine Facebook data and consider it
such a valuable source of information.
Figure 1
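A directed network like the one in Figure 1 is commonly stored as an adjacency matrix: a 1 in row i, column j means there is an edge from vertex i to vertex j. The Python sketch below builds such a matrix from a short, made-up edge list. It is not the McNair survey data, and the helper name `adjacency_matrix` is hypothetical.

```python
def adjacency_matrix(n, edges):
    """Return an n-by-n 0/1 matrix with matrix[i][j] = 1 for each directed edge i -> j.

    Vertices are numbered 0..n-1.
    """
    matrix = [[0] * n for _ in range(n)]
    for source, target in edges:
        matrix[source][target] = 1
    return matrix


# Hypothetical survey answers: (person, acquaintance that person named).
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
A = adjacency_matrix(4, edges)
for row in A:
    print(row)
```

Because the edges are directed, the matrix does not have to be symmetric. Here vertex 0 names vertex 1, but vertex 1 does not name vertex 0.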
Although this network is quite small, imagine a network whose vertices are the more than 311
million people of the United States, in which two vertices are joined by an edge precisely when
the two people have shaken hands. According to the theory of six degrees of separation, almost
all people are separated from the President of the United States by a chain of at most six
handshakes. It is this and other phenomena that social network scientists such as Duncan Watts
explore. Social psychologists, computer scientists, economists, mathematicians, biologists, and
even teachers study these networks. Although a network on 311 million vertices seems large,
imagine the problem facing Google, which must rank the relative importance of over 7.67 billion
web pages. It is natural to model the structure of the World Wide Web as a network with a vertex
for each web page, in which two web pages are joined by a directed edge precisely when one page
links to the other. Google uses the PageRank algorithm to accomplish this task.
In this research, I apply the Google PageRank method to rank the relative strength of teams
in the National Basketball Association (NBA) based on their regular-season performances. This
ranking is then compared with the actual results of the teams in the playoffs to measure its
accuracy. Once the analysis of the data sets is complete, I assess how effectively PageRank ranks
objects other than web pages. Professional sports, namely basketball, merely provide a useful data
set for testing the effectiveness of the PageRank algorithm.
III. Literature Review
This research project is concentrated in the area of graph theory, the study of network
structure (Easley and Kleinberg 8). In Networks, Crowds, and Markets: Reasoning about a Highly
Connected World, Easley and Kleinberg write, “The social scientist John Barnes once described
graph theory as a terminological jungle, in which any newcomer may plant a tree” (25). Because
graph theory is such a “terminological jungle,” resources such as Introduction to Graph Theory,
Fourth Edition, by Dr. Robin J. Wilson of Gresham College, UK, are very useful, especially to
readers who are not particularly familiar with the subject. In fact, Dr. Wilson says he wrote the
book as an “introductory text suitable for both mathematicians taking courses in Graph Theory and
also for non-specialists wishing to learn the subject as quickly as possible” (vii). The book is
carefully structured, down to its examples and exercises, which cover a wide range of difficulty.
Reading texts in this format makes the terms I use throughout this research easy to understand.
Like Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Dr. Wilson’s book
devotes an entire section to defining key terms, and both works phrase each definition as simply
as possible. So when reading the word degree, for example, it is immediately clear that the degree
of a vertex is the number of edges incident to it. Once the basic terminology is understood, more
advanced reading material can be tackled.
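In a directed network such as Figure 1, the degree of a vertex splits into an out-degree, which counts the edges leaving the vertex, and an in-degree, which counts the edges arriving at it. With an adjacency matrix, these are the row sums and column sums. This minimal Python sketch uses a small hypothetical matrix, and the function names are illustrative.

```python
def out_degrees(matrix):
    """Out-degree of each vertex: the number of edges leaving it (row sums)."""
    return [sum(row) for row in matrix]


def in_degrees(matrix):
    """In-degree of each vertex: the number of edges arriving at it (column sums)."""
    return [sum(column) for column in zip(*matrix)]


# A small hypothetical directed network on four vertices (row i -> column j).
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
print(out_degrees(A))  # [2, 1, 1, 1]
print(in_degrees(A))   # [1, 1, 3, 0]
```

When applied to the game-win matrix in Figure 3, the same row and column sums give each team's wins and losses.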
Another concept in graph theory is the theory of six degrees of separation. This subject is so
extensively studied that several authors have written books solely about it. One such book is Six
Degrees: The Science of a Connected Age by Duncan J. Watts. Simply put, this theory proposes that
any two people are separated, on average, by a chain of about six acquaintances. Although the
theory is explicitly about people, it is also conjectured to hold in social networks of dolphins,
in networks of political blogs, and in networks of actors who appear together in plays and movies.
All of these networks exhibit the small-world phenomenon, even though some of them are quite
large. This phenomenon suggests that interrelated objects, such as web pages, may be ranked in
relation to each other even in the largest networks by methods such as PageRank.
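The "degrees of separation" between two people is the length of a shortest chain of acquaintances connecting them, and breadth-first search computes exactly that. The Python sketch below is minimal and runs on a tiny hypothetical acquaintance network. It averages the separation over all pairs that are connected, and it skips unreachable pairs, which is a simplifying assumption.

```python
from collections import deque


def distances_from(graph, source):
    """Breadth-first search: number of edges on a shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph[vertex]:
            if neighbor not in dist:
                dist[neighbor] = dist[vertex] + 1
                queue.append(neighbor)
    return dist


def average_separation(graph):
    """Mean shortest-path length over ordered pairs of distinct, connected vertices."""
    total = 0
    pairs = 0
    for source in graph:
        for target, d in distances_from(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


# Hypothetical undirected network: a path a-b-c-d plus a shortcut edge a-c.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
print(average_separation(graph))
```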
What is page ranking? The article “PageRank, Spectral Graph Theory, and the Matrix Tree
Theorem” defines page ranking as a voting system in which the weight of each page’s vote is
linearly proportional to the total value of the votes that page receives (Kenter 1). This concept
is primarily associated with the World Wide Web and web pages. Have you ever wondered how Google
seems to read your mind when it returns links for a web search? Part of the answer is the PageRank
algorithm. The PageRank of a web page is simply one entry of an eigenvector whose length equals
the number of pages on the World Wide Web. Here I modeled the NBA season as a network, with the
teams substituting for web pages and statistical measures of the regular season substituting for
web links, and I applied the same PageRank algorithm.
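The voting definition above can be turned into a computation by power iteration. Every vertex repeatedly splits its current score among the vertices it points to, and the scores converge to the PageRank eigenvector. The Python sketch below is a generic version under stated assumptions, not the Mathematica routine used in this research. The damping factor 0.85 is a common textbook choice assumed here, and a vertex with no outgoing edges spreads its score evenly, which is also an assumption. Edge weights are allowed, so a weighted matrix such as the 0s-4s matrix could be passed in as well.

```python
def pagerank(matrix, damping=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration for a (possibly weighted) directed adjacency matrix.

    matrix[i][j] > 0 means a vote from vertex i to vertex j with that weight.
    Returns a list of scores that sums to 1.
    """
    n = len(matrix)
    out_weight = [sum(row) for row in matrix]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Score held by vertices with no out-edges is shared by every vertex.
        dangling = sum(rank[i] for i in range(n) if out_weight[i] == 0)
        new_rank = [(1.0 - damping) / n + damping * dangling / n] * n
        for i in range(n):
            if out_weight[i] == 0:
                continue
            for j in range(n):
                if matrix[i][j]:
                    # Vertex i splits its vote in proportion to its edge weights.
                    new_rank[j] += damping * rank[i] * matrix[i][j] / out_weight[i]
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical three-page web: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
links = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
scores = pagerank(links)
print([round(s, 3) for s in scores])
```

Page 2 receives votes from both other pages and ends up with the highest score, and page 1 receives only half of page 0's vote and ends up with the lowest.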
IV. Methodology
A. Limitations
The 2011-2012 NBA season was irregular, which introduced some discrepancies into my data
collection. A normal NBA regular season lasts 82 games, not including the playoffs; the 2011-2012
season began on December 25th and had 16 fewer games, 66 in all. The season was shortened by a
lockout during the summer of 2011, caused by a disagreement between the players and owners over
the division of revenue and the structure of the league’s salary cap and luxury tax.
Playing 16 fewer games had pronounced effects on the season’s outcomes. Beyond the financial and
physical burdens on the players and staffs, a shorter season provides fewer results to analyze.
This limitation affected the PageRank calculation mainly because some teams were penalized for
losing to a weaker team in the only game the two teams played all season, whereas in a normal
82-game season they might have won the regular-season series 3-1. Nonetheless, I produced results
as accurate as possible from the wins and losses of the regular season.
B. Methods Used
Before beginning extensive research on calculating the PageRank of the teams in the NBA, I
first had to acquire basic knowledge of graph theory. Developing the right skills was necessary
to manipulate and analyze graphs with the help of Mathematica, a powerful mathematical computer
program. In particular, I analyzed various social network data sets available to researchers at
http://www-personal.umich.edu/~mejn/netdata/ to gain experience. These data sets are posted in
the GML (.gml) network format, which needed to be converted into a Mathematica graph format. I
then learned to create adjacency matrices in Microsoft Excel and to import these graphs into
Mathematica. In doing so, I created a small social network using survey data acquired from the
2012 McNair Scholars, and I tested this research methodology by determining various parameters
of that network. I then proceeded to gather and, more importantly, organize information about the
results of the 2011-2012 NBA season.
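Moving an adjacency matrix from a spreadsheet into a graph tool typically means saving the matrix as plain text and then reading off its edges. The Python sketch below assumes a CSV export with no header row and one matrix row per line. That layout, and the helper names `read_adjacency_csv` and `edge_list`, are illustrative assumptions and not the author's actual Excel-to-Mathematica workflow.

```python
import csv
import io


def read_adjacency_csv(text):
    """Parse a square matrix of integers saved as CSV (no header row assumed)."""
    rows = [[int(cell) for cell in row] for row in csv.reader(io.StringIO(text)) if row]
    n = len(rows)
    if any(len(row) != n for row in rows):
        raise ValueError("adjacency matrix must be square")
    return rows


def edge_list(matrix):
    """List the directed edges (i, j) for which matrix[i][j] is nonzero."""
    return [(i, j) for i, row in enumerate(matrix) for j, w in enumerate(row) if w]


# Hypothetical contents of a spreadsheet saved as CSV.
exported = "0,1,1\n0,0,1\n1,0,0\n"
matrix = read_adjacency_csv(exported)
print(edge_list(matrix))  # [(0, 1), (0, 2), (1, 2), (2, 0)]
```

The square-matrix check catches a common export mistake, such as a stray extra column or a missing row, before the data reach the graph tool.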
Figure 2 below is a 30-by-30 matrix of zeros and ones that describes the results of the entire
2011-2012 NBA regular season. The rows and columns are labeled by team in alphabetical order by
city, beginning with the Atlanta Hawks and ending with the Washington Wizards. For example, Atlanta
took the regular-season series from Charlotte with 4 wins and 0 losses, so I placed a 1 in the
entry in Atlanta’s row and Charlotte’s column and a 0 in the entry in Charlotte’s row and Atlanta’s
column. Series wins are therefore recorded along the rows and series losses along the columns. This
simple model does not take into account whether a team won a series by a 3-1 margin or by a 2-0
margin.
0 0 1 0 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 1 1 1 0 1 1 11 0 1 0 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 1 10 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 01 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 10 0 1 0 0 1 1 1 0 1 0 1 0 0 0 0 1 1 0 1 1 0 0 1 0 1 0 0 0 00 1 1 0 0 0 1 1 1 1 0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 11 1 1 1 0 0 0 1 0 1 1 0 0 0 1 1 1 1 1 1 0 1 1 1 0 1 0 1 0 10 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 1 0 0 1 1 0 1 0 10 0 0 1 1 0 1 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 01 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 1 1 0 1 1 1 1 1 1 0 10 1 1 0 1 1 0 1 1 1 0 1 1 0 0 1 1 1 1 0 1 0 1 0 1 0 0 1 1 1
1 0 1 0 0 1 1 1 1 1 0 0 0 1 1 1 0 1 1 0 1 1 1 0 1 1 0 1 1 11 1 1 0 1 1 1 0 1 0 0 1 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0 1 1 01 0 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 1 1 0 0 1 0 1 0 11 0 1 1 1 1 0 1 0 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 00 0 1 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 1 0 1 1 0 10 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 0 1 0 0 0 0 1 0 1 1 1 0 0 10 0 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 10 1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 01 1 1 0 1 1 0 0 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 0 1 1 10 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 00 0 1 0 1 0 0 0 1 1 1 0 1 0 1 1 1 1 0 0 0 0 1 1 1 1 0 1 0 11 1 1 0 1 0 0 1 1 0 1 0 1 0 0 1 0 1 1 0 0 0 0 1 0 1 0 1 1 11 1 1 0 1 0 0 1 0 1 1 1 1 1 0 1 1 0 0 1 0 0 0 0 1 1 0 0 1 11 0 1 1 1 0 1 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0 10 1 1 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 1 0 0 0 0 0 1 0 0 1 1 11 1 1 0 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 0 1 10 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 1 0 1 10 0 1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 1 0 0 0 10 0 1 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0
Figure 2
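One way to produce a 0/1 series matrix like Figure 2 from head-to-head game wins is to compare each pair of entries. The Python sketch below is hypothetical and is not the procedure described in this paper. The example uses the Atlanta, Boston, and Charlotte head-to-head records from Figure 3, and the result matches the top-left 3-by-3 corner of Figure 2. The text does not say how a split series (for example 1-1) is recorded. The `tie_counts_as_win` flag, which gives both teams a 1, is an assumption.

```python
def series_matrix(wins, tie_counts_as_win=True):
    """Convert head-to-head game wins into a 0/1 season-series matrix.

    wins[i][j] is the number of games team i won against team j.  The result
    has a 1 in row i, column j when team i took the season series from team j.
    A split series counts for both teams only if tie_counts_as_win is True
    (a hypothetical choice).
    """
    n = len(wins)
    result = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if wins[i][j] > wins[j][i]:
                result[i][j] = 1
            elif tie_counts_as_win and wins[i][j] == wins[j][i] > 0:
                result[i][j] = 1
    return result


# Head-to-head game wins among Atlanta, Boston, Charlotte (from Figure 3).
wins = [
    [0, 1, 4],  # Atlanta: 1 win vs Boston, 4 wins vs Charlotte
    [2, 0, 3],  # Boston: 2 wins vs Atlanta, 3 wins vs Charlotte
    [0, 0, 0],  # Charlotte: no wins vs either
]
for row in series_matrix(wins):
    print(row)
```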
With the assistance of more detailed data obtained from the website www.nba.com/advancedstats,
I was able to construct a more precise record of the NBA season results in a second matrix, shown in
Figure 3. It records the number of games each team won against each opponent during the regular
season. The sum of the entries across a team’s row equals its number of wins for the regular season,
and the sum of the entries down its column equals its number of losses. As in Figure 2, the rows and
columns are arranged by team in alphabetical order, beginning with the Atlanta Hawks and ending
with the Washington Wizards. For example, Atlanta went 1-2 against the Boston Celtics: there is a 1
in the entry for Atlanta’s row and Boston’s column and a 2 in the entry for Boston’s row and
Atlanta’s column. Any head-to-head record can be read from the matrix in Figure 3 in the same way,
by comparing the two corresponding entries.
0 1 4 1 3 1 0 3 0 0 2 1 0 0 1 2 1 4 1 1 1 3 0 1 1 1 0 3 1 3
2 0 3 1 2 0 0 1 1 1 2 1 0 1 3 3 1 3 0 2 0 3 1 0 1 0 0 2 1 4
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 0 0 0 0 0 2 0 0
3 3 3 0 3 1 0 4 0 0 2 1 1 1 2 4 1 2 2 3 0 2 2 1 0 2 1 3 1 2
0 1 3 0 0 1 1 2 0 1 1 1 0 0 0 0 1 2 0 2 1 0 0 1 0 1 0 1 0 1
0 2 1 0 0 0 3 1 2 3 0 1 0 1 0 1 1 0 2 1 1 1 1 3 2 2 2 1 3 1
1 1 1 1 0 1 0 1 1 3 1 1 1 0 1 2 3 1 2 1 1 2 1 3 1 3 1 1 1 1
1 2 4 0 2 0 0 0 0 0 1 0 1 0 0 1 0 2 1 0 0 2 1 0 1 2 0 2 0 2
1 0 0 1 1 1 2 1 0 1 0 2 0 0 1 0 2 0 1 1 0 0 0 2 1 3 0 0 1 1
1 0 2 1 0 0 1 1 2 0 0 0 2 2 0 0 1 1 2 1 2 0 1 2 3 3 2 1 1 2
1 2 3 1 3 1 0 2 2 1 0 1 1 0 1 3 2 3 2 1 1 1 2 0 1 0 0 3 1 3
1 0 1 0 0 2 3 1 2 3 0 0 1 2 1 1 1 1 2 0 3 1 1 1 3 3 1 1 2 2
1 2 1 0 1 4 3 0 4 1 0 2 0 2 1 0 4 1 3 1 1 0 0 2 2 1 1 1 2 0
1 0 1 1 1 2 3 2 4 2 1 1 1 0 1 1 4 1 3 1 1 1 1 1 1 2 0 1 1 1
3 1 3 2 3 2 0 3 0 1 3 0 1 0 0 1 1 3 1 3 1 2 4 1 1 1 1 3 0 1
1 0 2 0 4 0 0 3 1 1 0 0 1 0 2 0 1 3 0 2 0 0 1 0 2 0 1 3 0 3
0 0 2 0 0 2 1 2 1 3 0 3 0 0 0 0 0 1 1 0 0 0 1 1 2 2 2 0 1 1
0 0 4 1 2 1 0 1 2 0 1 1 0 0 0 0 0 0 0 1 0 0 2 1 0 1 0 2 0 2
0 1 1 0 1 1 2 0 2 1 0 1 0 1 0 1 2 1 0 1 0 1 0 1 0 1 0 0 2 0
2 2 3 1 2 1 0 3 0 0 2 1 1 0 0 2 1 2 0 0 0 2 2 0 1 2 0 2 1 3
0 2 1 1 0 3 2 1 3 2 0 1 2 3 1 1 3 1 3 1 0 2 1 3 3 3 1 1 2 0
0 0 3 1 3 0 0 1 1 1 3 0 1 0 2 4 1 3 0 1 0 0 3 1 1 1 0 3 0 3
3 2 3 1 3 0 0 3 2 0 2 0 1 0 0 2 0 2 1 1 0 1 0 1 0 1 0 2 1 3
1 1 1 0 1 1 0 1 1 2 1 3 2 2 0 2 2 0 3 1 0 0 0 0 2 3 0 0 2 1
1 0 1 1 1 1 2 0 3 0 0 1 1 2 0 0 1 1 4 0 1 0 1 1 0 2 1 1 0 1
0 1 1 0 0 1 0 0 1 0 1 0 2 1 0 1 2 0 3 0 1 0 0 0 2 0 1 1 2 1
1 1 1 0 2 2 2 1 3 2 1 2 2 4 0 0 1 1 4 1 2 2 2 4 2 2 0 1 3 1
1 2 1 0 3 0 1 1 1 1 0 0 0 1 0 0 1 2 1 2 0 0 1 1 0 0 0 0 1 2
0 0 1 0 2 1 2 1 3 2 0 1 2 2 1 1 2 2 1 0 1 1 1 1 4 2 1 0 0 1
0 0 4 1 2 0 0 1 0 0 0 0 1 0 2 1 0 1 1 0 1 1 1 0 1 0 0 2 0 0
Figure 3
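The row-sum and column-sum rule for the game-win matrix can be checked mechanically. For instance, Atlanta's row in Figure 3 sums to 40, which matches its win total in Figure 5. The Python sketch below applies the rule to a hypothetical four-team league in which every pair meets four times. The team names, numbers, and the `records` helper are illustrative only.

```python
def records(wins, teams):
    """Return {team: (wins, losses)} from a head-to-head game-win matrix.

    wins[i][j] = games team i won against team j, so a row sum is a team's
    total wins and a column sum is its total losses.
    """
    totals = {}
    for k, team in enumerate(teams):
        won = sum(wins[k])
        lost = sum(row[k] for row in wins)
        totals[team] = (won, lost)
    return totals


# Hypothetical four-team league; each pair of entries wins[i][j] + wins[j][i] is 4.
teams = ["A", "B", "C", "D"]
wins = [
    [0, 3, 2, 4],
    [1, 0, 3, 2],
    [2, 1, 0, 3],
    [0, 2, 1, 0],
]
for team, (w, l) in records(wins, teams).items():
    print(team, w, l)
```

Because every game produces exactly one win and one loss, the total of all row sums always equals the total of all column sums. This is a quick consistency check on a hand-entered matrix.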
From the matrices in Figures 2 and 3, I used the program Mathematica to build actual graphs. I
imported both data sets into Mathematica and, with the appropriate code, visualized them as the
graph in Figure 4. Generating the graph makes it possible to compute the quantities of interest,
such as the clustering coefficient, centrality measures, and, most importantly, the PageRank of
each vertex in the graph. Once the graph was created, the next step was to retrieve and analyze
this data.
Figure 4
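Of the quantities listed above, the clustering coefficient is perhaps the least self-explanatory. For a single vertex, it is the fraction of pairs of its neighbors that are themselves joined by an edge. The Python sketch below computes this local version for a small hypothetical undirected graph. It is an illustration of the definition, not the Mathematica computation used here. It also gives vertices with fewer than two neighbors a value of 0.0, which is a convention assumed in this sketch.

```python
from itertools import combinations


def local_clustering(graph, vertex):
    """Fraction of pairs of the vertex's neighbors that are themselves linked.

    graph maps each vertex to the set of its neighbors (undirected).
    """
    neighbors = graph[vertex]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)


# Hypothetical graph: a triangle a-b-c plus a pendant vertex d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for v in sorted(graph):
    print(v, local_clustering(graph, v))
```

Vertex a has three neighbors, and only one of the three possible pairs (b and c) is linked, so its coefficient is one third.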
V. Results
The PageRank eigenvector was computed for the two matrices representing the results of the
2011-2012 NBA season. Notice in Figure 5 how closely the PageRank numbers track each team’s number
of wins during this season. The San Antonio Spurs and the Chicago Bulls had both the best
regular-season records and the lowest PageRanks in the chart. Here a low PageRank indicates a more
successful season: because each 1 in a team’s row records a win, the corresponding edge points from
the winner to the loser, and PageRank accumulates at the teams that lose. There are a few
exceptions to this rule. For example, Charlotte should be ranked behind Toronto: Toronto had a poor
season, but Charlotte had a terrible one, winning only 7 games to Toronto’s 23. According to the
PageRank algorithm, Charlotte ranks ahead of Toronto because Charlotte won the regular-season series
against Toronto, so Charlotte passed part of its PageRank on to Toronto. Although a single series
loss might not seem enough to distort the ranking order, the ranking here relies solely on
PageRank rather than on total wins, and that is the primary difference maker.
Team           Season Wins   1s & 0s PageRank   0s-4s PageRank
Spurs          50            0.017209           0.025764
Bulls          50            0.017244           0.021713
Thunder        47            0.020648           0.023989
Pacers         42            0.021035           0.025921
Grizzlies      41            0.021519           0.023724
Hawks          40            0.022072           0.029178
Magic          37            0.024535           0.033052
Mavericks      36            0.024903           0.032343
Heat           46            0.026019           0.022937
Celtics        39            0.027487           0.028602
76ers          35            0.028197           0.034521
Clippers       40            0.028346           0.030869
Bucks          31            0.029013           0.03724
Jazz           36            0.029557           0.03191
Rockets        34            0.030122           0.307971
Nuggets        38            0.030217           0.024971
Lakers         41            0.032171           0.003057
Knicks         36            0.032957           0.040223
Pistons        25            0.034247           0.038937
Suns           33            0.036191           0.033988
Trailblazers   28            0.036978           0.035359
Timberwolves   26            0.039048           0.03729
Kings          22            0.041876           0.037175
Wizards        20            0.042934           0.041298
Nets           22            0.043757           0.038805
Cavaliers      21            0.047285           0.036784
Warriors       23            0.050938           0.041093
Hornets        21            0.052336           0.04166
Bobcats        7             0.05484            0.04583
Raptors        23            0.056321           0.043462
Figure 5
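The toy example below shows why a low PageRank is good when each series win is stored as an edge from winner to loser. It also shows how one head-to-head result can decide the order between two weak teams. The four teams and their results are hypothetical, the damping factor 0.85 is an assumed value, and the compact `pagerank` function is a generic sketch rather than the Mathematica routine used for Figure 5.

```python
def pagerank(matrix, damping=0.85, iterations=200):
    """Power-iteration PageRank; rows with no out-edges spread their score evenly."""
    n = len(matrix)
    out = [sum(row) for row in matrix]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        dangling = sum(r for r, o in zip(rank, out) if o == 0)
        new = [(1 - damping) / n + damping * dangling / n] * n
        for i in range(n):
            for j in range(n):
                if matrix[i][j]:
                    # Winner i passes part of its score to loser j.
                    new[j] += damping * rank[i] * matrix[i][j] / out[i]
        rank = new
    return rank


# Hypothetical four-team league; row = series winner, column = series loser.
teams = ["Strong", "Middle", "Weak A", "Weak B"]
series = [
    [0, 1, 1, 1],  # Strong won every series
    [0, 0, 1, 1],  # Middle beat both weak teams
    [0, 0, 0, 0],  # Weak A won no series
    [0, 0, 1, 0],  # Weak B's only series win came against Weak A
]
scores = pagerank(series)
for team, score in sorted(zip(teams, scores), key=lambda pair: pair[1]):
    print(f"{team:7s} {score:.4f}")
```

The unbeaten team ends with the lowest score. Weak B finishes ahead of Weak A only because of their head-to-head result, which mirrors the Charlotte and Toronto case discussed above.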
Figure 6 depicts the results of the actual NBA playoffs in the form of a tree. Note that Miami won
the championship even though several teams won more regular-season games. Eight teams from each
conference qualify for the playoffs: the three division winners and the five other teams with the
best records. Each round is an elimination round decided by a best-of-seven series.
Figure 6
Now suppose the teams were seeded by PageRank and each playoff series were decided by the two
teams’ head-to-head results during the regular season. Figure 7 gives a visual description of the
season-ending results I project under this scheme using the 0s and 1s matrix. Under this
simulation, the Spurs are the NBA champions and the Celtics finish as runners-up. These results
match the actual playoffs closely: substituting the Heat for the Spurs as champion and the Thunder
for the Celtics as runner-up yields the actual NBA Finals result. Both substitutions could easily
have happened in real life; in the actual conference finals the Heat defeated the Celtics and the
Thunder defeated the Spurs, and those series were very competitive. Factors such as crowd support
and the productivity of star and bench players could not be predicted or calculated within this
research.
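The projection described above (seed by PageRank, then advance whichever team won the regular-season series) can be sketched for a single conference bracket. Several details are assumptions of this sketch and not rules stated in the paper: the 1-8, 4-5, 2-7, 3-6 bracket layout, the rule that a split or missing series goes to the better seed, and the function names. The teams and numbers in the example are hypothetical.

```python
def bracket_order(size):
    """Seed order for a standard bracket, e.g. 8 -> [1, 8, 4, 5, 2, 7, 3, 6]."""
    order = [1]
    while len(order) < size:
        m = 2 * len(order)
        order = [x for seed in order for x in (seed, m + 1 - seed)]
    return order


def simulate_bracket(teams, pagerank, series):
    """Play out a single-elimination bracket from regular-season series results.

    teams:    team names (a power of two); seed 1 has the lowest PageRank.
    pagerank: {team: score}, where lower is better in this encoding.
    series:   {(a, b): 1} when a won the season series against b.
    Returns a list of rounds, each listing the teams that advanced.
    """
    seeded = sorted(teams, key=lambda t: pagerank[t])
    field = [seeded[s - 1] for s in bracket_order(len(seeded))]
    rounds = []
    while len(field) > 1:
        winners = []
        for a, b in zip(field[::2], field[1::2]):
            a_won = series.get((a, b), 0)
            b_won = series.get((b, a), 0)
            if a_won and not b_won:
                winners.append(a)
            elif b_won and not a_won:
                winners.append(b)
            else:
                # Split or missing series: the better seed advances (assumed rule).
                winners.append(min(a, b, key=lambda t: pagerank[t]))
        rounds.append(winners)
        field = winners
    return rounds


# Hypothetical four-team conference.
ranks = {"P": 0.10, "Q": 0.20, "R": 0.30, "S": 0.40}
results = {("S", "P"): 1, ("Q", "R"): 1, ("Q", "S"): 1}
print(simulate_bracket(list(ranks), ranks, results))  # [['S', 'Q'], ['Q']]
```

In the example, the fourth seed upsets the top seed because it won their season series, and the second seed then wins the conference.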
For comparison, I also created a playoff tree from the 0s-4s matrix. Although this tree does not
mirror the actual NBA playoff outcome as well as the 0s and 1s tree does, it is very similar up
until the conference finals, as shown in Figure 8.
Figure 8
VI. Conclusions and Suggestions for Further Study
The PageRank algorithm is a powerful tool that proved surprisingly effective at ranking NBA
teams. Comparing the rankings based on series wins with those based on individual game wins showed
that the extra information contained in the game-win counts actually produced a less accurate
prediction of the final results; however, both data sets proved to be good, though not perfect,
indicators of success. This suggests that statistics other than wins could also be used. For
example, the total number of points scored in a season series, the total number of points allowed
in a season series, and the number of rebounds are just some of the statistical measures that
could be used for PageRank calculations simply by modifying the matrices and repeating every step
thereafter. Additional seasons could be added to the analysis, and the method could even be used
to measure individual players’ performances over their careers. In summary, the 2011-2012 NBA
regular season provided a useful data set for illustrating the accuracy of the PageRank algorithm.
I was surprised that the results of a National Basketball Association season could be so compactly
represented by the numbers in an eigenvector.
VII. References
1. Easley, David, and Jon Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Cambridge University Press, 2010. Print.
2. Kenter, Franklin. “PageRank, Spectral Graph Theory, and the Matrix Tree Theorem.” 2010. Print.
3. Langville, Amy N., and Carl D. Meyer. Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, 2006. Print.
4. Watts, Duncan J. Six Degrees: The Science of a Connected Age. W. W. Norton & Company, 2003. Print.
5. Wilson, Robin J. Introduction to Graph Theory. 4th ed. Harlow: Longman, 1996. Print.