
A Graph Theoretic Analysis of the NBA

Date post: 11-Mar-2016
Description:
The research paper was written by Cherlinca Boyd during her summer internship with the Ronald E. McNair program where she researched a graph theoretic analysis of the NBA.
A Graph Theoretic Analysis of the National Basketball Association By Cherlinca Boyd McNair Research Program Summer 2012

A Graph Theoretic Analysis of the

National Basketball Association

By

Cherlinca Boyd

McNair Research Program

Summer 2012

Table of Contents

I. Abstract

II. Introduction

III. Literature Review

IV. Methodology

A. Limitations

B. Methods Used

V. Results

VI. Conclusion

VII. References

I. Abstract

A Graph Theoretic Analysis of the National Basketball Association

The web search engine Google is a relatively new tool for extracting information from the web. Google returns results for keyword searches and must rank the displayed pages in order of relevance. In this research, I used the same page-ranking algorithm that Google uses to analyze the results of the 2011-2012 National Basketball Association (NBA) season. I then seeded the teams by these rankings into NBA playoff trees to determine whether the teams’ PageRanks could predict playoff success, as measured against the actual playoff results. As hoped, the PageRanks closely followed the actual playoff results and playoff seeding, with just a couple of discrepancies.

II. Introduction

The world is increasingly interconnected in many different ways. In the influential text Six Degrees, the renowned author Duncan J. Watts writes, “in order for this connected age to be understood, we must first understand how to describe it scientifically; that is, we need a science of networks” (14). A network in its most basic form models the relationship or non-relationship between a set of objects. The objects are called vertices in the network; a relationship is indicated by an edge between two vertices. Figure 1 shows a directed network. Not only are there edges between vertices such as 18 and 20, but each edge also has a direction, for example from vertex 18 to vertex 20. This graph was obtained by asking the McNair Scholars to list five acquaintances among the other Scholars. Although this network seems to be very random, I will show that such networks contain a wealth of information. Indeed, this is why companies seek to mine Facebook data and consider it such a valuable source of information.

Figure 1

Although this network is quite small, imagine a network whose vertices are the more than 311 million people in the United States, with two vertices joined by an edge precisely when the two people have shared a handshake. According to the theory of Six Degrees of Separation, almost all people are separated from the President of the United States by a chain of at most six handshakes. It is this and other phenomena that social network scientists such as Duncan Watts explore. Researchers such as social psychologists, computer scientists, economists, mathematicians, biologists, and even teachers study these networks. Although a network on 311 million vertices seems large, imagine the problem facing the search engine company Google, which must rank the relative importance of over 7.67 billion web pages. It is natural to model the structure of the World Wide Web by a network with a vertex for each web page and a directed edge from one web page to another precisely when the first links to the second. Google uses the PageRank algorithm to accomplish this ranking task.

In this research, I apply the Google PageRank method to rank the relative strength of the teams in the National Basketball Association (NBA) based on their regular season performances. This ranking is then compared with the actual results of the teams in the playoffs to measure its accuracy. Once the analysis of all the information in the data sets is complete, I assess how effective PageRank is at ranking objects other than web pages. Professional sports, namely basketball, merely provide a useful data set for testing the PageRank algorithm.

III. Literature Review

This research project concentrates on the area of Graph Theory, the study of network structure (Easley and Kleinberg 8). In Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Easley and Kleinberg write, “The social scientist John Barnes once described graph theory as a terminological jungle, in which any newcomer may plant a tree” (25). Because Graph Theory is such a “terminological jungle,” resources such as Introduction to Graph Theory, Fourth Edition, by Dr. Robin J. Wilson of Gresham College, UK, are very useful, especially to readers who are not familiar with the subject. In fact, Dr. Wilson says he wrote the book as an “introductory text suitable for both mathematicians taking courses in Graph Theory and also for non-specialists wishing to learn the subject as quickly as possible” (vii). The book is carefully structured, down to its examples and sample problems, which span a wide range of difficulty.

Reading texts in this format allows for easy comprehension of the terms I present throughout this research. Like Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Dr. Wilson’s book dedicates an entire section to defining key terms. Both works word each definition as simply as possible. So when reading the word degree, for example, it is immediately understood that the degree of a vertex is the number of edges connected to that vertex. Once the basic terminology is understood, more advanced reading material can be tackled.

Another concept in Graph Theory is the theory of Six Degrees of Separation. This subject is so extensively studied that several graph theorists have written books solely about it. One such book is Six Degrees: The Science of a Connected Age by Duncan J. Watts. Simply put, this theory proposes that any two people are separated by an average of six people. Although this theory is explicitly applied to people, it is also conjectured to hold in social networks of dolphins, political blogs, and appearances of actors in plays and movies. So all of these networks share a small-world phenomenon, even though some of them are quite large. This phenomenon suggests that interrelated objects, such as web pages, may be ranked in relation to each other in even the largest of networks by methods such as PageRank.

What is page ranking? The article PageRank, Spectral Graph Theory, and the Matrix Tree Theorem defines page ranking as a voting system whereby the weight of each vote is linearly proportional to the total values of the votes it receives (Kenter 1). This concept is primarily associated with the World Wide Web and web pages. Have you ever wondered how Google seems to read your mind when providing links in response to a web search? The answer is a PageRank algorithm. The PageRank associated with a web page is merely one entry in an eigenvector whose length is equal to the number of pages on the World Wide Web. Here I modeled the NBA season as a network, with the teams substituting for web pages and statistical measures substituting for web links, using the same PageRank algorithm.
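To make the "one entry in an eigenvector" idea concrete, here is a minimal sketch of the standard PageRank power iteration in Python. It is not the Mathematica code used in this research. The function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the three-page example web are all assumptions chosen for illustration.

```python
def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """Return PageRank scores for a square adjacency matrix.

    adj[i][j] > 0 means node i links to node j (optionally with a weight).
    The damping factor 0.85 is a conventional choice, assumed here.
    """
    n = len(adj)
    out_weight = [sum(row) for row in adj]
    rank = [1.0 / n] * n  # start from a uniform vote
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[i] for i in range(n) if out_weight[i] == 0)
        new_rank = []
        for j in range(n):
            incoming = sum(
                rank[i] * adj[i][j] / out_weight[i]
                for i in range(n)
                if out_weight[i] > 0
            )
            new_rank.append((1 - damping) / n + damping * (incoming + dangling / n))
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
ranks = pagerank(links)
for name, score in zip("ABC", ranks):
    print(name, round(score, 4))
```

In this toy web, page C collects votes from both A and B, so it ends up with the largest score, while B, which is linked only from A, ends up with the smallest. The scores always sum to 1, which is why each team's PageRank later in this paper is a small fraction.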

IV. Methodology

A. Limitations

The 2011-2012 NBA season was irregular, which brought about some discrepancies during my data collection. A normal NBA regular season consists of 82 games per team, not including the playoffs; the 2011-2012 season began on December 25th, and each team played 16 fewer games. The cause was a lockout during the summer of 2011, which arose from a disagreement between the players and owners over the division of revenue and the structure of the league’s salary cap and luxury tax.

The effects of 16 fewer league games on the season outcomes are pronounced. Aside from the financial and physical burdens on the players and staffs, a shorter season provides fewer results to analyze. This limitation affected the page-ranking algorithm mainly because some teams lost weight in the ranking after losing to a weaker team the one and only time the two met during the season, whereas in a normal 82-game season they might have won the regular season series 3-1. Nonetheless, I produced results that are as accurate as possible from the wins and losses of the regular season.

B. Methods Used

Before beginning extensive research on calculating the PageRank of the teams in the NBA, I first had to acquire basic knowledge of Graph Theory. Developing the right skills was necessary to manipulate and analyze graphs with the help of Mathematica, a powerful mathematical computer program. In particular, I analyzed various social network data sets available to researchers at http://www-personal.umich.edu/~mejn/netdata/ to gain experience. These data sets are posted in the .gml network format, which needed to be converted into a Mathematica graph format. I then learned to create adjacency matrices in Microsoft Excel and to import them into Mathematica. In doing so, I created a small social network using survey data acquired from the 2012 McNair Scholars. I tested this research methodology by determining various parameters of the network. I then proceeded to gather and, more importantly, organize information about the results of the 2011-2012 NBA season.

Figure 2 below is a 30-by-30 matrix of zeros and ones that describes the results of the entire 2011-2012 NBA regular season. The rows and columns are labeled by team name in alphabetical order, beginning with the Atlanta Hawks and ending with the Washington Wizards. Because Atlanta took the regular season series from Charlotte with 4 wins and 0 losses, I placed a 1 in the cell in Atlanta’s row and Charlotte’s column, and a 0 in the cell in Charlotte’s row and Atlanta’s column. So series wins are recorded along a team’s row and series losses down its column. This simple model does not take into account whether a team won a series by a 3-1 margin or by a 2-0 margin.

0 0 1 0 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 1 1 1 0 1 1 11 0 1 0 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 1 10 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 01 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 10 0 1 0 0 1 1 1 0 1 0 1 0 0 0 0 1 1 0 1 1 0 0 1 0 1 0 0 0 00 1 1 0 0 0 1 1 1 1 0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 11 1 1 1 0 0 0 1 0 1 1 0 0 0 1 1 1 1 1 1 0 1 1 1 0 1 0 1 0 10 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 1 0 0 1 1 0 1 0 10 0 0 1 1 0 1 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 01 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 1 1 0 1 1 1 1 1 1 0 10 1 1 0 1 1 0 1 1 1 0 1 1 0 0 1 1 1 1 0 1 0 1 0 1 0 0 1 1 1

1 0 1 0 0 1 1 1 1 1 0 0 0 1 1 1 0 1 1 0 1 1 1 0 1 1 0 1 1 11 1 1 0 1 1 1 0 1 0 0 1 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0 1 1 01 0 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 1 1 0 0 1 0 1 0 11 0 1 1 1 1 0 1 0 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 00 0 1 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 1 0 1 1 0 10 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 0 1 0 0 0 0 1 0 1 1 1 0 0 10 0 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 10 1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 01 1 1 0 1 1 0 0 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 0 1 1 10 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 00 0 1 0 1 0 0 0 1 1 1 0 1 0 1 1 1 1 0 0 0 0 1 1 1 1 0 1 0 11 1 1 0 1 0 0 1 1 0 1 0 1 0 0 1 0 1 1 0 0 0 0 1 0 1 0 1 1 11 1 1 0 1 0 0 1 0 1 1 1 1 1 0 1 1 0 0 1 0 0 0 0 1 1 0 0 1 11 0 1 1 1 0 1 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0 10 1 1 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 1 0 0 0 0 0 1 0 0 1 1 11 1 1 0 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 0 1 10 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 1 0 1 10 0 1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 1 0 0 0 10 0 1 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0

Figure 2
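The construction of the series matrix can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the Excel workflow actually used. It is restricted to the first three teams in alphabetical order, with the series winners read off the corresponding cells of Figure 2, and the helper name `build_series_matrix` is hypothetical.

```python
teams = ["Atlanta", "Boston", "Charlotte"]

# (winner, loser) of each regular season series among these three teams,
# read off the corresponding cells of Figure 2.
series_results = [
    ("Atlanta", "Charlotte"),
    ("Boston", "Atlanta"),
    ("Boston", "Charlotte"),
]


def build_series_matrix(teams, series_results):
    """Put a 1 in the winner's row and loser's column; every other cell is 0."""
    index = {name: i for i, name in enumerate(teams)}
    n = len(teams)
    matrix = [[0] * n for _ in range(n)]
    for winner, loser in series_results:
        matrix[index[winner]][index[loser]] = 1
    return matrix


series_matrix = build_series_matrix(teams, series_results)
for name, row in zip(teams, series_matrix):
    print(f"{name:<10}", *row)
```

The printed block matches the upper-left 3-by-3 corner of Figure 2: Atlanta's row is 0 0 1, Boston's is 1 0 1, and Charlotte's is 0 0 0.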

With the assistance of more detailed data obtained from the website www.nba.com/advancedstats, I was able to construct a more precise record of the NBA season results in a second matrix. It records the total number of games won and lost by a team in each series during the regular season. The sum of the entries in a team’s row equals its number of wins for the regular season, and the sum of the entries in its column equals its number of losses, as illustrated in Figure 3. Just as in Figure 2, the rows and columns are ordered alphabetically by team name, beginning with the Atlanta Hawks and ending with the Washington Wizards. For example, Atlanta went 1-2 against the Boston Celtics, so there is a 1 in the cell in Atlanta’s row and Boston’s column and a 2 in the cell in Boston’s row and Atlanta’s column. Records such as these can be read from any cell of the matrix in Figure 3 just by comparing the rows and columns of the two teams.

0 1 4 1 3 1 0 3 0 0 2 1 0 0 1 2 1 4 1 1 1 3 0 1 1 1 0 3 1 3
2 0 3 1 2 0 0 1 1 1 2 1 0 1 3 3 1 3 0 2 0 3 1 0 1 0 0 2 1 4
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 0 0 0 0 0 2 0 0
3 3 3 0 3 1 0 4 0 0 2 1 1 1 2 4 1 2 2 3 0 2 2 1 0 2 1 3 1 2
0 1 3 0 0 1 1 2 0 1 1 1 0 0 0 0 1 2 0 2 1 0 0 1 0 1 0 1 0 1
0 2 1 0 0 0 3 1 2 3 0 1 0 1 0 1 1 0 2 1 1 1 1 3 2 2 2 1 3 1
1 1 1 1 0 1 0 1 1 3 1 1 1 0 1 2 3 1 2 1 1 2 1 3 1 3 1 1 1 1
1 2 4 0 2 0 0 0 0 0 1 0 1 0 0 1 0 2 1 0 0 2 1 0 1 2 0 2 0 2
1 0 0 1 1 1 2 1 0 1 0 2 0 0 1 0 2 0 1 1 0 0 0 2 1 3 0 0 1 1
1 0 2 1 0 0 1 1 2 0 0 0 2 2 0 0 1 1 2 1 2 0 1 2 3 3 2 1 1 2
1 2 3 1 3 1 0 2 2 1 0 1 1 0 1 3 2 3 2 1 1 1 2 0 1 0 0 3 1 3
1 0 1 0 0 2 3 1 2 3 0 0 1 2 1 1 1 1 2 0 3 1 1 1 3 3 1 1 2 2
1 2 1 0 1 4 3 0 4 1 0 2 0 2 1 0 4 1 3 1 1 0 0 2 2 1 1 1 2 0
1 0 1 1 1 2 3 2 4 2 1 1 1 0 1 1 4 1 3 1 1 1 1 1 1 2 0 1 1 1
3 1 3 2 3 2 0 3 0 1 3 0 1 0 0 1 1 3 1 3 1 2 4 1 1 1 1 3 0 1
1 0 2 0 4 0 0 3 1 1 0 0 1 0 2 0 1 3 0 2 0 0 1 0 2 0 1 3 0 3
0 0 2 0 0 2 1 2 1 3 0 3 0 0 0 0 0 1 1 0 0 0 1 1 2 2 2 0 1 1
0 0 4 1 2 1 0 1 2 0 1 1 0 0 0 0 0 0 0 1 0 0 2 1 0 1 0 2 0 2
0 1 1 0 1 1 2 0 2 1 0 1 0 1 0 1 2 1 0 1 0 1 0 1 0 1 0 0 2 0
2 2 3 1 2 1 0 3 0 0 2 1 1 0 0 2 1 2 0 0 0 2 2 0 1 2 0 2 1 3
0 2 1 1 0 3 2 1 3 2 0 1 2 3 1 1 3 1 3 1 0 2 1 3 3 3 1 1 2 0
0 0 3 1 3 0 0 1 1 1 3 0 1 0 2 4 1 3 0 1 0 0 3 1 1 1 0 3 0 3
3 2 3 1 3 0 0 3 2 0 2 0 1 0 0 2 0 2 1 1 0 1 0 1 0 1 0 2 1 3
1 1 1 0 1 1 0 1 1 2 1 3 2 2 0 2 2 0 3 1 0 0 0 0 2 3 0 0 2 1
1 0 1 1 1 1 2 0 3 0 0 1 1 2 0 0 1 1 4 0 1 0 1 1 0 2 1 1 0 1
0 1 1 0 0 1 0 0 1 0 1 0 2 1 0 1 2 0 3 0 1 0 0 0 2 0 1 1 2 1
1 1 1 0 2 2 2 1 3 2 1 2 2 4 0 0 1 1 4 1 2 2 2 4 2 2 0 1 3 1
1 2 1 0 3 0 1 1 1 1 0 0 0 1 0 0 1 2 1 2 0 0 1 1 0 0 0 0 1 2
0 0 1 0 2 1 2 1 3 2 0 1 2 2 1 1 2 2 1 0 1 1 1 1 4 2 1 0 0 1
0 0 4 1 2 0 0 1 0 0 0 0 1 0 2 1 0 1 1 0 1 1 1 0 1 0 0 2 0 0

Figure 3
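How wins, losses, and head-to-head records are read from such a matrix can be sketched as follows. The example uses only the upper-left 3-by-3 corner of Figure 3 (Atlanta, Boston, Charlotte), so the totals are wins and losses within that three-team subset rather than full-season totals. The helper name `record` is hypothetical.

```python
teams = ["Atlanta", "Boston", "Charlotte"]

# Upper-left 3-by-3 corner of Figure 3: games_won[i][j] is the number of
# games team i won against team j.
games_won = [
    [0, 1, 4],
    [2, 0, 3],
    [0, 0, 0],
]
n = len(teams)

# Row sums count wins; column sums count losses.
wins = [sum(row) for row in games_won]
losses = [sum(games_won[i][j] for i in range(n)) for j in range(n)]


def record(team_a, team_b):
    """Return (wins, losses) of team_a against team_b."""
    a, b = teams.index(team_a), teams.index(team_b)
    return games_won[a][b], games_won[b][a]


for name, w, l in zip(teams, wins, losses):
    print(f"{name:<10} {w}-{l}")
print("Atlanta vs Boston:", record("Atlanta", "Boston"))
```

Applying the same row sums to the full 30-by-30 matrix gives each team's season win total, for example 40 for Atlanta in the first row.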

From the matrices in Figures 2 and 3, I was able to use the program Mathematica to produce an actual graph. I imported both data sets into Mathematica and, with the help of the correct code, visualized them through the graph in Figure 4. The reason for generating the graph is to be able to compute the information being sought, such as the clustering coefficient, centrality parameters, and, most importantly, the PageRank of the vertices in the graph. Once the graph was created, the next step was to retrieve and analyze this data.

Figure 4
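As a rough Python illustration of turning a matrix into a graph (the research itself used Mathematica), the sketch below converts the three-team corner of Figure 2 into a list of directed edges and counts each vertex's out-degree and in-degree. Drawing each edge from the row team to the column team, that is, from series winner to series loser, is an assumption; the text does not state which direction was used.

```python
teams = ["Atlanta", "Boston", "Charlotte"]

# Upper-left 3-by-3 corner of the series matrix in Figure 2.
series_matrix = [
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
]

# Assumed convention: a 1 in row i, column j becomes the directed edge i -> j.
edges = [
    (teams[i], teams[j])
    for i, row in enumerate(series_matrix)
    for j, value in enumerate(row)
    if value
]

out_degree = {team: 0 for team in teams}
in_degree = {team: 0 for team in teams}
for source, target in edges:
    out_degree[source] += 1
    in_degree[target] += 1

print(edges)
print("out-degree:", out_degree)
print("in-degree:", in_degree)
```

Under this convention a team's out-degree is the number of series it won and its in-degree is the number of series it lost, which are the simplest centrality measures on the graph.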

V. Results

The PageRank eigenvector was computed for each of the two matrices representing the results of the 2011-2012 NBA season. Notice in Figure 5 how closely the PageRank numbers track the number of wins of a team during this season. The San Antonio Spurs and the Chicago Bulls had the best regular season records as well as the lowest 1s & 0s PageRanks in the chart; here a low PageRank indicates a more successful season. There are a few exceptions to this pattern. For example, Charlotte should be ranked behind Toronto: Toronto had a poor season with 23 wins, but Charlotte had a terrible one, winning only seven games. The PageRank algorithm nevertheless ranks Charlotte ahead of Toronto because Charlotte won the regular season series against Toronto. As a result, Toronto passed its PageRank on to Charlotte. Although it seems that Toronto’s loss to Charlotte should not have had enough effect to distort the ranking order, the fact that I relied solely on PageRank makes the difference.

Team            Season Wins   1s & 0s PageRank   0s-4s PageRank
Spurs           50            0.017209           0.025764
Bulls           50            0.017244           0.021713
Thunder         47            0.020648           0.023989
Pacers          42            0.021035           0.025921
Grizzlies       41            0.021519           0.023724
Hawks           40            0.022072           0.029178
Magic           37            0.024535           0.033052
Mavericks       36            0.024903           0.032343
Heat            46            0.026019           0.022937
Celtics         39            0.027487           0.028602
76ers           35            0.028197           0.034521
Clippers        40            0.028346           0.030869
Bucks           31            0.029013           0.03724
Jazz            36            0.029557           0.03191
Rockets         34            0.030122           0.307971
Nuggets         38            0.030217           0.024971
Lakers          41            0.032171           0.003057
Knicks          36            0.032957           0.040223
Pistons         25            0.034247           0.038937
Suns            33            0.036191           0.033988
Trail Blazers   28            0.036978           0.035359
Timberwolves    26            0.039048           0.03729
Kings           22            0.041876           0.037175
Wizards         20            0.042934           0.041298
Nets            22            0.043757           0.038805
Cavaliers       21            0.047285           0.036784
Warriors        23            0.050938           0.041093
Hornets         21            0.052336           0.04166
Bobcats         7             0.05484            0.04583
Raptors         23            0.056321           0.043462

Figure 5
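One way to put a number on how closely the PageRanks track the win totals is a correlation coefficient. The sketch below is not part of the original analysis. It transcribes the wins and the 1s & 0s PageRank column from Figure 5 and computes a Pearson correlation with a hand-written helper, `pearson`, which is a name chosen for this illustration.

```python
from math import sqrt

# (team, season wins, 1s & 0s PageRank), transcribed from Figure 5.
figure5 = [
    ("Spurs", 50, 0.017209), ("Bulls", 50, 0.017244),
    ("Thunder", 47, 0.020648), ("Pacers", 42, 0.021035),
    ("Grizzlies", 41, 0.021519), ("Hawks", 40, 0.022072),
    ("Magic", 37, 0.024535), ("Mavericks", 36, 0.024903),
    ("Heat", 46, 0.026019), ("Celtics", 39, 0.027487),
    ("76ers", 35, 0.028197), ("Clippers", 40, 0.028346),
    ("Bucks", 31, 0.029013), ("Jazz", 36, 0.029557),
    ("Rockets", 34, 0.030122), ("Nuggets", 38, 0.030217),
    ("Lakers", 41, 0.032171), ("Knicks", 36, 0.032957),
    ("Pistons", 25, 0.034247), ("Suns", 33, 0.036191),
    ("Trail Blazers", 28, 0.036978), ("Timberwolves", 26, 0.039048),
    ("Kings", 22, 0.041876), ("Wizards", 20, 0.042934),
    ("Nets", 22, 0.043757), ("Cavaliers", 21, 0.047285),
    ("Warriors", 23, 0.050938), ("Hornets", 21, 0.052336),
    ("Bobcats", 7, 0.05484), ("Raptors", 23, 0.056321),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


wins = [row[1] for row in figure5]
page_ranks = [row[2] for row in figure5]
r = pearson(wins, page_ranks)
print(f"Pearson correlation between wins and PageRank: {r:.3f}")
```

Because a low PageRank indicates a more successful season here, a close match between the two columns shows up as a correlation that is negative and near -1.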

Figure 6 depicts the results of the actual NBA playoffs in the form of a tree. Note that Miami won the championship even though its total number of wins was not as high as that of some other teams. The seeding of the playoff teams depends on a team winning its division or finishing second; in addition, two wildcard teams are added from each conference to make eight playoff teams per conference. The format is elimination, with each round decided by a best-of-seven series.

Figure 6

Now, if the teams’ seeding were based on PageRanks and the teams’ advancement were based on their results in head-to-head matchups during the regular season, Figure 7 gives a visual description of what I project the season-ending results to be using the 0s and 1s matrix. Under this simulation, the Spurs are the NBA champions, while the Celtics finish as runners-up. These results match the actual playoff results very closely: one need only substitute the Heat for the Spurs as champion and the Thunder for the Celtics as runner-up to obtain the actual NBA Finals result. Both of these substitutions could easily have happened in real life, since the corresponding series were very competitive. Crowd support and the productivity of back-up and star players could not be predicted or calculated within this research.

Figure 7
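The seeding-and-advancement rule can be sketched in Python on a deliberately small example. This is a four-team toy bracket, not the 16-team tree of Figure 7. The PageRanks come from the 1s & 0s column of Figure 5 and the series winners from Figure 2. The function names, the bracket layout (seed 1 against seed 4, seed 2 against seed 3), and the tie-break in favor of the better seed are assumptions made for illustration.

```python
# 1s & 0s PageRanks from Figure 5; a lower value means a better seed.
page_rank = {
    "Bulls": 0.017244,
    "Pacers": 0.021035,
    "Hawks": 0.022072,
    "Magic": 0.024535,
}

# (winner, loser) of each regular season series, read from Figure 2.
series_won = {
    ("Bulls", "Pacers"), ("Bulls", "Hawks"), ("Bulls", "Magic"),
    ("Hawks", "Pacers"), ("Hawks", "Magic"), ("Magic", "Pacers"),
}

seeded = sorted(page_rank, key=page_rank.get)
seed = {team: i + 1 for i, team in enumerate(seeded)}


def winner_of(a, b):
    """Advance the regular season series winner; ties go to the better seed."""
    if (a, b) in series_won and (b, a) not in series_won:
        return a
    if (b, a) in series_won and (a, b) not in series_won:
        return b
    return a if seed[a] < seed[b] else b  # assumed tie-break


def play_bracket(bracket):
    """Play adjacent pairs round by round until one team remains."""
    field = list(bracket)
    while len(field) > 1:
        field = [winner_of(field[k], field[k + 1]) for k in range(0, len(field), 2)]
    return field[0]


# Seed 1 meets seed 4 and seed 2 meets seed 3 in the first round.
bracket = [seeded[0], seeded[3], seeded[1], seeded[2]]
champion = play_bracket(bracket)
print("Seeds:", seeded)
print("Champion:", champion)
```

In this toy bracket the third-seeded Hawks advance past the second-seeded Pacers because they won the regular season series, and the Bulls then win the final for the same reason.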

For comparison with the 0s and 1s playoff tree, I also created the 0s-4s playoff tree. While this tree does not mirror the actual NBA playoff outcome as well as the 0s and 1s tree does, there is great similarity up until the conference finals, as shown in Figure 8.

Figure 8

VI. Conclusions and Suggestions for Further Study

The PageRank algorithm is a powerful tool that applied surprisingly well to the ranking of NBA teams. The comparison of the results based on series wins with those based on actual wins showed that the extra information obtained by considering the number of actual wins provided a less accurate prediction of the final results; however, both data sets proved to be strong, though not perfect, indicators of success. This suggests that statistics other than the number of wins could be used. For example, the total number of points scored in a season series, the total number of points allowed in a season series, and the number of rebounds are just some of the statistical measures that could be used for PageRank calculations merely by modifying the matrices and repeating each subsequent step. Additional seasons could be added to the analysis, or the method could even be used to measure the effectiveness of individual players over their careers. In summary, the 2011-2012 NBA regular season provided a useful data set for illustrating the accuracy of the PageRank algorithm. I was surprised that the results of a National Basketball Association season were so compactly represented by the numbers in an eigenvector.

VII. References

1. Easley, David and Jon Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Cambridge University Press, 2010. Print.

2. Langville, Amy N. and Carl D. Meyer. Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, 2006. Print.

3. Wilson, Robin J. Introduction to Graph Theory. 4th ed. Harlow: Longman, 1996. Print.

4. Watts, Duncan J. Six Degrees: The Science of a Connected Age. W. W. Norton & Company, 2003. Print.

5. Kenter, Franklin. “PageRank, Spectral Graph Theory, and the Matrix Tree Theorem.” 2010. Print.

