Edbert Puspito NBA Salary Prediction

Post on 12-Jul-2020

Imagine

● You are the Lakers' GM.
● The team is at its worst right now: 16-65, last place in the Western Conference.
● Kobe will retire, and a bunch of players will have their contracts expire.
● You definitely need to rebuild, or L.A.'s top line from ticket sales and merchandise will drop hard.

Imagine

● The salary cap for the 16-17 season is projected to be 89 million.
● The remaining contracts total 26 million.
● Which means you have 63 million free before you hit the salary cap.
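The cap-space figure is simple subtraction, but a minimal sketch makes the bookkeeping explicit. The function name `cap_space` is illustrative only; the two inputs are the figures quoted on the slide, in millions of dollars.

```python
def cap_space(salary_cap_m, committed_m):
    """Room left under the salary cap, in millions of dollars."""
    return salary_cap_m - committed_m


projected_cap_m = 89  # projected 16-17 salary cap (from the slide)
committed_m = 26      # remaining contracts (from the slide)

free_m = cap_space(projected_cap_m, committed_m)
print(free_m)  # 63
```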

Your challenge

● Who to re-sign?
● Who to target in free agency?
● How much should you pay them?

Data science to the rescue

● We created a model to predict a player's salary based ONLY on their on-court performance.
● Find out who is overpaid and who is underpaid if you only consider their performance.
● Find out which team's GM is the best.
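One way to read "overpaid" and "underpaid" is as the gap between a player's actual salary and the salary the model predicts. The sketch below assumes that reading. The player rows are hypothetical, and ranking GMs by the lowest total overpayment is an assumption about what "best" means, not something the slides define.

```python
from collections import defaultdict


def pay_gaps(players):
    """Map each player to actual minus predicted salary (positive = overpaid)."""
    return {p["name"]: p["salary_m"] - p["predicted_m"] for p in players}


def team_gaps(players):
    """Sum the per-player gaps for each team."""
    totals = defaultdict(float)
    for p in players:
        totals[p["team"]] += p["salary_m"] - p["predicted_m"]
    return dict(totals)


# Hypothetical rows, in millions of dollars; not taken from the dataset.
players = [
    {"name": "Player A", "team": "Team X", "salary_m": 20.0, "predicted_m": 15.0},
    {"name": "Player B", "team": "Team X", "salary_m": 2.0, "predicted_m": 8.0},
    {"name": "Player C", "team": "Team Y", "salary_m": 10.0, "predicted_m": 9.0},
]

gaps = pay_gaps(players)
by_team = team_gaps(players)
# Assumption: the "best" GM is the one whose roster is least overpaid in total.
best_team = min(by_team, key=by_team.get)
print(gaps, by_team, best_team)
```

A gap of zero would mean the salary matches what on-court performance alone predicts.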

Hypothesis

● There are many factors that can affect a player's salary:
1. Performance
2. Ability to attract fans
3. Market demand
4. Luck (hype)
5. etc.
● We assume performance can explain the majority of their salary.
● 2, 3 and even 4 are also tied to 1.

Salary trivia (or not so)

● Highest salary ever: 33 M, MJ, 97-98 season.
● The closest to the basketball god: 30 M, KB24, 12-13 and 13-14 seasons.
● In the 15-16 season, at least 10 players have a salary > 20 M.
● Average: ~5 M
● Median: 2.5 M
● Many players are "underpaid", others "overpaid".

Dataset

● 4 seasons, from 2012 to 2016.
● Statistics from NBA.com, including player bio, basic and advanced stats.
● Salaries were taken from ESPN.com and adjusted for inflation.
● A total of 1,600 records and ~50 features.

Some graphs

Straight ball or Curve ball?

Base - linear : 0.596

Ridge - poly : 0.604

ElasticNet - poly : 0.581

Random Forest : 0.654

Extra trees : 0.667
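The slides do not name the metric behind these scores. Values in this range are consistent with the coefficient of determination (R²), which is what scikit-learn regressors report by default from `.score`, so the sketch below assumes that reading. The helper name `r_squared` and the sample numbers are illustrative.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical salaries in millions of dollars.
actual = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(actual, actual)        # every prediction exact
baseline = r_squared(actual, [2.5] * 4)    # always predict the mean salary
print(perfect, baseline)
```

Under this reading, a score of about 0.6 would mean the model explains roughly 60% of the variance in salary.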

Economic data

We added data on players' official Twitter followers and team ticket sales.

The score goes up to 0.608 (0.699 with random forest regressors).

This indicates that popularity does affect players' salaries, but we focus on performance (because only a small amount of popularity data could be crawled).

So, what model do we use?

● The forest of forests.
● 100 extra-trees models.
● Each extra-trees model has a min leaf of 3, a depth of 12 and 50 estimators, and is trained on a different sample of the training data (70%, sampled randomly).
● Scores range from 0.64 to 0.68.
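The outer layer of this ensemble can be sketched as follows: fit many models, each on a random 70% sample, and combine their predictions. The slides do not say how predictions are combined or whether sampling is with replacement. This sketch assumes a plain average and sampling without replacement. `MeanRegressor` is a trivial stand-in so the example runs on the standard library alone. If scikit-learn were used, the factory could instead return `ExtraTreesRegressor(n_estimators=50, max_depth=12, min_samples_leaf=3)`.

```python
import random
from statistics import mean


class MeanRegressor:
    """Stand-in base learner (not the slides' model): predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def fit_forest_of_forests(X, y, make_model, n_models=100, sample_frac=0.7, seed=0):
    """Fit n_models models, each on its own random sample of the rows."""
    rng = random.Random(seed)
    n_sample = max(1, int(sample_frac * len(X)))
    models = []
    for _ in range(n_models):
        # Assumption: sample rows without replacement.
        idx = rng.sample(range(len(X)), n_sample)
        model = make_model().fit([X[i] for i in idx], [y[i] for i in idx])
        models.append(model)
    return models


def predict_forest_of_forests(models, X):
    """Assumption: combine the models by averaging their predictions."""
    per_model = [m.predict(X) for m in models]
    return [mean(column) for column in zip(*per_model)]


# Hypothetical toy data: one feature, ten rows.
X = [[i] for i in range(10)]
y = [float(i) for i in range(10)]
models = fit_forest_of_forests(X, y, MeanRegressor)
preds = predict_forest_of_forests(models, [[0], [5]])
print(len(models), preds)
```

Because each model sees a different 70% of the rows, the score of any single model varies from run to run, which fits the range of scores quoted above.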

Findings:

As the "boss" of the Lakers, Kobe indeed has every means to make his statistics look beautiful.

And he is really, really famous.

Findings:

Overrated? Maybe, as the model only considers performance.

What is surprising is that, despite all the ticket sales he "raised", he is overpaid by just 1.5 M, suggesting he may even be underpaid performance-wise.

Findings:

FYI, this guy is considered underrated in the 15-16 season, as he is paid only 2 M by the Hornets.

Findings:

● Had a breakthrough performance by successfully defending LeBron in the 13-14 NBA Finals; got Finals MVP + a championship ring.
● Saw more playing time and won Defensive Player of the Year in 14-15.
● Contract re-signed in 15-16, hence the jump in salary (and overrated-ness… lol).

Findings:

● A "nobody", and considered a "risky" signing in 12-13 (due to his injury record).
● The rest is history.
● Contract will expire at the end of the 16-17 season; expect a rocket jump.

Actionable insight

● Assuming their salaries won't change much, the Lakers can sign those players.
● Maybe add some "overrated" players who can mentor / attract fans to games.
● The Lakers have to pay me a data science consultancy fee to get the full result :)

Actionable insight

● Fire the Nets' GM / whoever made the signing decisions!

Let’s have some fun

Challenges along the journey

● Feature engineering didn't help much.
● Couldn't find features to create.
● Not enough data.
● No economic features.

Future ideas

● Gather economic data, such as social media followers (Facebook) and activity for every player, and team ticket and jersey sales, and see whether the additional data increases the models' scores.
● Is 0.66 the ceiling for salary prediction if only performance data is used?
● Find out whether MJ was really overrated/overpriced.
● Is there a way to create features from the basic statistics to improve the score?
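As one illustration of the last question, a common way to derive a feature from basic box-score totals is to normalize by playing time, for example points per 36 minutes. This is a hypothetical sketch, not a feature the slides report trying, and whether it would improve the score is untested here. The name `per_36` and the sample totals are illustrative.

```python
def per_36(total, minutes):
    """Scale a season total to a per-36-minutes rate; 0.0 if no minutes played."""
    if minutes <= 0:
        return 0.0
    return total * 36.0 / minutes


# Hypothetical season line: 1200 points in 2400 minutes.
points_per_36 = per_36(1200, 2400)
print(points_per_36)  # 18.0
```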