Tuesday, July 31, 2012

NBA Draft Analysis Part I: How to value an NBA Draft Pick

A couple of weeks ago, I started to analyze the NBA draft and came up with a general formula for the value of a draft pick. Since then, I've spent many long nights digging into the numbers. This is Part I of my five-part NBA Draft Analysis, where I go further in depth than my previous article to develop a way of analyzing NBA drafts that lets us compare teams, draft classes, and the GMs making the decisions.

The statistic I decided to use as a general-purpose comparison is PER*MP, or (Player Efficiency Rating)*(Minutes Played). It heavily rewards both actually being on the floor and contributing while there. John Hollinger, creator of the complicated PER, has a similar statistic called Value Added (VA). PER*MP and VA differ in that Value Added corrects for position: it attempts to compare a player's contributions to those of a replacement player. For a variety of reasons, VA does not lend itself well to our analysis of the NBA draft, so we will stick with PER*MP.

I calculated the PER*MP of every draft pick since 2002 (the last ten drafts) for every season. I then calculated the average performance for every pick in every year since being drafted, and smoothed the line out in Excel. Basically, I found the PER*MP you should expect from a pick in each year after being drafted. For example, the 8th pick in the draft is expected to post a PER*MP of 24154 in his 4th season. If he averages 20 minutes a game and plays in 75 games (1,500 minutes), his PER would have to be 16.1 to reach the "expected" performance. This expected performance accounts for all the busts at the draft position (Rafael Araujo in 2004, for example) as well as the standouts like Rudy Gay in 2006.
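The 16.1 figure is just the PER*MP definition run backwards: divide the expected PER*MP by total minutes played. A minimal sketch in Python (the helper name `required_per` is made up for illustration):

```python
def required_per(expected_per_mp: float, minutes_per_game: float, games: int) -> float:
    """PER a player needs so that PER * minutes played hits the expected PER*MP."""
    minutes_played = minutes_per_game * games  # e.g. 20 * 75 = 1500 minutes
    return expected_per_mp / minutes_played


# 8th pick, 4th season after the draft: expected PER*MP of 24154
per_needed = required_per(24154, minutes_per_game=20, games=75)
print(round(per_needed, 1))  # 16.1
```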

The expected performance for each draft position increased until year three, leveled out until year seven, then dropped in year eight. This behavior was much more evident early in the draft; the late picks peaked at year three as well, but with much smaller differences, since very little is expected of a late draft pick. It should be noted that few picks late in the second round reach the expected value: most of them post very low numbers, while a few go on to have good careers. (The expected value is the smoothed-curve approximation to the average performance.) Showing these trends on a plot was pretty hard in two dimensions, so I tried to get a decent 3D chart in Excel.



Next, I had to decide how to weight each season's PER*MP. I took the expected values and worked out a formula to weight the seasons for a fair comparison. Obviously, good performance early is more valuable than good performance three years from now. For picks with 8 full seasons to compare, the weighting is as follows:

1st Year: 17.3%
2nd Year: 17.0%
3rd Year: 16.3%
4th Year: 15.3%
5th Year: 13.6%
6th Year: 11.2%
7th Year: 7.5%
8th Year: 1.8%

Picks with fewer seasons are weighted in the same pattern, with the first year always the most valuable. For example, a second-year player would have his first season weighted at just over 50%.
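One way to express the weighting in code is sketched below. The eight weights are the ones listed above. How they are rescaled for players with fewer seasons is an assumption on my part: keeping the first n weights and rescaling them to sum to 1 reproduces the "just over 50%" figure for a second-year player (17.3 / 34.3 ≈ 0.504). Treating the weighted sum as the pick's value is likewise my reading of the method, and the function names are hypothetical.

```python
from typing import List

# Season weights (percent) for a pick with 8 full seasons, from the table above.
WEIGHTS_8 = [17.3, 17.0, 16.3, 15.3, 13.6, 11.2, 7.5, 1.8]


def season_weights(n_seasons: int) -> List[float]:
    """Weights for a pick with n seasons so far.

    Assumption: keep the first n weights of the 8-season pattern and
    rescale them so they sum to 1 (the first year stays the most valuable).
    """
    if not 1 <= n_seasons <= len(WEIGHTS_8):
        raise ValueError("expected between 1 and 8 seasons")
    head = WEIGHTS_8[:n_seasons]
    total = sum(head)
    return [w / total for w in head]


def weighted_pick_value(per_mp_by_season: List[float]) -> float:
    """Weighted sum of a pick's PER*MP, earliest season first."""
    weights = season_weights(len(per_mp_by_season))
    return sum(w * v for w, v in zip(weights, per_mp_by_season))


print(round(season_weights(2)[0], 3))  # 0.504
```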

The 2002-2004 drafts have already had eight seasons to judge by, so their Modified Pick Values (MPVs) won't change going forward. For the more recent drafts, we don't yet have 8 years of performance to judge the picks on, so both their expected and actual MPVs will change. This also means that the expected MPV is different for every draft after 2004, as you can see in the charts below.




Since ranking things is probably the most fun thing we can do with statistics, let's see what these numbers tell us. A more appropriate ranking system would probably be the raw value over the expected value, but with only a ten-year window it would heavily favor the 2002-2004 drafts. So instead, we will rank the picks by percent of expected value to find the best and worst value picks.
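Ranking by percent of expected value is just a sort on actual MPV divided by expected MPV. A small sketch with a hypothetical `Pick` record; the three players and their numbers are invented for illustration, not taken from the real data:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pick:
    year: int
    player: str
    pick: int
    actual_mpv: float
    expected_mpv: float

    @property
    def pct_of_expected(self) -> float:
        """Actual value as a percentage of what the slot is expected to produce."""
        return 100.0 * self.actual_mpv / self.expected_mpv


def rank_by_value(picks: List[Pick], top_n: int = 25) -> List[Pick]:
    """Best value picks first, relative to their draft slot."""
    return sorted(picks, key=lambda p: p.pct_of_expected, reverse=True)[:top_n]


# Hypothetical picks, for illustration only.
picks = [
    Pick(2005, "Player A", 2, actual_mpv=60000, expected_mpv=50000),
    Pick(2005, "Player B", 45, actual_mpv=20000, expected_mpv=5000),
    Pick(2005, "Player C", 10, actual_mpv=10000, expected_mpv=25000),
]
for p in rank_by_value(picks):
    print(f"{p.year} {p.player} {p.pick} {p.pct_of_expected:.2f}%")
# 2005 Player B 45 400.00%
# 2005 Player A 2 120.00%
# 2005 Player C 10 40.00%
```

Sorting on the ratio rather than the raw difference is what pushes late picks to the top: a small expected value makes even modest production look huge.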

This is a very unfair ranking system; it's just for fun. No one is suggesting that these are the best picks of the last decade, only the highest-value picks at their spots. And since we are ranking players who have only played a season or two, a lot of the results will be pretty ridiculous. So, with those qualifications, on to the rankings!

Here are the 25 top value picks since 2002 by percent of expected value (draft year, player, pick, percent of expected value):

  1. 2011 Isaiah Thomas 60 3844.80% 
  2. 2007 Ramon Sessions 56 986.01% 
  3. 2006 Paul Millsap 47 811.82% 
  4. 2007 Marc Gasol 48 798.47% 
  5. 2003 Mo Williams 47 664.00% 
  6. 2005 Ryan Gomes 50 651.34% 
  7. 2003 Kyle Korver 51 638.68% 
  8. 2009 Marcus Thornton 43 607.14% 
  9. 2005 Monta Ellis 40 572.20% 
  10. 2002 Carlos Boozer 34 532.45% 
  11. 2010 Landry Fields 39 525.20% 
  12. 2011 Chandler Parsons 38 520.83% 
  13. 2005 Andray Blatche 49 477.58% 
  14. 2005 Amir Johnson 56 470.19% 
  15. 2005 David Lee 30 457.89% 
  16. 2005 Louis Williams 45 454.08% 
  17. 2002 Rasual Butler 52 446.74% 
  18. 2009 Chase Budinger 44 446.35% 
  19. 2005 Marcin Gortat 57 440.77% 
  20. 2009 DeJuan Blair 37 401.45% 
  21. 2008 Goran Dragic 45 379.82% 
  22. 2004 Trevor Ariza 43 86166 370.50% 
  23. 2003 Josh Howard 29 148134 370.09% 
  24. 2003 Zaza Pachulia 42 86890 361.49% 
  25. 2011 Lavoy Allen 50 5598 345.41% 


All of these top value picks come late in the draft because of the lower expected values there. LeBron would have had to be drafted 6th to take the 25th spot overall! Here are the 25 best value picks among top-10 picks:


  1. 2004 Andre Iguodala 9 237.22% 
  2. 2003 Dwyane Wade 5 229.23% 
  3. 2002 Amare Stoudemire 9 221.43% 
  4. 2005 Chris Paul 4 221.12% 
  5. 2008 Brook Lopez 10 203.72% 
  6. 2002 Caron Butler 10 201.58% 
  7. 2010 Greg Monroe 7 199.31% 
  8. 2003 LeBron James 1 189.96% 
  9. 2009 Brandon Jennings 10 189.94% 
  10. 2003 Chris Bosh 4 188.13% 
  11. 2006 Rudy Gay 8 187.23% 
  12. 2007 Kevin Durant 2 174.80% 
  13. 2008 Russell Westbrook 4 174.44% 
  14. 2008 Kevin Love 5 170.17% 
  15. 2003 Carmelo Anthony 3 164.88% 
  16. 2006 Brandon Roy 6 159.71% 
  17. 2004 Luol Deng 7 159.57% 
  18. 2011 Kemba Walker 9 159.47% 
  19. 2009 Stephen Curry 7 156.05% 
  20. 2003 Kirk Hinrich 7 154.93% 
  21. 2010 DeMarcus Cousins 5 153.99% 
  22. 2005 Deron Williams 3 151.39% 
  23. 2007 Joakim Noah 9 144.75% 
  24. 2004 Dwight Howard 1 143.88% 
  25. 2009 DeMar DeRozan 9 140.63% 


This ranking shows how important staying healthy is for a pick to be considered valuable. Blake Griffin, for example, would be over 150% of his expected value if he could be judged on only two seasons, but he has to be judged on three, since he sat out the season after being drafted by the Clippers. And finally, the worst top-10 picks since 2002:


  1. 2011 Jonas Valanciunas 5 0.00% 
  2. 2006 Mouhamed Sene 10 3.21% 
  3. 2006 Patrick O'Bryant 9 4.89% 
  4. 2004 Luke Jackson 10 5.36% 
  5. 2002 Nikoloz Tskitishvili 5 6.24% 
  6. 2004 Rafael Araujo 8 7.51% 
  7. 2008 Joe Alexander 8 8.39% 
  8. 2002 Jay Williams 2 11.28% 
  9. 2009 Hasheem Thabeet 2 13.65% 
  10. 2006 Adam Morrison 3 13.85% 
  11. 2002 Dajuan Wagner 6 14.65% 
  12. 2007 Greg Oden 1 16.84% 
  13. 2009 Ricky Rubio 5 24.10% 
  14. 2003 Darko Milicic 2 31.04% 
  15. 2005 Ike Diogu 9 33.04% 
  16. 2007 Brandan Wright 8 36.07% 
  17. 2004 Shaun Livingston 4 36.33% 
  18. 2006 Shelden Williams 5 41.80% 
  19. 2003 Mike Sweetney 9 42.35% 
  20. 2009 Jordan Hill 8 48.57% 
  21. 2011 Enes Kanter 3 48.74% 
  22. 2007 Yi Jianlian 6 54.47% 
  23. 2009 Jonny Flynn 6 55.80% 
  24. 2010 Ekpe Udoh 6 56.76% 
  25. 2005 Martell Webster 6 59.74% 

Of course, #1 in this ranking, Jonas Valanciunas, has yet to play in the NBA and will actually be playing next year, so it isn't very fair to call him the worst pick of the decade, although top-10 picks are generally drafted to play right away. With a nice career, he should still easily exceed his expected value.

A huge thanks to the fine folks at basketball-reference.com for their wonderful stats database; I got all of the raw data from their site. Also, Grantland.com's Bill Barnwell probably inspired this series (and maybe this website) with his NFL draft analysis. And finally, 82games.com did something similar a few years ago, but did it the easy way, which didn't do the job justice.



This was Part I of the StatDance.com NBA draft analysis.
Part III: Team-by-team NBA draft performance - Coming Soon
Part IV: We evaluate every NBA GM since 2002 - Coming Soon
Part V: Who did they miss? Looking at the undrafted free agents in the NBA - Coming Soon





Wednesday, July 25, 2012

How Close is Tiger Woods to Actually Passing Jack Nicklaus - We Rank Golf's Greats

Tiger Woods, with his 14 majors, has been chasing the iconic Jack Nicklaus for quite some time. Only four behind Jack's 18 major wins, the record seems so close, if only Tiger can get back to his former level. It is what Tiger strives for: the only thing in competitive golf he has yet to conquer.

Did you realize that Jack Nicklaus took second place at majors 19 times? So Jack won 18, placed second 19 times, and placed third an amazing 9 times. Tiger Woods has only 6 second-place finishes in majors, and 4 third-place finishes.

I tried to develop a fair way of measuring a golfer's career. This led me to do a lot of research on golf, a sport I have not spent much time following. The major championships started in 1860, with The Open Championship in Scotland, and much has changed over the course of professional golf's history. Today, one could use career earnings, but I sincerely doubt that purses have simply kept pace with inflation. In older times, the majority of the "tour" was exhibition matches. I also found a very real lack of historical data. This might be due to my lack of familiarity with the sport, but I just don't think there is much interest in golf statistics. Don't worry, that didn't stop me from staying up until 3am several nights in a row, playing with golf statistics!

I decided that measuring success in golf's major championships would be the truest measurement of career success. The majors have been the only constant in golf's history (except, of course, for the years the tournaments were cancelled in wartime). I declined to rank the great women golfers: it would be purely a judgement call, and I couldn't find the function in Excel.

I decided on a 1/x function for the Major Points statistic. I toyed with more complicated formulas, but found them wholly unsatisfying. I agree that this might seem heavily weighted towards rewarding the Golden Bear, with his record of top-3 finishes, but it's more than fair. Consider the purses: the 2012 US Open awarded second place 60% of the winner's take, and third place 37%. My simple 1/x formula gives 50% and 33%, respectively. So I don't think this favors Jack's career over Tiger's too heavily. (I could have, for example, awarded points based on the percentage of the total prize money available at the majors every year, although that system would have been too daunting a task for me to justify.)
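The purse comparison is just the ratio of each place's points to the winner's points. A quick check in Python using the 2012 US Open shares quoted above (`major_points` is a hypothetical helper name):

```python
from fractions import Fraction


def major_points(finish: int) -> Fraction:
    """Major Points for one finish: 1/finish (1st = 1, 2nd = 1/2, ...)."""
    return Fraction(1, finish)


# Payout as a share of the winner's take, 2012 US Open (from the text)
purse_share = {2: 0.60, 3: 0.37}

for place, share in purse_share.items():
    points_share = major_points(place) / major_points(1)
    print(f"place {place}: 1/x gives {float(points_share):.0%}, purse gives {share:.0%}")
# place 2: 1/x gives 50%, purse gives 60%
# place 3: 1/x gives 33%, purse gives 37%
```

In both places, 1/x gives a non-winning finish less credit than the purse does, which is the point of the argument.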

First, I painstakingly extracted the data from Wikipedia, where major tournament results timelines are tabulated for all of golf's greats. I set the cutoff for this ranking at four major wins, for several reasons: first, I started this task trying to compare Tiger Woods to Jack Nicklaus; second, I wanted to deal with a manageable amount of data; and third, I only wanted to compare great with great, and four majors is a pretty exclusive bunch. (Note: Billy Casper, who won only 3 majors, was ranked near the top 10 of male golfers by Golf.com, which was the original group I was using.)

Then, I ranked the golfers from the best (Jack Nicklaus) to the "worst", but make no mistake: all of these golfers were among the best of their eras. I highly encourage you all to read the Wikipedia articles on these players; some (most!) of their lives were fascinating. It seems golfers today keep their lives so private that we don't get the fascinating stories the older golfers left for us. (I didn't write this as a jab at Tiger, but I'm totally leaving it here as a jab at Tiger.) The early eras were dominated by locals, since it was a new sport, but golf.com still ranked them in its top 20 (which included several women). There were fewer players and fewer majors, so fewer points to go around. I think this gives a natural "curve" to the rankings, allowing us to respect the history of the game (no one thinks Old Tom Morris would be competitive in today's game) while still giving today's amazing athletes their proper dues.

A summary of "Major Points", earned for finishes at major championships as 1/finish:
1st: 1/1 point = 1.00
2nd: 1/2 point = 0.50
3rd: 1/3 point = 0.333
20th: 1/20 point = 0.05
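Summing 1/finish over a career takes one line. The sketch below applies it only to the first-, second-, and third-place counts quoted at the top of this post. The full Major Points totals also count every other finish, so the roughly 60% ratio it prints is not the 58% figure from the full data. The function name is hypothetical.

```python
from typing import Dict


def major_points_total(finish_counts: Dict[int, int]) -> float:
    """Sum of 1/finish, where finish_counts maps a finishing place to how often it happened."""
    return sum(count / place for place, count in finish_counts.items())


# Top-3 finishes only (wins, seconds, thirds), from the counts in the text
jack = major_points_total({1: 18, 2: 19, 3: 9})   # 18 + 9.5 + 3
tiger = major_points_total({1: 14, 2: 6, 3: 4})   # 14 + 3 + 4/3

print(round(jack, 2), round(tiger, 2), f"{tiger / jack:.0%}")  # 30.5 18.33 60%
```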






Besides the player rankings (the nerdy stuff is at the bottom of the page), the first interesting result I found was how many Major Points were won in each year of competition.

The chart below is a composite of every golfer in our rankings: the 28 people who have won 4 majors or were ranked as a top-20 golfer by golf.com. The horizontal axis is the year of their career in majors. The columns represent the sum of the Major Points that all of the players earned in that year of their careers. Don't worry about matching each individual with their performance in this chart; a full collection of individual performances is below.
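The bars in that chart amount to a group-by: convert each calendar year to a career year for that golfer, then sum the Major Points. A minimal sketch; the tuple layout, the "career year 1 = first year in a major" convention, and the sample results are my assumptions for illustration:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def points_by_career_year(results: List[Tuple[str, int, int]]) -> Dict[int, float]:
    """Sum Major Points (1/finish) by career year across all golfers.

    results: (golfer, calendar_year, finish) tuples.
    Assumption: career year 1 is the first calendar year a golfer played a major.
    """
    first_year: Dict[str, int] = {}
    for golfer, year, _ in results:
        first_year[golfer] = min(year, first_year.get(golfer, year))

    totals: Dict[int, float] = defaultdict(float)
    for golfer, year, finish in results:
        totals[year - first_year[golfer] + 1] += 1.0 / finish
    return dict(sorted(totals.items()))


# Hypothetical results, for illustration only
sample = [
    ("Golfer A", 1960, 5), ("Golfer A", 1961, 1),
    ("Golfer B", 1990, 2), ("Golfer B", 1991, 4),
]
print(points_by_career_year(sample))
```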





Then, I overlaid this "average career" on each of our 28 golf greats' actual careers to see how they all compared.





The scale on these plots was chosen by Excel, matching the peaks of the bar chart with the peak of the average-career plot. It's amazing how closely the individual careers can follow the average. I did not include Tiger Woods in the average career, since he is still so near his prime. Phil Mickelson and Ernie Els are both 4-5 years further along than Tiger, and were included.

So, does Tiger have a chance to catch Jack? Simply put, no. He is already second, ahead of some impressive careers, and he will put significant distance between himself and the pack close behind him. I do not doubt his ability to match Jack's significant mark of 18 majors, but while Tiger has 78% of Jack's major titles, he has only 58% of the Major Points that Jack accrued.

Tiger is my favorite golfer: growing up casually following professional golf, Tiger was golf. I liked him (who didn't?), but I never idolized him, so his fall was easy for me to get over. I'll be cheering for him, but I know I won't see the day when he is number one.

Here are some timelines of golf, broken down into readable sections by era. It's interesting to see how much competition Jack had, but who knows who will go on to win more majors in Tiger's era.






(1) This curve, as you can see, fits a competing-exponentials model very nicely: the increase in talent with age and experience, and the subsequent decrease from getting older. I just used a fourth-order polynomial to approximate it, since I only need a good curve, not a scientifically rigorous result.
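Outside Excel, a fourth-order polynomial fit is one call to NumPy's `numpy.polyfit`. The sketch below uses a made-up rise-and-fall curve (an experience term times an aging term) in place of the real Major Points data:

```python
import numpy as np


def fit_average_career(career_years, points, degree=4):
    """Least-squares polynomial fit of points vs. career year; returns a callable poly1d."""
    coeffs = np.polyfit(career_years, points, degree)
    return np.poly1d(coeffs)


# Hypothetical data: a rise (experience) times a decay (age), for illustration only
years = np.arange(1, 26)
points = (1 - np.exp(-years / 3.0)) * np.exp(-years / 10.0)

curve = fit_average_career(years, points)
peak_year = years[np.argmax(curve(years))]
print(f"fitted curve peaks in career year {peak_year}")
```

A polynomial is only a convenient smoother here; it will misbehave if evaluated far outside the range of career years it was fit on.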

Monday, July 23, 2012

Why Punish Penn State?

The horrors of the Penn State scandal cannot be overstated, but the punishment handed down a couple of hours ago by the NCAA needs to be put in perspective as well. The NCAA is an athletic association of universities; its business is the regulation of athletic competition between student-athletes.

Of course, this is not about punishing Penn State for the vile acts committed by one person affiliated with the program; this is all about the cover-up. It always is. The power given to the football program at PSU was abused in a vain attempt to put the interests of the football program over the best interests of humanity. I sincerely hope that the justice system finds a way to properly punish those who are actually at fault for the acts that went on under their watch.

I'd like to applaud the NCAA for acting promptly; it has seemed that nothing the NCAA does is done this promptly. This will serve as a benchmark for the timeliness of all future NCAA actions. Now, let's examine what the NCAA actually accomplished today.

From The Big Lead:

-Penn State has been fined $60 million
-4 year bowl ban
-Vacated wins from 1998-2011
-20 total/10 annual scholarship reduction for 4 yrs
-Any entering or returning players can transfer without penalty

The fine, even if it is not paid by an insurance company (as I read on Twitter before the announcement, but have not heard since, so it may be untrue), is not very steep. As Forbes is reporting, the football program (at its old pace) was printing cash, and any monetary penalty would have to be much larger to significantly impact its viability.

The only people that care about the vacated wins are the Penn State fans (maybe) and Paterno's die-hard supporters. We all know who won those games. If this had been a competitive violation, the argument could be made that the wins should be vacated. No one (that I have seen) has suggested that Joe Pa had any significant NCAA violations in that regard. Considering the depth of the Freeh report, maybe this is a real testament to his football integrity.

The competitive penalties are the real ones: significant and devastating. No bowl games, a huge loss in scholarships, and an express lane for current players to leave. This mauling of a program, which is more important than any one person, has left me with a bad taste in my mouth. What is the point of it? Why do this?

As I see it, there are three reasons to levy a penalty (from a philosophical perspective):

-Punishment - to remove the advantage gained from having committed the wrongdoing.
-Safety - to ensure that the wrongdoers are not able to continue their acts
-Discouragement - to convince others not to act wrongly in the future.

In what ways do the competitive penalties given by the NCAA accomplish these goals? The only advantage gained by Penn State was that its program's public image remained untainted. I think it's safe to say this advantage has been eliminated, organically. The financial penalties are probably fair, if not overly lenient, with the money going to help victims of child sex abuse.

The individuals who were part of the cover-up need to be brought to justice by the court system. The NCAA, of course, has no part in this "safety" aspect. Obviously, the most important individual is already behind bars. Others are going to court for their parts in the cover-up (and for lying about it).

Did the NCAA need to destroy the football program to discourage others from doing this? I would hope not. Since we've eliminated the other two aspects, this must be the NCAA's intent. But who is punished by this decision? Mostly, the fans and players. Hundreds of thousands of Penn State fans no longer have a competitive team to cheer for. There is no need to pity them, it's just sports, but this isn't how sports should work. The acts of a single individual, and the enabling acts of a handful of others, have led to an entire program's practical demise.

So, what would be an appropriate discouragement? There was no football advantage - so give the program no football penalties. Education, accountability, oversight, and money. Ensuring the University has to pay for its transgressions. That's how this should have been done.

Thursday, July 19, 2012

NBA Playoff Winning Percentages by Game



In the 1983-1984 season, the NBA switched to a playoff system much like the one currently in place. A total of 16 teams make the tournament and there are no byes, so every season has 8 + 4 + 2 + 1 = 15 series. From 1984 through 2002, the first round was a best-of-five series; since 2003, every playoff series has been a best-of-seven contest.

While the number of teams and number of series have remained constant, the seeding has gone through several permutations, giving different seeding advantages to division winners. However, home-court advantage always goes to the team with the better record.

For example, this year the Boston Celtics finished first in the Atlantic Division with a 39-27 record, while the Atlanta Hawks finished second in the Southeast Division with a slightly better record, 40-26. As a division champion, the Celtics were guaranteed a top-four seed despite having the fifth-best record in the East. So the Celtics were the four seed, but home-court advantage went to the five-seeded Hawks. (Note: this had no effect on the matchup; however, had the Celtics won the division with the eighth-best record, they would still have faced the Hawks instead of the #1 seed, the Bulls.) In my analysis, I used the NBA's home-court advantage to determine the "higher seed."

Every game has its own flavor, from Game 1, with its anticipation of match-ups and rivalries, to tense Game 7s with seasons on the line. I went through the last 27 years of playoff series and found the winning percentages of the home team and the higher-seeded team.

Now, on to the games!

Game 1


Game 1 is always a home game for the higher seed. The home team has won 76.05% of these games. This nearly matches the overall home winning percentage of the higher seed (75.95%).

Game 2

Like Game 1, Game 2 is always a home game for the higher seed. Two possible game 2's exist - the 0-1 game (with the lower seed having won the first game) and the 1-0 game (with the favorite winning the first game).

If the underdog had won the first game, the second game is won by the favorite 79.38% of the time. If the favorite won the first game, the favorite goes up 2-0 73.7% of the time.

This means that the favorite wins game 2 more often when it has lost game 1. The combination of the underdog's complacency after having already stolen home-court advantage and the favorite's desperation at the prospect of falling into a two-game hole before going into enemy territory leads to a significant increase in home winning percentage.

Overall, after game 2, 56% of series have seen the favorite up 2-0, 39% are tied 1-1, and 5% have the underdog cleaning up in the favorite's house, up two games to none.
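The 56/39/5 split follows from the game 1 and game 2 percentages above by chaining the conditional win rates. A quick check, treating the quoted percentages as probabilities (the function name is hypothetical):

```python
def after_game_two(p_g1: float, p_g2_after_win: float, p_g2_after_loss: float) -> dict:
    """Series states after two games, from the favorite's point of view.

    p_g1:            favorite wins game 1
    p_g2_after_win:  favorite wins game 2, given it won game 1
    p_g2_after_loss: favorite wins game 2, given it lost game 1
    """
    return {
        "2-0": p_g1 * p_g2_after_win,
        "1-1": p_g1 * (1 - p_g2_after_win) + (1 - p_g1) * p_g2_after_loss,
        "0-2": (1 - p_g1) * (1 - p_g2_after_loss),
    }


states = after_game_two(0.7605, 0.737, 0.7938)
print({k: f"{v:.0%}" for k, v in states.items()})
# {'2-0': '56%', '1-1': '39%', '0-2': '5%'}
```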

Game 3


For game 3, the home team is always the underdog. In 20 of 405 series (5%), the underdog is already up two games to none. In 12 of these 20 games, the underdog takes a three-game lead (winning 60%). This isn't far from the overall winning percentage of the home team, which wins 56.3% of game 3s. However, with such a small sample size, this isn't very useful information.
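One way to put a number on "small sample size" (not something the analysis above does) is a 95% Wilson score interval for the 12-of-20 record. A sketch using only the standard library; the function name is hypothetical:

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a win rate."""
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return center - half, center + half


lo, hi = wilson_interval(12, 20)
print(f"{lo:.2f} to {hi:.2f}")  # 0.39 to 0.78
```

The interval comfortably contains both a coin flip and the 56.3% overall game 3 home rate, so these 20 series can't tell us much about a 2-0 underdog at home.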

When the underdog is down 2-0, they win 58.15% of the time in game 3. If the series is tied, the home team wins 53.16% of the games. One might think that the home team would win more often after having won once on the road, but the opposite is true. The condition of the series (the higher seed not wanting to fall behind in the series) is more indicative of the result of game three than the idea that the teams might be more closely matched.

This could be a general trend, but it is more likely an overlap of two different scenarios. In the first, the higher seed is significantly superior to the lower seed and, facing the prospect of falling behind in the series, really turns it on and dominates game 3. In the second, the teams are actually closely matched, and the home team wins most of the games.

In best-of-five series, the favorites completed the sweep in game 3 in 50% of their chances (43 of 86). That 50% is the lowest home-team winning percentage of any game situation with at least 50 games played. This is probably a testament to the extremely high share of teams allowed into the playoffs when the league first switched to a 16-team format: in 1984, there were only 23 teams in the league, so 70% of the league made the tournament.

After game 3, of which 405 have been played: 

  • 95 times (23.46%) the favorite winning 3-0 (43 times ending the series) 
  • 206 times (50.86%) the favorite is up 2-1 
  • 92 times (22.72%) the underdog is winning 1-2 
  • 12 times (2.96%) the underdog is up 0-3 (a 3-game sweep 5 times) 
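The percentages above, and the share of game 4s that start at 2-1 quoted in the game 4 section, can be checked from the raw counts. A short sketch (the variable names are just for illustration):

```python
# Series states after game 3, favorite's wins first (counts from the list above)
after_game_three = {"3-0": 95, "2-1": 206, "1-2": 92, "0-3": 12}
# Best-of-five series that ended with a game 3 sweep
best_of_five_sweeps = {"3-0": 43, "0-3": 5}

total = sum(after_game_three.values())
for state, n in after_game_three.items():
    print(f"{state}: {n} ({n / total:.2%})")

# Every series not ended by a best-of-five sweep goes on to a game 4
game_fours = total - sum(best_of_five_sweeps.values())
one_game_lead = after_game_three["2-1"] + after_game_three["1-2"]
print(game_fours, f"{one_game_lead / game_fours:.2%}")  # 357 83.47%
```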

Game 4


In game 4, the home team is again always the underdog, just like in game 3. Remarkably, the higher seed has won this game significantly more often than game 3. With a nearly even 49.58% winning percentage over the past 27 seasons, the higher seed in game 4 has come close to overcoming the lower seed's home-court advantage.


In the 7 games played with the underdog threatening a sweep, the favorite has bounced back and taken a game only once (Western Conference Finals, 2005: the Suns stole game 4 but lost in 5 to the eventual NBA champs, the Spurs). The other 6 times, the underdog got the brooms out.


For the 83.47% of game 4s that start with the series at 2-1 (either the underdog or the favorite with a one-game lead), the results are very similar, right around a 50% winning percentage. These are cases where neither team has its back against the wall. The previous performance in the series is indicative of the result of this game, although only to a small degree: if the lower seed is up two games to one, it goes on to take a 1-3 lead 53.26% of the time, while if the higher seed has the 2-1 lead, the lower seed wins only 50.97% of the time. A small, but interesting, difference.


The favorite has threatened a sweep (up three games to none) in 52 of the 253 best-of-seven series played in the last 27 years. These are obviously the series where the favorite is significantly superior to the home team, having won both of its home games and its only road game, and game 4 here shows by far the highest winning percentage of any visiting team: the favorite won 32 of the 52 tries (61.54%). In fact, the next highest away-team winning percentage in a seven-game series is game 3 in a tied series, when the favorite wins to take the series lead 49.49% of the time.


Unfortunately, there is no way to compare the winning percentage of the best-of-five series sweeps to best-of-seven sweeps since the close-out game 3 is the first game played at the underdog’s home-court.

Game 5


While game 5 is usually a home game for the favorite, in the finals game 5 is the third home game in a row for the underdogs.


When the underdog has a three-games-to-one lead going into game 5, the higher seed wins 72.73% of the games to bring the series to 2-3. This is a high winning percentage, but still lower than the favorite's overall game 5 winning percentage of 74.53%. The slightly lower figure could be due to some game 5s being away games for the favorite, or to the fact that the lower seed has to be a worthy opponent to have taken a three-games-to-one lead.


When the favorite is ready to clinch in game 5 with a 3-1 lead, they are almost always playing at home and have a remarkable success rate of 76.74%. This winning percentage is likely dominated by the higher seeds winning against an outmatched opponent that got a win at home in game 3 or 4.


With the series tied at two games apiece heading into game 5, the home team wins 74.32% of the games, to take a series lead. The majority of game 5’s that have been played over the last 27 seasons (55.43%) are of this type, with the series lead in the balance.

Game 6

Game 6 is usually played on the underdog's home court (the exception being the finals, where 6 underdogs have won game 6 on the road since 1984). Only two records are possible going into game 6: 3-2 in favor of the top seed, or 2-3 in favor of the lower seed. Over the past 27 seasons, 66.43% of game 6s have been 3-2 in favor of the top seed.


When the top seed has a chance to win the series in game 6, it is on the road with two chances to clinch, while the underdog has its season on the line at home. In 46.24% of these games, the underdog pulls it out and takes the series to a game 7. Given the gravity of the situation, and that the underdog has already won two games, one might expect this to favor the lower seed more, but in fact it is below the lower seed's average home winning percentage (54%).


When the lower seed has won three games going into game 6, it is relatively dominant, winning 72.34% of its chances to win the series on its home court. This is easily the highest winning percentage of any game for the underdog (except for the 6 times the lower seed completed a sweep in the 7 times it had the chance in game 4 at home).


This large gap in game 6 winning percentages, in series that have already gone six games, is surprising to me. Only one game separates the two teams, and yet there is a 26-percentage-point difference in winning percentage.

Game 7


In the 27 seasons I analyzed since the playoffs switched to a 16-team format, 56 playoff series have gone to a game 7. The top seed has been dominant, winning 82.14% (46 of 56). Considering that the lower seed has already won 3 games against this team, it's a very significant edge for the higher seed.


Playoff basketball is all about attitude and talent. The higher seeds usually have the talent, and when their backs are against the wall, the talent perseveres.


The difference in winning percentages depending on the series record is astounding. Whether the difference is a testament to players' will to win when the pressure is on, or an embarrassment that they don't try hard enough in early games, I'll leave up to you to decide.








Google "Jason Kidd" Lately?

I'd love to hear the story of how this photo became the default Google "Jason Kidd" photo. Classy.


(Note: as of 19 July, a more appropriate picture has replaced the one shown.)



The Numbers Behind the Jeremy Lin Contract

There's a story out there that no one is talking about in this NBA offseason: a little-known player named Jeremy Lin (a Taiwanese-American who graduated from Harvard) has been offered a contract by the Houston Rockets. Since Lin is a restricted free agent, his team last year (the New York Knicks) can match the offer and keep Lin.
Either to make it impossible for the Knicks to keep Lin, or to force them to go way over the luxury tax threshold, the Rockets offered Lin a somewhat ridiculous 3-year deal of approximately $5/5/15 million: a total of about $25 million, but with a huge number in the third year. Because of the tiered luxury tax system, it is much more expensive for the Knicks to sign him than for a team with more flexibility, like the Rockets.
In reality, the news coverage of this story during this quiet lull between interesting sports (sorry, baseball) has been saturating, so I'm sure we have all heard this before, and heard that it could cost the Knicks anywhere from $30 million to $75 million in the third year. Obviously, it's not Lin who costs that much; it's the sum of the contracts.
So I decided to take a look at what every NBA team had in guaranteed salaries for that season.
Guaranteed NBA Salaries 2014-2015
In the season in question, 2014-2015, the Knicks would have $75 million guaranteed to Carmelo Anthony, Amare Stoudemire, Tyson Chandler, and Lin. This is tops in the league, but only by $5-6 million over the Nyets and the Heat. Considering the salary-cap options available to them (the stretch clause, or trading someone, not necessarily Lin), I find it hard to believe they can't afford Lin.
After all, Dolan is not hurting for cash, and Lin could easily turn into a great revenue stream and pay for himself anyway.

Tuesday, July 17, 2012

NBA Draft Analysis

I've wanted to look at the NBA draft for a while now - I had lots of questions. I tried to answer a few of them by looking at the last ten NBA drafts (2002-2011) and looking at how their careers turned out relative to their draft positions.

The first thing I had to figure out was how to compare careers. Simple box-score metrics obviously don't work - looking at points per game would be a very poor single indicator of career success in the NBA. I did some cursory investigating into advanced basketball statistics (APBRmetrics) and found a lot of ideas are out there.

I get the majority of my data from the wonderful Basketball-Reference site, which lists Player Efficiency Rating (PER) and Win Shares. Other sites, like the NBA Geek, list Wins Produced. Another metric is adjusted plus/minus, and while I'm sure it has its merits, I'm not really interested in assessing NBA drafts in a world where Eric Bledsoe was the best per-minute player in the NBA last year. It's not just that I refuse to accept something that goes against my preconceived notions of what happened last year; it's that public perception and box-score statistics are how players and drafts are evaluated.

I liked Win Shares and Wins Produced, but they both attempt to gauge a player's defensive contributions by looking at how the team did. I don't want to give a player credit for playing with good defenders; it doesn't seem fair to me.

The most important metric is minutes played: if you're good enough to get on the floor and stay on the floor, you're a contributing member of the team. No other stat can replace that. I decided the best statistic for measuring the quality of a player's contribution is the Player Efficiency Rating. Despite its flaws, it gives a great picture of a player's ability to contribute. Most importantly for this exercise, it is relatively consistent with perception: if a GM drafts someone with a high PER for his draft position, chances are that pick will be viewed as a "success" when the GM is evaluated.

What I calculated is the PER*Minutes Played - PERMP - for each player drafted since 2002.



These results were very interesting to me, and once I averaged each draft pick's performance per year, they gave a relatively nice curve.
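The averaging step can be sketched as a group-by on (pick, season since draft). This is a minimal sketch under stated assumptions: the tuple layout is hypothetical, counting a drafted player who never played as a PER*MP of 0 is my assumption about how busts enter the average, and the sample rows are invented:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def average_permp(seasons: List[Tuple[int, int, float, float]]) -> Dict[Tuple[int, int], float]:
    """Average PER*MP for each (pick, season-since-draft) pair.

    seasons: (pick, season_since_draft, per, minutes_played) tuples.
    Assumption: a drafted player who didn't play is entered with 0 minutes,
    so busts pull the average down instead of dropping out.
    """
    buckets: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for pick, season, per, minutes in seasons:
        buckets[(pick, season)].append(per * minutes)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical rows, for illustration only
rows = [(8, 4, 16.1, 1500), (8, 4, 10.0, 300), (8, 4, 0.0, 0), (9, 1, 12.0, 1000)]
print(average_permp(rows))
```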




The formula for each pick's value as a percentage of the first pick's value is 318307*e^(-0.06167*Pick)/299270. This gives:

Pick Value (% of 1st pick)
1 100%
2 94.0%
3 88.4%
4 83.1%
5 78.1%
10 57.4%
20 31.0%
30 16.7%
45 6.63%
60 2.63%
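The table above can be regenerated from the fitted formula. Note that 318307 * e^(-0.06167) is about 299270, so the formula is essentially 100% * e^(-0.06167 * (Pick - 1)). A sketch (the function name is hypothetical):

```python
import math


def pick_value_pct(pick: int) -> float:
    """Value of a draft slot as a percentage of the first pick (fitted formula above)."""
    return 100 * 318307 * math.exp(-0.06167 * pick) / 299270


for pick in (1, 2, 3, 4, 5, 10, 20, 30, 45, 60):
    print(pick, f"{pick_value_pct(pick):.2f}%")
```

Each pick is worth about 6% less than the one before it (e^(-0.06167) ≈ 0.94), which is why the curve flattens out so slowly through the second round.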

Of course, the numbers at the top are silly. In the 2012 draft, the top pick was probably worth double any other pick: Anthony Davis is expected to be a superstar, while everyone else is a long shot for superstar status (see 2004 Dwight Howard, or maybe 2003 LeBron, where one pick was significantly better even though the class had a lot of superstars).

But, in the 2007 draft, the top pick was only marginally better than the second pick - you were still getting Oden or Durant (which, at the time, was a toss-up). But once you leave the top 5, the pick values are a lot more useful and consistent from draft to draft.

Again, many thanks to basketball-reference.com for all their data. If you have time, check out an analysis posted at 82games.com: the author analyzed the drafts from 1980-2003 with a significantly different process and got very similar results.

Note: Originally posted on my tumblr blog.