Many people have attempted to answer the question of whether or not home ice advantage exists. A 2013 paper by Tim Swartz and Adriano Arce came to the conclusion that home ice advantage in the NHL is real, accounting for about 5% of goals in 2012. Furthermore, they observed that when total goals per game are accounted for, there is no appreciable change in the home ice advantage over time. Lastly, they performed a one-way ANOVA and a pairwise Tukey’s HSD test on 16 NHL teams that have played in 30 seasons to determine if there is a significant difference between teams’ home ice advantages. The result of their analysis was that there is not sufficient evidence to conclude that home ice advantage varies between teams.
Later, in 2017, a FiveThirtyEight blog post also tried to tackle this problem, coming to the conclusion that home ice in the regular season amounts to a win percentage of 55.1%, which is 5.1 percentage points better than even odds. In the playoffs this advantage declines slightly, to a win percentage that is only 4.8 percentage points better on home ice than at a theoretical neutral rink.
The above two results appear fairly consistent with each other, and one might be satisfied that home ice advantage is a real phenomenon. This is probably a safe bet, but I’m all for reproducible research so I’m going to try my hand at answering this question myself and corroborating the previous findings. My goal is also to learn a bit about the nature of home ice advantage so I can use it later in a model for predicting NHL game outcomes.
There are three questions that I want to answer:
Is there evidence for home ice being advantageous in general?
Is there evidence that this advantage varies from team to team?
Is there evidence that home ice advantage still exists if a game goes to overtime or shootout?
To do all of this I am going to use a few different strategies. First I will attempt to demonstrate that the distributions of goal totals for home and away teams each follow a Poisson distribution. Then I will show that the two distributions are significantly different using a Wilcoxon Signed Rank Test. Next, I will try to determine if the extent of home ice advantage varies significantly from team to team using a Two Way ANOVA on a Poisson Generalized Linear Model. Finally, I will look at whether there is evidence for home ice advantage still playing a role in games that go to overtime or shootout using binomial tests and a simple proportion test.
Overall the analysis I am about to do will be fairly quick and dirty, because I’m really only interested in answering the above questions to guide my model building process for my next post. As such, I do not recommend that anyone take the results of this analysis as being authoritative.
Acquiring Official NHL Data
To start, we will need to get the data for NHL games that have already been played. I will do this using the undocumented NHL statistics API, which returns JSON files with a mostly self-evident structure. These files will be retrieved and parsed using Python and the very easy-to-use requests and json libraries.
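As a rough illustration of the parsing step, here is a minimal sketch. Because the API is undocumented, everything specific in it is an assumption: the `url` argument is left to the caller, and the nesting and key names ("dates", "games", "teams", "score") are hypothetical stand-ins for whatever the real payload contains. The sample payload is made up.

```python
import json


def fetch_json(url, params=None):
    """Download one JSON document. `url` is supplied by the caller (not a known endpoint)."""
    import requests  # third-party; imported lazily so the parser below works without it

    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def extract_games(payload):
    """Flatten a schedule-like payload into one dict per game.

    ASSUMPTION: the key names and nesting used here are hypothetical.
    """
    rows = []
    for day in payload.get("dates", []):
        for game in day.get("games", []):
            home = game["teams"]["home"]
            away = game["teams"]["away"]
            rows.append({
                "date": day.get("date"),
                "home_team": home["team"]["name"],
                "away_team": away["team"]["name"],
                "home_goals": home["score"],
                "away_goals": away["score"],
            })
    return rows


# Made-up payload in the assumed shape, so the parser can be tried offline.
SAMPLE = json.loads("""
{"dates": [{"date": "2000-01-01", "games": [
  {"teams": {"home": {"team": {"name": "Team A"}, "score": 3},
             "away": {"team": {"name": "Team B"}, "score": 2}}},
  {"teams": {"home": {"team": {"name": "Team C"}, "score": 1},
             "away": {"team": {"name": "Team A"}, "score": 4}}}
]}]}
""")

games = extract_games(SAMPLE)
for row in games:
    print(row)
```

Keeping the download and the flattening in separate functions means the parsing logic can be checked against a saved file before any live requests are made.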
Are the Distributions of Regulation Time Goals Poisson?
Let’s start by looking at the actual distribution of home and away goals.
Both appear to be roughly Poisson distributed at first glance. Next let’s look at how the observed home and away goal distributions compare to a theoretical Poisson distribution with the same mean.
Home:
Away:
We can see that the distribution of scores for both home and away teams is not actually Poisson, but it is pretty close. It appears to skew slightly lower than the expected values, which suggests the data is under-dispersed. This under-dispersion doesn’t appear to be very severe, however, so I am comfortable going forward assuming that the goal counts can be modeled as Poisson.
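The comparison described above can be sketched in a few lines: tabulate the observed goal counts, compute the counts a Poisson distribution with the same mean would predict, and look at the variance-to-mean ratio (a ratio below 1 points to under-dispersion, above 1 to over-dispersion). This is only an illustration; the function name and the goal counts below are made up and are not the real NHL data.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def compare_to_poisson(goals):
    """Return (mean, variance/mean, table of (k, observed count, expected count))."""
    n = len(goals)
    mean = sum(goals) / n
    variance = sum((g - mean) ** 2 for g in goals) / (n - 1)  # sample variance
    observed = Counter(goals)
    table = [
        (k, observed.get(k, 0), n * poisson_pmf(k, mean))
        for k in range(max(goals) + 1)
    ]
    return mean, variance / mean, table


# Made-up goal counts, for illustration only.
home_goals = [3, 2, 4, 1, 3, 5, 2, 3, 0, 4, 2, 3]
mean, dispersion, table = compare_to_poisson(home_goals)
for k, obs, expected in table:
    print(f"{k} goals: observed {obs:3d}, expected {expected:6.2f}")
print(f"mean = {mean:.3f}, variance/mean = {dispersion:.3f}")
```

The same function would be called once on the home goal counts and once on the away goal counts.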
Is There a Difference Between Home and Away Goal Counts?
Because our distributions are not even remotely normal, but also not exactly Poisson, I will use the Wilcoxon Signed Rank Test, a non-parametric test on the paired home and away goal counts from each game, to test whether the two differ systematically.
The Wilcoxon Signed Rank Test has yielded a highly significant p-value. This indicates that the average score differential of 0.252 additional goals for the home team is a real effect, not just statistical noise.
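For readers who want to see the mechanics, here is a minimal standard-library sketch of a two-sided signed-rank test using the normal approximation with a tie correction and no continuity correction. It is not the code that produced the result above; in practice a library routine such as `scipy.stats.wilcoxon` would be the usual choice, and its p-value may differ slightly depending on the options used. The function name and sample scores are made up.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped; tied absolute differences share their
    average rank. Returns (W_plus, z, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        # Extend the run while the next absolute difference is tied with this one.
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        t = end - start + 1
        tie_term += t ** 3 - t
        start = end + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, z, p_value


# Made-up paired scores (home, away) for eight games.
home = [3, 2, 4, 1, 3, 5, 2, 3]
away = [2, 2, 1, 3, 1, 2, 1, 4]
w_plus, z, p_value = wilcoxon_signed_rank(home, away)
print(f"W+ = {w_plus}, z = {z:.3f}, p = {p_value:.3f}")
```

Goal differences take only a handful of integer values, so ties are everywhere in this kind of data, which is why the tie correction in the variance matters.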
Does Home Ice Advantage Vary by Team?
In order to assess if the interaction between team and home vs away games is significant, I will run an ANOVA on a Poisson family generalized linear model fit to the goal data from each game. My intent with this is to use the interaction term to decide if there is evidence that home team advantage varies by team.
In the above Two Way ANOVA, we can see that the effect of team on goals scored is highly significant, as is the effect of game_type. This means that the mean number of goals scored per game varies between home and away games when we ignore which teams were playing. We already determined this earlier with the Wilcoxon Signed Rank Test, so it is reassuring to see the same result here. Similarly, the mean number of goals scored per game varies between teams when ignoring whether a game was played at home or away. The interaction term has a p-value of 0.648, which is nowhere near significant. This indicates that there is insufficient evidence to conclude that the difference between home and away games varies across teams. Failing to find an effect is not proof that none exists, but for my purposes this answers our second question with a “no”.
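The interaction test can also be illustrated without a GLM library. The sketch below is a swapped-in technique, not the code used above: it fits the additive Poisson model (mean goals = team effect × venue effect) by alternating closed-form updates, compares it with the model that gives every team and venue combination its own mean, and returns the likelihood-ratio (deviance) statistic with its degrees of freedom. Under the null of no interaction that statistic is approximately chi-squared, so a p-value could be obtained with something like `scipy.stats.chi2.sf(statistic, df)`. The function name and the toy rows are made up, and the sketch assumes every team has rows for every venue.

```python
import math
from collections import defaultdict


def interaction_lr_statistic(rows, max_iter=1000, tol=1e-12):
    """rows: iterable of (team, venue, goals). Returns (statistic, df)."""
    goals = defaultdict(float)  # total goals per (team, venue) cell
    games = defaultdict(int)    # number of rows per (team, venue) cell
    for team, venue, scored in rows:
        goals[(team, venue)] += scored
        games[(team, venue)] += 1
    cells = list(games)
    teams = sorted({t for t, _ in cells})
    venues = sorted({v for _, v in cells})

    # Additive model on the log scale: mean = a[team] * b[venue].
    a = {t: 1.0 for t in teams}
    b = {v: 1.0 for v in venues}
    for _ in range(max_iter):
        new_a = {
            t: sum(goals.get((t, v), 0.0) for v in venues)
            / sum(games.get((t, v), 0) * b[v] for v in venues)
            for t in teams
        }
        new_b = {
            v: sum(goals.get((t, v), 0.0) for t in teams)
            / sum(games.get((t, v), 0) * new_a[t] for t in teams)
            for v in venues
        }
        change = max(
            [abs(new_a[t] - a[t]) for t in teams]
            + [abs(new_b[v] - b[v]) for v in venues]
        )
        a, b = new_a, new_b
        if change < tol:
            break

    # Deviance of the additive model relative to the per-cell (interaction) model.
    statistic = 0.0
    for team, venue in cells:
        y = goals[(team, venue)]
        fitted = games[(team, venue)] * a[team] * b[venue]
        if y > 0:
            statistic += 2 * y * math.log(y / fitted)
        statistic -= 2 * (y - fitted)
    df = (len(teams) - 1) * (len(venues) - 1)
    return statistic, df


# Toy rows (made up): every team scores exactly twice as much at home.
rows = [
    ("A", "home", 4), ("A", "home", 4), ("A", "away", 2), ("A", "away", 2),
    ("B", "home", 2), ("B", "home", 2), ("B", "away", 1), ("B", "away", 1),
    ("C", "home", 6), ("C", "home", 6), ("C", "away", 3), ("C", "away", 3),
]
statistic, df = interaction_lr_statistic(rows)
print(f"LR statistic = {statistic:.6f} on {df} degrees of freedom")
```

In the toy data the home-to-away ratio is identical for every team, so the additive model fits the cell means exactly and the statistic is essentially zero.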
Does Home Ice Advantage Exist in Overtime or a Shootout?
Finally, we will attempt to answer one last question: Does the home team still have an advantage if a game goes to overtime? To answer this question I will split the games into two categories: games won in overtime (or a shootout) and games won in regulation time. I will then examine whether there is evidence that the home team’s win rate in either category differs from even odds. Finally, I will use R’s prop.test() function to see if there is evidence that the two proportions differ.
We can see that there is evidence for home ice advantage leading to wins during regulation time, but once a game goes to overtime the evidence in favour of home ice advantage appears to vanish.
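A test of a win proportion against even odds can be sketched as an exact two-sided binomial test: add up the probabilities of every outcome that is no more likely than the observed one under the null. The version below works in log space so it stays stable for thousands of games. The function name and the win counts are made up for illustration and are not the figures from this analysis.

```python
import math


def binomial_test_two_sided(successes, trials, p=0.5):
    """Exact two-sided binomial test of H0: success probability == p (0 < p < 1)."""
    def log_pmf(k):
        return (
            math.lgamma(trials + 1)
            - math.lgamma(k + 1)
            - math.lgamma(trials - k + 1)
            + k * math.log(p)
            + (trials - k) * math.log(1 - p)
        )

    probabilities = [math.exp(log_pmf(k)) for k in range(trials + 1)]
    # Small relative tolerance so outcomes tied with the observed one are included.
    threshold = probabilities[successes] * (1 + 1e-7)
    return min(1.0, sum(q for q in probabilities if q <= threshold))


# Made-up counts: home wins out of games decided in each way.
regulation_p = binomial_test_two_sided(560, 1000)
overtime_p = binomial_test_two_sided(102, 200)
print(f"regulation: p = {regulation_p:.4g}")
print(f"overtime:   p = {overtime_p:.4g}")
```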
Running a proportion test confirms that the two proportions appear to be different. For the purposes of my hierarchical Bayesian model, I will assume that once a game goes to overtime the probability of one team winning over the other will just be a Bernoulli random variable with probability based on the historical win percentage of that team over the other. No home ice advantage will be considered at that point.
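The two-sample proportion test itself can be sketched as a chi-squared test with one degree of freedom on the pooled proportion, with an optional Yates continuity correction. As far as I understand it, this matches what prop.test() computes by default for two groups, but treat that as an assumption rather than a guarantee. The function name and the counts are made up.

```python
import math


def two_proportion_test(x1, n1, x2, n2, correct=True):
    """Test H0: the two success probabilities are equal. Returns (chi2, p_value)."""
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    scale = 1 / n1 + 1 / n2
    diff = abs(x1 / n1 - x2 / n2)
    if correct:
        # Yates continuity correction, capped so it never overshoots zero.
        diff -= min(0.5 * scale, diff)
    chi2 = diff ** 2 / (pooled * (1 - pooled) * scale)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: home wins in regulation vs. home wins in overtime or shootout.
chi2, p_value = two_proportion_test(560, 1000, 102, 200)
print(f"chi-squared = {chi2:.3f}, p = {p_value:.4f}")
```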