Under the Hood of an Analytics Project

We won’t use all the datasets; our primary focus is the play-by-play data. Our first step is to load the play-by-play data, extract goals and shots, create columns that identify the shooters and goalies, and scale the x and y locations so that we can show our shooting analysis on a hockey rink plot that I’ve made in Plotly.

Then we write a function that takes the output of the shooting_df function and adds some data that we’ll need. We add team names and build out the dataset so we can analyse home/away instead of win/loss.

Now we’ll build the dataset we’ll use in the final analysis. I wanted to determine what a rebound was so I could use that information to identify good shot locations. Since there was no clear definition I could apply, I needed to analyze this myself.

I run a loop over the unique games in the dataset and sort each game chronologically. I can then iterate shot by shot over the dataset and build out the time between shots, the outcome of those shots, and the location of the rebound. We then add a few columns that let us cut the data in different ways for analysis, specifically a ‘for’ column so we can look at teams across their entire season.

To get a better understanding of what a rebound is, we need to aggregate our rebounds dataframe by timesteps. At the same time, I added a few text columns for the Plotly chart that we’ll use to get a better picture of the impact of rebounds and the rate of scoring. That allows us to create the first two plots we looked at above, which gave us an idea of what a rebound is.

Knowing that there was a sharp uptick in goals for shots under 3 seconds, I categorized as a rebound any shot taken less than 3 seconds after the previous shot. This gave us a good picture of the overall impact of rebounds, shown in the Plotly chart below.

While shots leading to rebounds make up only 5% of all shots, they make up more than 15% of all goals scored.

I also wanted to see whether there was a pattern to quality shots. The code for producing the heat map on a rink is long, so I won’t embed it all here, but you can find it at this link. At a high level we run through the following:

1. We create various Plotly shapes that produce an empty hockey rink as a plot. The x and y axes are used to simulate one half of the arena.
2. We create a set of boxes that split the half-rink into 625 equally sized boxes (25 x 25).
3. We take the subset of the data whose shots fit within the x-y coordinates created for the boxes in the previous step. We then color each box as desired (shots, goals, or goal ratios).
4. We produce a data point at the center of each box to provide mouse-over information.

The result is below (interactive version here).

I reviewed a few variations of this plot to better understand shot location. Nothing non-obvious jumped out, but it gave me some ideas for creating a feature that uses these boxes to produce a shot-quality metric in some further analysis.
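To make the two core steps concrete, here is a minimal pandas sketch of the rebound flag and the 25 x 25 box aggregation. It is not the code from the project. The column names (game_id, game_seconds, x, y, is_goal), the function names flag_rebounds and bin_shots, and the half-rink coordinate ranges are all assumptions made for illustration. It uses a vectorised groupby in place of the shot-by-shot loop described above, and it ignores which team took the previous shot.

```python
import pandas as pd

REBOUND_WINDOW_SECONDS = 3  # threshold described in the post
GRID_SIZE = 25              # 25 x 25 = 625 boxes, as in the post


def flag_rebounds(shots: pd.DataFrame) -> pd.DataFrame:
    """Add the time since the previous shot in the same game and a rebound flag."""
    out = shots.sort_values(["game_id", "game_seconds"]).copy()
    # Seconds since the previous shot in the same game (NaN for a game's first shot).
    out["secs_since_prev"] = out.groupby("game_id")["game_seconds"].diff()
    # NaN compares as False, so the first shot of a game is never a rebound.
    out["is_rebound"] = out["secs_since_prev"] < REBOUND_WINDOW_SECONDS
    return out


def bin_shots(shots: pd.DataFrame,
              x_range=(0.0, 100.0),        # assumed half-rink extent
              y_range=(-42.5, 42.5)) -> pd.DataFrame:
    """Count shots and goals in each box of a GRID_SIZE x GRID_SIZE grid."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    # Keep only shots that fall inside the area covered by the boxes.
    mask = shots["x"].between(x_lo, x_hi) & shots["y"].between(y_lo, y_hi)
    inside = shots[mask].copy()
    # Map each coordinate to a box index in 0..GRID_SIZE-1; points on the
    # upper edge are clipped into the last box.
    inside["box_x"] = ((inside["x"] - x_lo) / (x_hi - x_lo) * GRID_SIZE
                       ).astype(int).clip(upper=GRID_SIZE - 1)
    inside["box_y"] = ((inside["y"] - y_lo) / (y_hi - y_lo) * GRID_SIZE
                       ).astype(int).clip(upper=GRID_SIZE - 1)
    grid = (inside.groupby(["box_x", "box_y"])
                  .agg(shots=("is_goal", "size"), goals=("is_goal", "sum"))
                  .reset_index())
    grid["goal_ratio"] = grid["goals"] / grid["shots"]
    return grid


# Tiny made-up sample, only to show the shape of the inputs and outputs.
shots = pd.DataFrame({
    "game_id": [1, 1, 1, 1, 2, 2],
    "game_seconds": [10, 12, 40, 45, 5, 6],
    "x": [81.0, 85.0, 61.0, 30.0, 88.5, 89.0],
    "y": [5.0, 2.0, -20.0, 10.0, 0.0, 1.0],
    "is_goal": [False, True, False, False, False, True],
})

flagged = flag_rebounds(shots)
grid = bin_shots(flagged)
rebound_share = flagged["is_rebound"].mean()  # share of shots flagged as rebounds
```

The grid frame has one row per non-empty box, so any of its three value columns (shots, goals, goal_ratio) can drive the box colors, and the same row can supply the mouse-over text for the point at the center of the box.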
