Despite being on an offense with few bright spots, New York Jets running back Breece Hall has shone as brightly as just about any running back in the league. Highlighting that, Hall leads the league in yards per carry at 5.7.
Given this, it may have come as a bit of a surprise to some when the Gang Green Nation comment section had some discussion about the performance quality of Breece Hall. Basically, no one argued that Breece wasn’t playing well but some argued that he could be more consistent given how regularly his runs go for few yards. I thought this was an interesting point, so I figured I’d investigate it further using some statistics and plots to test the merit of the argument.
What do the statistics say?
To begin, I used nflfastR, which is an ‘R’ (statistical software) package that houses NFL play-by-play data. If you are interested in the full code that was used, I’ve pasted it at the end of this article (and I’m happy to answer any questions about it in the comments section).
For those who are less concerned with the code and more concerned with what was done, I narrowed the data down to only carries by Breece Hall. As a check that my cleaning worked, I ended up with a dataset of 78 Breece Hall carries with a mean (or average) of 5.7 yards per carry, both of which match the data recorded by ESPN.
However, it’s important to remember how an average is calculated, which is by taking the sum of the datapoints and dividing it by the number of datapoints; in this example that would be total rush yards divided by carries. A drawback of the mean is that it allows outliers (one-off datapoints that are meaningfully different from the typical datapoints) to skew the result. For example, if a running back has 9 carries for 0 yards and 1 carry for 90 yards, then his yards per attempt would be 9 (90 yards divided by 10 carries) even though not a single carry went for 9 yards. Accordingly, if we had to guess the “typical” or “expected” yards on his next carry, we’d likely not want to use 9 as our guess.
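The arithmetic in that example is easy to check in R. This is a minimal sketch that uses the made-up stat line from the paragraph above rather than real play-by-play data; the variable names are my own.

```r
# Hypothetical stat line from the example: 9 carries for 0 yards, 1 carry for 90
carries <- c(rep(0, 9), 90)

# The mean is total yards divided by the number of carries
mean_ypc <- sum(carries) / length(carries)
print(mean_ypc)                      # 9, the same value mean(carries) returns

# No individual carry actually matches the mean
print(any(carries == mean_ypc))      # FALSE

# Drop the single long run and the mean collapses
print(mean(carries[carries < 90]))   # 0
```

One long run accounts for the entire average here, which is why the mean alone can mislead.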
To address outliers, one can instead use the median. If we think of every datapoint as lined up from the smallest to the largest number, the median is the point that is smack dab in the middle of the line. So if a running back had 3 carries of 0 yards, 1 for 5 yards, and 3 for 10 yards, then the median would be 5. Importantly, the median is less vulnerable to outliers because an outlier counts the same as any other single datapoint above or below the median, no matter how extreme it is.
As for Breece Hall, his median rush is only 3 yards, which is notably lower than his mean of 5.7.
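To see why the median resists outliers, here is a small sketch built on the seven-carry example above. The second vector, where one 10-yard run becomes an 80-yard run, is my own assumption added purely for illustration.

```r
# Hypothetical carries from the example: 3 for 0 yards, 1 for 5, 3 for 10
carries <- c(0, 0, 0, 5, 10, 10, 10)
print(median(carries))         # 5
print(mean(carries))           # 5

# Assumption for illustration: one 10-yard run is swapped for an 80-yard run
with_long_run <- c(0, 0, 0, 5, 10, 10, 80)
print(median(with_long_run))   # still 5, the middle value has not moved
print(mean(with_long_run))     # 15, pulled up by the one long run
```

The long run changes the mean a lot but leaves the median where it was, because it is still just one datapoint above the middle.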
What does the data look like?
There is more to data than just one descriptive statistic, though, and we should also consider the spread of the data. For example, if running back A has 9 rushes for 0 yards and one rush for 90 yards while running back B has 10 rushes for exactly 9 yards each, then each has averaged 9 yards per carry, but they got there in very different ways. Given this, I decided to plot Breece Hall’s number of carries at each amount of yards as reported by nflfastR (which does not use decimal points).
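The running back A versus running back B comparison can be sketched in a few lines of R. Both stat lines are the hypothetical ones from the paragraph above. The standard deviation is my own choice of spread measure for this illustration; the article itself relies on plots instead.

```r
# Hypothetical stat lines from the example above
back_a <- c(rep(0, 9), 90)   # nine carries for 0 yards, one for 90
back_b <- rep(9, 10)         # ten carries for exactly 9 yards each

# Same mean, very different median and spread
summary_stats <- data.frame(
  back   = c("A", "B"),
  mean   = c(mean(back_a), mean(back_b)),
  median = c(median(back_a), median(back_b)),
  sd     = c(sd(back_a), sd(back_b))
)
print(summary_stats)

# Counts of carries at each yardage, the same idea as a histogram
print(table(back_a))
print(table(back_b))
```

The counts from `table()` are essentially what a histogram with one-yard bins draws, so plotting the carries shows the difference between the two backs at a glance.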
[Figure: histogram of Breece Hall’s carries by yards gained, all rushes]
As you can see here, Breece’s data is largely clustered between -4 and 10 yards with a few outliers. This aligns with the finding that his median is lower than his mean, as the few longer rushes are likely dragging his average up, especially when he only has 78 carries this season. Given that clustering at the front end makes the graph tough to interpret, I then recreated the graph with only the counts for his carries under 20 yards.
[Figure: histogram of Breece Hall’s carries by yards gained, rushes under 20 yards only]
As shown a bit more neatly here, we can see that Breece’s most common carries seem to fall between 0 and 3 yards, supporting the “his typical carry goes for a few yards” argument.
But what is typical from a good RB?
While the data does support that Breece Hall regularly has carries for a few yards, I assumed that to be the case for most running backs based on what I’ve seen from NFL football over the last 20 years of my life. Given that the argument around Breece seems to be more about whether he’s playing at an elite or just a really good level, I figured I would compare his effectiveness to the player who is generally considered the NFL’s best running back: San Francisco 49er Christian McCaffrey.
To do so, I created a dataset for McCaffrey that was identical in structure to the one I’d pulled for Breece Hall, then added some new columns to each:
- 3 yards or more: Scored 0 if a rush went for less than 3 yards and 1 if a rush went for at least 3 yards
- 4 yards or more: Scored 0 if a rush went for less than 4 yards and 1 if a rush went for at least 4 yards
- 5 yards or more: Scored 0 if a rush went for less than 5 yards and 1 if a rush went for at least 5 yards
From there, I compared the percentage of carries that met each criterion, which is easily calculated by taking the average of each column.
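The reason averaging works is that the mean of a column of 0s and 1s is the share of rows scored 1. Here is a minimal sketch in base R; the ten yardage values are hypothetical and chosen only to make the arithmetic easy to follow, so they are not Hall’s or McCaffrey’s actual carries.

```r
# Hypothetical yards gained on ten carries (illustrative only, not real data)
carries <- data.frame(rushing_yards = c(-2, 0, 1, 2, 3, 3, 4, 5, 7, 12))

# Score 1 if the carry gained at least the threshold, 0 otherwise
carries$OverThree <- ifelse(carries$rushing_yards >= 3, 1, 0)
carries$OverFour  <- ifelse(carries$rushing_yards >= 4, 1, 0)
carries$OverFive  <- ifelse(carries$rushing_yards >= 5, 1, 0)

# The mean of each 0/1 column is the share of carries that met the threshold
shares <- colMeans(carries[, c("OverThree", "OverFour", "OverFive")])
print(shares)   # 0.6, 0.4, 0.3
```

Six of the ten made-up carries gained at least 3 yards, so the first column averages 0.6, or 60%.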
For 3 yards or more: Breece Hall was at 54% and McCaffrey was at 62%
For 4 yards or more: Breece Hall was at 40% and McCaffrey was at 44%
For 5 yards or more: Breece Hall was at 32% and McCaffrey was at 34%
As shown here, these numbers are pretty comparable. Beyond that, McCaffrey’s median rush was also 3 yards per carry and his mean was 4.8.
Long story short, McCaffrey might be doing slightly better than Breece on a “typical” carry basis, but the difference between Breece and the NFL’s best is rather small, further supporting that Breece really has been among the NFL’s best backs.
Summary
Putting all of this together, any criticism of Breece seems rather unwarranted. Sure, he could be a bit more effective on his typical run by reducing the number of carries that don’t get at least 3 yards, but that seems like nitpicking in the grand scheme of how good he’s been. Overall, the data suggests fans should be very happy with Breece’s performance to date.
Code (anything after a # is a comment that is not executed and is used to explain what the line of code does)
#be sure to install these libraries prior to trying to use them for the first time
library(nflfastR) #has the NFL data
library(tidyverse) #used to clean the data
library(psych) #used to get means and medians
data <- nflfastR::load_pbp(2023) #gets only the 2023 season data
breece <- data %>%
  filter(rusher_player_name == "Bre.Hall") %>% #filter dataset to only have Breece Hall's data
  select(home_team, away_team, rusher_player_name, rush_attempt, rushing_yards) #keeping only these variables
psych::describe(breece$rushing_yards) #getting descriptive statistics (mean/median)
nums <- c(-10, seq(-4, 10, 2), seq(15, 100, 5)) #creating list of numbers for x axis of plots (same breaks as before, sorted, without the repeated 10)
ggplot(breece, aes(x=rushing_yards)) +
  geom_histogram(binwidth=1, color = "white", fill = "darkgreen") + #making histogram
  scale_x_continuous(name="Rushing yards per carry", limits=c(-10, 100), breaks = nums) + #customizing x axis
  scale_y_continuous(name = "Count", limits = c(0,15), breaks = seq(0, 15, 2)) + #customizing y axis
  coord_cartesian(xlim = c(-10, 100), ylim = c(0, 15), expand = FALSE) + #setting beginning and end points for the plot
  theme_bw() + #changing color scheme
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) #getting rid of grid lines
#recreating the above plot but only for rushes of 20 yards or less
ggplot(breece, aes(x=rushing_yards)) +
  geom_histogram(binwidth=1, color = "white", fill = "darkgreen") +
  scale_x_continuous(name="Rushing yards per carry", limits=c(-10, 20), breaks = nums) +
  scale_y_continuous(name = "Count", limits = c(0,15), breaks = seq(0, 15, 2)) +
  coord_cartesian(xlim = c(-10, 20), ylim = c(0, 15), expand = FALSE) + theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
breece <- breece %>%
  mutate(OverThree = ifelse(rushing_yards >= 3, 1, 0)) %>%
  mutate(OverFour = ifelse(rushing_yards >= 4, 1, 0)) %>%
  mutate(OverFive = ifelse(rushing_yards >= 5, 1, 0)) #columns scored 1 if a rush went for at least x yards
mean(breece$OverThree) #getting share of runs of at least 3 yards
mean(breece$OverFour) #getting share of runs of at least 4 yards
mean(breece$OverFive) #getting share of runs of at least 5 yards
#recreating data with McCaffrey for reference
mccaffrey <- data %>%
  filter(rusher_player_name == "C.McCaffrey") %>% #filter dataset to only have Christian McCaffrey's data
  select(home_team, away_team, rusher_player_name, rush_attempt, rushing_yards) #keeping only these variables
psych::describe(mccaffrey$rushing_yards) #getting descriptive statistics (mean/median)
mccaffrey <- mccaffrey %>%
  mutate(OverThree = ifelse(rushing_yards >= 3, 1, 0)) %>%
  mutate(OverFour = ifelse(rushing_yards >= 4, 1, 0)) %>%
  mutate(OverFive = ifelse(rushing_yards >= 5, 1, 0)) #columns scored 1 if a rush went for at least x yards
mean(mccaffrey$OverThree) #getting share of runs of at least 3 yards
mean(mccaffrey$OverFour) #getting share of runs of at least 4 yards
mean(mccaffrey$OverFive) #getting share of runs of at least 5 yards