Title: | Functions to Efficiently Simulate and Evaluate NFL Seasons |
---|---|
Description: | A set of functions to simulate National Football League seasons including the sophisticated tie-breaking procedures. |
Authors: | Lee Sharpe [aut], Sebastian Carl [cre, aut, cph] |
Maintainer: | Sebastian Carl <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.0.9901 |
Built: | 2024-11-14 05:30:27 UTC |
Source: | https://github.com/nflverse/nflseedR |
Compute NFL Playoff Seedings using Game Results and Divisional Rankings
compute_conference_seeds( teams, h2h = NULL, tiebreaker_depth = 3, .debug = FALSE, playoff_seeds = 7 )
teams |
The division standings data frame as computed by
|
h2h |
A data frame that is used for head-to-head tiebreakers across the
tie-breaking functions. It is computed by the function
|
tiebreaker_depth |
A single value equal to 1, 2, or 3. The default is 3. The value controls the depth of tiebreakers that shall be applied. The deepest currently implemented tiebreaker is strength of schedule. The following values are valid:
|
.debug |
Either |
playoff_seeds |
Number of playoff teams per conference (increased in 2020 from 6 to 7). |
A data frame of division standings including playoff seeds and the week in which the season ended for the respective team (exit).
A list of two data frames:
standings: Division standings including playoff seeds.
h2h: A data frame that is used for head-to-head tiebreakers across the tie-breaking functions.
The examples on the package website
# Change some options for better output
old <- options(list(digits = 3, tibble.print_min = 64))
library(dplyr, warn.conflicts = FALSE)

try({ # to avoid CRAN test problems
  nflseedR::load_sharpe_games() %>%
    dplyr::filter(season %in% 2019:2020) %>%
    dplyr::select(sim = season, game_type, week, away_team, home_team, result) %>%
    nflseedR::compute_division_ranks() %>%
    nflseedR::compute_conference_seeds(h2h = .$h2h) %>%
    purrr::pluck("standings")
})

# Restore old options
options(old)
Compute NFL Division Rankings using Game Results
compute_division_ranks( games, teams = NULL, tiebreaker_depth = 3, .debug = FALSE, h2h = NULL )
games |
A data frame containing real or simulated game scores. The following variables are required:
|
teams |
This parameter is optional. If it is
|
tiebreaker_depth |
A single value equal to 1, 2, or 3. The default is 3. The value controls the depth of tiebreakers that shall be applied. The deepest currently implemented tiebreaker is strength of schedule. The following values are valid:
|
.debug |
Either |
h2h |
A data frame that is used for head-to-head tiebreakers across the
tie-breaking functions. It is computed by the function
|
A list of two data frames:
standings: Division standings.
h2h: A data frame that is used for head-to-head tiebreakers across the tie-breaking functions.
The examples on the package website
# Change some options for better output
old <- options(list(digits = 3, tibble.print_min = 64))
library(dplyr, warn.conflicts = FALSE)

try({ # to avoid CRAN test problems
  nflseedR::load_sharpe_games() %>%
    dplyr::filter(season %in% 2019:2020) %>%
    dplyr::select(sim = season, game_type, week, away_team, home_team, result) %>%
    nflseedR::compute_division_ranks() %>%
    purrr::pluck("standings")
})

# Restore old options
options(old)
Compute NFL Draft Order using Game Results and Divisional Rankings
compute_draft_order( teams, games, h2h = NULL, tiebreaker_depth = 3, .debug = FALSE )
teams |
The division standings data frame including playoff seeds as
computed by |
games |
A data frame containing real or simulated game scores. The following variables are required:
|
h2h |
A data frame that is used for head-to-head tiebreakers across the
tie-breaking functions. It is computed by the function
|
tiebreaker_depth |
A single value equal to 1, 2, or 3. The default is 3. The value controls the depth of tiebreakers that shall be applied. The deepest currently implemented tiebreaker is strength of schedule. The following values are valid:
|
.debug |
Either |
A data frame of standings including the final draft pick number and the variable exit, which indicates the week number of the team's final game (Super Bowl Winner is one week higher).
The examples on the package website
# Change some options for better output
old <- options(list(digits = 3, tibble.print_min = 64))
library(dplyr, warn.conflicts = FALSE)

try({ # to avoid CRAN test problems
  games <- nflseedR::load_sharpe_games() %>%
    dplyr::filter(season %in% 2018:2019) %>%
    dplyr::select(sim = season, game_type, week, away_team, home_team, result)

  games %>%
    nflseedR::compute_division_ranks() %>%
    nflseedR::compute_conference_seeds(h2h = .$h2h, playoff_seeds = 6) %>%
    nflseedR::compute_draft_order(games = games, h2h = .$h2h)
})

# Restore old options
options(old)
NFL team names and the conferences and divisions they belong to
divisions
A data frame with 36 rows and 4 variables containing NFL team level information, including franchises in multiple cities:
Team abbreviation
Conference abbreviation
Division name
Division abbreviation
This data frame is created using the teams_colors_logos data frame of the nflfastR package. Please see data-raw/divisions.R for the code to create this data.
divisions
This function formats numeric vectors with values between 0 and 1 into percentage strings with special specifications. Those specifications are:
0 and 1 are converted to "0%" and "100%" respectively (takes machine precision into account)
all other values < 0.01 are converted to "<1%"
all other values between 0.01 and 0.995 are rounded to percentages without decimals
values between 0.995 and 0.999 are rounded to percentages with 1 decimal
values between 0.999 and 1 are converted to ">99.9%" unless closer to 1 than machine precision.
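The rules above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation of fmt_pct_special(): the helper name fmt_pct_sketch, the tolerance tol, and the treatment of the interval boundaries (lower bound inclusive) are all hypothetical choices.

```r
# Hypothetical helper; NOT the implementation of nflseedR::fmt_pct_special().
# Assumptions: `tol` stands in for "machine precision", and each interval
# includes its lower bound and excludes its upper bound.
fmt_pct_sketch <- function(x, tol = sqrt(.Machine$double.eps)) {
  stopifnot(is.numeric(x), all(x >= 0 & x <= 1, na.rm = TRUE))
  out <- rep(NA_character_, length(x))

  # Values indistinguishable from 0 or 1 within the tolerance
  is_zero <- !is.na(x) & x < tol
  is_one <- !is.na(x) & (1 - x) < tol
  rest <- !is.na(x) & !is_zero & !is_one

  out[is_zero] <- "0%"
  out[is_one] <- "100%"

  # Everything else below 1%
  out[rest & x < 0.01] <- "<1%"

  # Whole percentages between 1% and 99.5%
  mid <- rest & x >= 0.01 & x < 0.995
  out[mid] <- paste0(round(x[mid] * 100), "%")

  # One decimal between 99.5% and 99.9%
  high <- rest & x >= 0.995 & x < 0.999
  out[high] <- sprintf("%.1f%%", x[high] * 100)

  # Above 99.9% but not (numerically) equal to 1
  out[rest & x >= 0.999] <- ">99.9%"
  out
}

fmt_pct_sketch(c(0, 0.004, 0.9, 0.997, 0.9995, 1))
```

The package function may resolve boundary values and the precision check differently, so its output should be treated as authoritative.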
fmt_pct_special(x)
x |
A vector of numerical values |
A character vector
x <- c(0, 0.004, 0.009, 0.011, 0.9, 0.98, 0.994, .995, .9989, .999, .9991, .99999999)
fmt <- fmt_pct_special(x)
data.frame(x = x, fmt = fmt)
Lee Sharpe maintains an important data set that contains broadly used information on games in the National Football League. This function is a convenient helper to download the file into memory without having to remember the correct url.
load_schedules(...)
load_sharpe_games(...)
... |
Arguments passed on to
|
A data frame containing the following variables for all NFL games since 1999:
The ID of the game as assigned by the nflverse. Note that this value matches the game_id field in nflfastR if you wish to join the data.
The year of the NFL season. This represents the whole season, so regular season games that happen in January as well as playoff games will occur in the year after this number.
What type of game? One of the following values:
REG: a regular season game
WC: a wildcard playoff game
DIV: a divisional round playoff game
CON: a conference championship
SB: a Super Bowl
The week of the NFL season the game occurs in. Please note that the game_type will differ for weeks >= 18 because of the season expansion in 2021. Please use game_type to filter for regular season or postseason.
The date on which the game occurred.
The day of the week on which the game occurred.
The kickoff time of the game. This is represented in 24-hour time and the Eastern time zone, regardless of what time zone the game was being played in.
The away team.
The number of points the away team scored. Is NA for games which haven't yet been played.
The home team. Note that this contains the designated home team for games in which no team is playing at home, such as Super Bowls or NFL International games.
The number of points the home team scored. Is NA for games which haven't yet been played.
Either Home if the home team is playing in their home stadium, or Neutral if the game is being played at a neutral location. This still shows as Home for games between the Giants and Jets even though they share the same home stadium.
Equals home_score - away_score. The number of points the home team scored minus the number of points the away team scored. Is NA for games which haven't yet been played. Convenient for evaluating against the spread bets.
The sum of each team's score in the game. Equals home_score + away_score. Is NA for games which haven't yet been played. Convenient for evaluating over/under total bets.
Whether the game went into overtime (= 1) or not (= 0).
The id of the game issued by the NFL Game Statistics & Information System.
The number of days since that away team's previous game (7 is used for the team's first game of the season).
The number of days since that home team's previous game (7 is used for the team's first game of the season).
Odds of the away_team winning the game.
Odds of the home_team winning the game.
The spread line for the game. A positive number means the home team was favored by that many points; a negative number means the away team was favored by that many points. This lines up with the result column.
Odds of the away_team covering the spread_line.
Odds of the home_team covering the spread_line.
The total line for the game.
Odds of the total being under the total_line.
Odds of the total being over the total_line.
Whether the game was a divisional game (= 1) or not (= 0).
What was the status of the stadium's roof? Will be one of the following values:
closed: Stadium has a retractable roof which was closed
dome: An indoor stadium
open: Stadium has a retractable roof which was open
outdoors: An outdoor stadium
What type of ground the game was played on.
The temperature at the stadium (for roof types outdoors and open only).
The speed of the wind in miles/hour (for roof types outdoors and open only).
GSIS ID of the "starting quarterback" of the away team identified as the first quarterback (per roster data) listed as passer (in nflfastR play by play data) in 2+ plays that game. In the final regular season game it is the QB with the most plays as the passer.
GSIS ID of the "starting quarterback" of the home team identified as the first quarterback (per roster data) listed as passer (in nflfastR play by play data) in 2+ plays that game. In the final regular season game it is the QB with the most plays as the passer.
Full name of the "starting quarterback" of the away team identified as the first quarterback (per roster data) listed as passer (in nflfastR play by play data) in 2+ plays that game. In the final regular season game it is the QB with the most plays as the passer.
Full name of the "starting quarterback" of the home team identified as the first quarterback (per roster data) listed as passer (in nflfastR play by play data) in 2+ plays that game. In the final regular season game it is the QB with the most plays as the passer.
Name of the head coach of the away team.
Name of the head coach of the home team.
Name of the game's referee (head official).
Pro Football Reference ID of the stadium.
Name of the stadium.
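As a small illustration of how several of the variables described above relate to each other, the following sketch rebuilds the score-derived columns on a toy data frame. The team abbreviations, scores, and betting lines are made up, and the helper columns home_covered and went_over are hypothetical names that are not part of the data set; pushes (exact ties with the line) are simply treated as not covered.

```r
# Toy schedule rows; all values are invented for illustration only.
games <- data.frame(
  game_type   = c("REG", "REG", "WC"),
  week        = c(17, 18, 19),
  away_team   = c("AAA", "CCC", "EEE"),
  home_team   = c("BBB", "DDD", "FFF"),
  away_score  = c(20, 27, NA),   # NA: game not yet played
  home_score  = c(24, 17, NA),
  spread_line = c(3, -2.5, 1),   # positive: home team favored
  total_line  = c(45.5, 41, 47)
)

# result and total as defined in the variable descriptions
games$result <- games$home_score - games$away_score
games$total <- games$home_score + games$away_score

# Hypothetical helper columns: spread_line lines up with result, so the
# home team covers when its margin exceeds the spread line.
games$home_covered <- games$result > games$spread_line
games$went_over <- games$total > games$total_line

# Filter on game_type (not week) to separate regular season and postseason
regular <- games[games$game_type == "REG", ]
regular
```

Unplayed games propagate NA through every derived column, which mirrors the NA convention of the score variables.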
The internally called function nflreadr::load_schedules()
try({ # to avoid CRAN test problems
  games <- load_sharpe_games()
  dplyr::glimpse(games)
})
Compute NFL Standings
nfl_standings(
  games,
  ...,
  ranks = c("CONF", "DIV", "DRAFT", "NONE"),
  tiebreaker_depth = c("SOS", "PRE-SOV", "RANDOM"),
  playoff_seeds = NULL,
  verbosity = c("MIN", "MAX", "NONE")
)
games |
A data frame containing real or simulated game scores. The following variables are required:
|
... |
currently not used |
ranks |
One of
|
tiebreaker_depth |
One of
|
playoff_seeds |
If |
verbosity |
One of
|
A data.table of NFL standings including the ranks selected in the argument ranks.
For more information on the implemented tiebreakers, see https://nflseedr.com/articles/tiebreaker.html
try({ # to avoid CRAN test problems
  games <- nflreadr::load_schedules(2021:2022)
  standings <- nflseedR::nfl_standings(games)
  print(standings, digits = 3)
})
This function simulates a given NFL season multiple times using custom functions to estimate and simulate game results, and computes the outcome of the given season including playoffs and draft order. The function can be run in parallel processes by calling the appropriate plan. Progress updates can be activated by calling handlers before the start of the simulations. Please see the "Details" section below for further information.
simulate_nfl(
  nfl_season = NULL,
  process_games = NULL,
  ...,
  playoff_seeds = ifelse(nfl_season >= 2020, 7, 6),
  if_ended_today = FALSE,
  fresh_season = FALSE,
  fresh_playoffs = FALSE,
  tiebreaker_depth = 3,
  test_week = NULL,
  simulations = 1000,
  sims_per_round = max(ceiling(simulations/future::availableCores() * 2), 100),
  .debug = FALSE,
  print_summary = FALSE,
  sim_include = c("DRAFT", "REG", "POST")
)
nfl_season |
Season to simulate |
process_games |
A function to estimate and simulate the results of games. Uses team, schedule, and week number as arguments. |
... |
Additional parameters passed on to the function |
playoff_seeds |
Number of playoff teams per conference (increased in 2020 from 6 to 7). |
if_ended_today |
Either |
fresh_season |
Either |
fresh_playoffs |
Either |
tiebreaker_depth |
A single value equal to 1, 2, or 3. The default is 3. The value controls the depth of tiebreakers that shall be applied. The deepest currently implemented tiebreaker is strength of schedule. The following values are valid:
|
test_week |
Aborts after the simulator reaches this week and returns the results from your process games call. |
simulations |
Equals the number of times the given NFL season shall be simulated |
sims_per_round |
The number of |
.debug |
Either |
print_summary |
If |
sim_include |
One of
|
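To make the process_games argument more concrete, here is a minimal sketch of what such a function could look like. Everything about its shape is an assumption made for illustration: the name coin_flip_games, the argument names teams, games, and week_num, the returned list with elements teams and games, and the made-up margin distribution (mean 2, standard deviation 13). The package website should be consulted for the interface simulate_nfl() actually expects.

```r
# Hypothetical process_games-style function. Argument names, return shape,
# and the margin distribution are assumptions, not the documented interface.
coin_flip_games <- function(teams, games, week_num, ...) {
  # Games of the requested week that do not have a result yet
  todo <- games$week == week_num & is.na(games$result)

  # Draw a home-minus-away margin for each open game (whole points)
  games$result[todo] <- round(stats::rnorm(sum(todo), mean = 2, sd = 13))

  # Hand back both inputs, with the simulated results filled in
  list(teams = teams, games = games)
}

# Toy inputs with invented team abbreviations
set.seed(1)
teams <- data.frame(sim = 1, team = c("AAA", "BBB", "CCC", "DDD"))
games <- data.frame(
  sim = 1,
  game_type = "REG",
  week = c(1, 1, 2),
  away_team = c("AAA", "CCC", "AAA"),
  home_team = c("BBB", "DDD", "CCC"),
  result = c(NA, 7, NA)
)

out <- coin_flip_games(teams, games, week_num = 1)
out$games
```

Only the open game of week 1 receives a simulated result; the already played game keeps its value and the week 2 game stays NA. Whether a function of this shape can be passed to simulate_nfl() unchanged has not been verified here.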
We recommend choosing a default parallel processing method and saving it as an environment variable in the R user profile to make sure all futures will be resolved with the chosen method by default. This can be done by following the steps below.
First, run the following line and the user profile should be opened automatically. If you haven't saved any environment variables yet, this will be an empty file.
usethis::edit_r_environ()
In the opened file add the next line, then save the file and restart your R session. Please note that this example sets "multisession" as default. For most users this should be the appropriate plan but please make sure it truly is.
R_FUTURE_PLAN="multisession"
After the session is freshly restarted, please check whether the above method worked by running the next line. If the output is FALSE, you have successfully set up a default non-sequential future::plan(). If the output is TRUE, all functions will behave as if they were called with purrr::map() and NOT in multisession.
inherits(future::plan(), "sequential")
For more information on possible plans please see the future package Readme.
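The default of sims_per_round in the usage above depends on the number of available cores. The following sketch works through that arithmetic with the core count passed in explicitly, so that it does not depend on the machine it runs on. The helper names are hypothetical, and the assumption that the number of rounds equals the simulations divided by sims_per_round (rounded up) is inferred from the example comments rather than stated by the documentation.

```r
# Hypothetical helpers that mirror the default expression
# max(ceiling(simulations / future::availableCores() * 2), 100)
# with the number of cores supplied as an argument.
sims_per_round_default <- function(simulations, cores) {
  max(ceiling(simulations / cores * 2), 100)
}

# Assumption: simulations are split into rounds of size sims_per_round
n_rounds <- function(simulations, sims_per_round) {
  ceiling(simulations / sims_per_round)
}

# 1000 simulations on an assumed 8 cores
spr <- sims_per_round_default(1000, cores = 8)
spr                  # 250
n_rounds(1000, spr)  # 4

# Small runs fall back to the floor of 100 simulations per round
sims_per_round_default(20, cores = 8)  # 100
n_rounds(20, 100)                      # 1
```

Under these assumptions, a run with few simulations is processed in a single round unless sims_per_round is set manually, for example simulations = 4 with sims_per_round = 2 gives 2 rounds.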
Most nflfastR functions are able to show progress updates using progressr::progressor() if they are turned on before the function is called. There are at least two basic ways to do this: either activating progress updates globally (for the current session) with progressr::handlers(global = TRUE) or piping the function call into progressr::with_progress():
simulate_nfl(2020, fresh_season = TRUE) %>% progressr::with_progress()
For more information on how to work with progress handlers, please see progressr::progressr.
An nflseedR_simulation object containing a list of 6 data frames with the results of all simulated games, the final standings in each simulated season (incl. playoffs and draft order), summary statistics across all simulated seasons, and the simulation parameters. For a full list, please see the package website.
The examples on the package website
The method summary.nflseedR_simulation() that creates a pretty html summary table.
library(nflseedR)

# Activate progress updates
# progressr::handlers(global = TRUE)

# Parallel processing can be activated via the following line
# future::plan("multisession")

try({ # to avoid CRAN test problems
  # Simulate the season 4 times in 2 rounds
  sim <- nflseedR::simulate_nfl(
    nfl_season = 2020,
    fresh_season = TRUE,
    simulations = 4,
    sims_per_round = 2
  )

  # Overview output
  dplyr::glimpse(sim)
})
Uses the R package gt to create a pretty html table of the nflseedR simulation summary data frame.
## S3 method for class 'nflseedR_simulation'
summary(object, ...)
object |
an object for which a summary is desired. |
... |
additional arguments passed on to the methods (currently not used). |
library(nflseedR)

# set seed for recreation,
# internal parallelization requires a L'Ecuyer-CMRG random number generator
set.seed(19980310, kind = "L'Ecuyer-CMRG")

# Simulate the season 20 times in 1 round
sim <- nflseedR::simulate_nfl(
  nfl_season = 2021,
  fresh_season = TRUE,
  simulations = 20
)

# Create Summary Tables
tbl <- summary(sim)

# The output of tbl is given in the above image.