Title: | Functions to Efficiently Simulate and Evaluate NFL Seasons |
---|---|
Description: | A set of functions to simulate National Football League seasons including the sophisticated tie-breaking procedures. |
Authors: | Sebastian Carl [cre, aut, cph], Lee Sharpe [aut] |
Maintainer: | Sebastian Carl <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.0.9903 |
Built: | 2025-03-11 14:20:51 UTC |
Source: | https://github.com/nflverse/nflseedR |
Compute NFL Playoff Seedings using Game Results and Divisional Rankings
compute_conference_seeds( teams, h2h = NULL, tiebreaker_depth = 3, .debug = FALSE, playoff_seeds = 7 )
teams |
The division standings data frame as computed by compute_division_ranks. |
h2h |
A data frame that is used for head-to-head tiebreakers across the tie-breaking functions. It is computed by the function compute_division_ranks. |
tiebreaker_depth |
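A single value equal to 1, 2, or 3. The default is 3. The value controls the depth of tiebreakers that shall be applied. The deepest currently implemented tiebreaker is strength of schedule. The following values are valid:
tiebreaker_depth = 1: Break all ties with a coin flip. Fastest variant.
tiebreaker_depth = 2: Apply head-to-head and division win percentage tiebreakers. Random if still tied.
tiebreaker_depth = 3: Apply all tiebreakers through strength of schedule. Random if still tied.
|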
.debug |
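Either TRUE or FALSE. Controls whether additional messages are printed to the console showing what the tie-breaking algorithms are currently performing. |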
playoff_seeds |
Number of playoff teams per conference (increased in 2020 from 6 to 7). |
A data frame of division standings including playoff seeds and the week in which the season ended for the respective team (exit).
A list of two data frames:
standings: Division standings including playoff seeds.
h2h: A data frame that is used for head-to-head tiebreakers across the tie-breaking functions.
See also the examples on the package website.
# Change some options for better output
old <- options(list(digits = 3, tibble.print_min = 64))
library(dplyr, warn.conflicts = FALSE)

try({ # to avoid CRAN test problems
  nflseedR::load_sharpe_games() %>%
    dplyr::filter(season %in% 2019:2020) %>%
    dplyr::select(sim = season, game_type, week, away_team, home_team, result) %>%
    nflseedR::compute_division_ranks() %>%
    nflseedR::compute_conference_seeds(h2h = .$h2h) %>%
    purrr::pluck("standings")
})

# Restore old options
options(old)
Compute NFL Division Rankings using Game Results
compute_division_ranks( games, teams = NULL, tiebreaker_depth = 3, .debug = FALSE, h2h = NULL )
games |
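A data frame containing real or simulated game scores. The following variables are required: sim, game_type, week, away_team, home_team, and result (the home team's points minus the away team's points). |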
teams |
This parameter is optional. If it is NULL, the function computes it internally. |
tiebreaker_depth |
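A single value equal to 1, 2, or 3. The default is 3. The value controls the depth of tiebreakers that shall be applied. The deepest currently implemented tiebreaker is strength of schedule. The following values are valid:
tiebreaker_depth = 1: Break all ties with a coin flip. Fastest variant.
tiebreaker_depth = 2: Apply head-to-head and division win percentage tiebreakers. Random if still tied.
tiebreaker_depth = 3: Apply all tiebreakers through strength of schedule. Random if still tied.
|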
.debug |
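Either TRUE or FALSE. Controls whether additional messages are printed to the console showing what the tie-breaking algorithms are currently performing. |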
h2h |
A data frame that is used for head-to-head tiebreakers across the tie-breaking functions. It is computed by the function compute_division_ranks. |
A list of two data frames:
standings: Division standings.
h2h: A data frame that is used for head-to-head tiebreakers across the tie-breaking functions.
See also the examples on the package website.
# Change some options for better output
old <- options(list(digits = 3, tibble.print_min = 64))
library(dplyr, warn.conflicts = FALSE)

try({ # to avoid CRAN test problems
  nflseedR::load_sharpe_games() %>%
    dplyr::filter(season %in% 2019:2020) %>%
    dplyr::select(sim = season, game_type, week, away_team, home_team, result) %>%
    nflseedR::compute_division_ranks() %>%
    purrr::pluck("standings")
})

# Restore old options
options(old)
Compute NFL Draft Order using Game Results and Divisional Rankings
compute_draft_order( teams, games, h2h = NULL, tiebreaker_depth = 3, .debug = FALSE )
teams |
The division standings data frame including playoff seeds as computed by compute_conference_seeds. |
games |
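A data frame containing real or simulated game scores. The following variables are required: sim, game_type, week, away_team, home_team, and result (the home team's points minus the away team's points). |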
h2h |
A data frame that is used for head-to-head tiebreakers across the tie-breaking functions. It is computed by the function compute_division_ranks. |
tiebreaker_depth |
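A single value equal to 1, 2, or 3. The default is 3. The value controls the depth of tiebreakers that shall be applied. The deepest currently implemented tiebreaker is strength of schedule. The following values are valid:
tiebreaker_depth = 1: Break all ties with a coin flip. Fastest variant.
tiebreaker_depth = 2: Apply head-to-head and division win percentage tiebreakers. Random if still tied.
tiebreaker_depth = 3: Apply all tiebreakers through strength of schedule. Random if still tied.
|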
.debug |
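Either TRUE or FALSE. Controls whether additional messages are printed to the console showing what the tie-breaking algorithms are currently performing. |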
A data frame of standings including the final draft pick number and the variable exit, which indicates the week number of the team's final game (the Super Bowl winner's exit is one week higher).
See also the examples on the package website.
# Change some options for better output
old <- options(list(digits = 3, tibble.print_min = 64))
library(dplyr, warn.conflicts = FALSE)

try({ # to avoid CRAN test problems
  games <- nflseedR::load_sharpe_games() %>%
    dplyr::filter(season %in% 2018:2019) %>%
    dplyr::select(sim = season, game_type, week, away_team, home_team, result)

  games %>%
    nflseedR::compute_division_ranks() %>%
    nflseedR::compute_conference_seeds(h2h = .$h2h, playoff_seeds = 6) %>%
    nflseedR::compute_draft_order(games = games, h2h = .$h2h)
})

# Restore old options
options(old)
A dataframe containing the data dictionary of the simulation output table "game_summary"
dictionary_game_summary
An object of class data.frame with 11 rows and 2 columns.
https://nflseedr.com/articles/articles/nflsim2.html#simulation-output
A dataframe containing the data dictionary of the simulation output table "games"
dictionary_games
An object of class data.frame with 9 rows and 2 columns.
https://nflseedr.com/articles/articles/nflsim2.html#simulation-output
A dataframe containing the data dictionary of the simulation output table "overall"
dictionary_overall
An object of class data.frame with 11 rows and 2 columns.
https://nflseedr.com/articles/articles/nflsim2.html#simulation-output
A dataframe containing the data dictionary of the simulation output table "standings"
dictionary_standings
An object of class data.frame with 21 rows and 2 columns.
https://nflseedr.com/articles/articles/nflsim2.html#simulation-output
A dataframe containing the data dictionary of the simulation output table "team_wins"
dictionary_team_wins
An object of class data.frame with 4 rows and 2 columns.
https://nflseedr.com/articles/articles/nflsim2.html#simulation-output
NFL team names and the conferences and divisions they belong to
divisions
A data frame with 36 rows and 4 variables containing NFL team level information, including franchises in multiple cities:
Team abbreviation
Conference abbreviation
Division name
Division abbreviation
This data frame is created using the teams_colors_logos data frame of the nflfastR package. Please see data-raw/divisions.R for the code to create this data.
str(divisions)
This function formats numeric vectors with values between 0 and 1 into percentage strings with special specifications. Those specifications are:
0 and 1 are converted to "0%" and "100%" respectively (takes machine precision into account)
all other values < 0.01 are converted to "<1%"
all other values between 0.01 and 0.995 are rounded to percentages without decimals
values between 0.995 and 0.999 are rounded to percentages with 1 decimal
values between 0.999 and 1 are converted to ">99.9%" unless closer to 1 than machine precision.
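The rules above can be sketched in base R as follows. This is a hypothetical re-implementation for illustration only (fmt_pct_sketch is our name, not part of nflseedR). It assumes "machine precision" means .Machine$double.eps and assigns the boundary values 0.995 and 0.999 to the higher bucket, where the list above is ambiguous.

```r
# Hypothetical re-implementation of the listed rules (not nflseedR's code).
# Assumptions: "machine precision" = .Machine$double.eps; the boundaries
# 0.995 and 0.999 belong to the higher bucket.
fmt_pct_sketch <- function(x) {
  eps <- .Machine$double.eps
  out <- character(length(x))
  is_zero <- abs(x) < eps
  is_one <- abs(x - 1) < eps
  small <- !is_zero & x < 0.01                       # -> "<1%"
  mid <- !is_zero & !is_one & x >= 0.01 & x < 0.995  # -> whole percent
  high <- x >= 0.995 & x < 0.999                     # -> one decimal
  top <- !is_one & x >= 0.999                        # -> ">99.9%"
  out[which(is_zero)] <- "0%"
  out[which(is_one)] <- "100%"
  out[which(small)] <- "<1%"
  out[which(mid)] <- sprintf("%.0f%%", 100 * x[which(mid)])
  out[which(high)] <- sprintf("%.1f%%", 100 * x[which(high)])
  out[which(top)] <- ">99.9%"
  out
}

fmt_pct_sketch(c(0, 0.004, 0.011, 0.9, 0.9989, 0.9991, 1))
# "0%" "<1%" "1%" "90%" "99.9%" ">99.9%" "100%"
```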
fmt_pct_special(x)
x |
A vector of numerical values |
A character vector
x <- c(0, 0.004, 0.009, 0.011, 0.9, 0.98, 0.994, .995, .9989, .999, .9991, .99999999)
fmt <- fmt_pct_special(x)
data.frame(x = x, fmt = fmt)
Simulate NFL games based on a user-provided games/schedule object that holds matchups with and without results. Missing results are computed using the argument compute_results and possible further arguments to compute_results in ... (please see simulations_verify_fct for further information).
It is possible to let the function calculate playoff participants and simulate the post-season. The code is also developed for maximum performance and allows parallel computation by splitting the simulations into chunks and calling the appropriate future::plan. Progress updates can be activated by calling progressr::handlers before the start of the simulations. Please see the section "Details" below for further information.
nfl_simulations( games, compute_results = nflseedR_compute_results, ..., playoff_seeds = 7L, simulations = 10000L, chunks = 8L, byes_per_conf = 1L, tiebreaker_depth = c("SOS", "PRE-SOV", "RANDOM"), sim_include = c("DRAFT", "REG", "POST"), verbosity = c("MIN", "MAX", "NONE") )
games |
A data frame containing real or simulated game scores. Outside of simulations, this is simply the output of nflreadr::load_schedules. The following variables are required as a minimum:
If tiebreakers beyond SOS are to be used, then the actual scores of the home (home_score) and away (away_score) teams must also be available. |
compute_results |
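Defaults to the nflseedR function nflseedR_compute_results. A function to compute the results of games; please see simulations_verify_fct for the expected behavior. |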
... |
Additional parameters passed on to the function compute_results. |
playoff_seeds |
If |
simulations |
Equals the number of times the given NFL season shall be simulated |
chunks |
The number of chunks the simulations are split into. Chunks can be processed in parallel. |
byes_per_conf |
The number of teams with a playoff bye week per conference. This number influences the number of wildcard games that are simulated. |
tiebreaker_depth |
One of "SOS", "PRE-SOV", or "RANDOM". Controls which tiebreakers are applied. |
sim_include |
One of "DRAFT", "REG", or "POST". |
verbosity |
One of "MIN", "MAX", or "NONE". |
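To make the effect of byes_per_conf concrete: assuming every seeded team without a bye plays exactly one wildcard game, the number of wildcard games per conference is (playoff_seeds - byes_per_conf) / 2. The helper below is hypothetical and not part of nflseedR; it only illustrates this arithmetic.

```r
# Hypothetical helper (not part of nflseedR): wildcard games per conference,
# assuming every seeded team without a bye plays exactly one wildcard game.
wildcard_games_per_conf <- function(playoff_seeds = 7L, byes_per_conf = 1L) {
  teams_in_wildcard_round <- playoff_seeds - byes_per_conf
  # Wildcard teams are paired up, so their number must be even
  stopifnot(teams_in_wildcard_round >= 0, teams_in_wildcard_round %% 2 == 0)
  teams_in_wildcard_round %/% 2
}

wildcard_games_per_conf(7L, 1L) # 7 seeds, 1 bye (since 2020): 3 games
wildcard_games_per_conf(6L, 2L) # 6 seeds, 2 byes (before 2020): 2 games
```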
We recommend choosing a default parallel processing method and saving it as an environment variable in the R user profile to make sure all futures will be resolved with the chosen method by default. This can be done by following the steps below.
First, run the line below; the user profile should open automatically. If you haven't saved any environment variables yet, this will be an empty file.
usethis::edit_r_environ()
In the opened file add the next line, then save the file and restart your R session. Please note that this example sets "multisession" as default. For most users this should be the appropriate plan but please make sure it truly is.
R_FUTURE_PLAN="multisession"
After freshly restarting the session, please check whether the above method worked by running the next line. If the output is FALSE, you successfully set up a default non-sequential future::plan(). If the output is TRUE, all functions will behave as if they were called with purrr::map() and NOT in multisession.
inherits(future::plan(), "sequential")
For more information on possible plans please see the future package Readme.
nflseedR is able to show progress updates using progressr::progressor() if they are turned on before the function is called. There are at least two basic ways to do this: either activate progress updates globally (for the current session) with progressr::handlers(global = TRUE), or pipe the function call into progressr::with_progress():
nflseedR::nfl_simulations( games = nflseedR::sims_games_example, simulations = 4, chunks = 2 ) |> progressr::with_progress()
For more information on how to work with progress handlers, please see progressr::progressr.
Some form of random number generation is expected to be required in the function passed to the argument compute_results.
For better performance, nflseedR uses the furrr package to parallelize chunks. furrr functions are guaranteed to generate the exact same sequence of random numbers given the same initial seed if, and only if, the initial seed is of the type "L'Ecuyer-CMRG".
So if you want a consistent seed to be used across all chunks, you must ensure that the correct type is specified in set.seed, e.g. with the following code:
set.seed(5, "L'Ecuyer-CMRG")
It is sufficient to set the seed before nfl_simulations is called. To check that the type has been set correctly, you can use the following code.
RNGkind()
#> "L'Ecuyer-CMRG" "Inversion" "Rejection"
# Should be an integer vector of length 7
.Random.seed
#> 10407 1157214768 -1674567567 -1532971138 -1249749529 1302496508 -253670963
For more information, please see the section "Reproducible random number generation (RNG)" in furrr::furrr_options.
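The idea behind "L'Ecuyer-CMRG" seeding can be illustrated with base R's parallel package, which derives non-overlapping, reproducible random streams from a single seed via parallel::nextRNGStream(). The sketch below only mirrors this concept; it is not furrr's internal seeding code, and the helper chunk_draws() and its arguments are hypothetical.

```r
library(parallel)

# Hypothetical helper: one reproducible random stream per chunk, all derived
# from a single L'Ecuyer-CMRG seed (illustrates the concept, not furrr itself)
chunk_draws <- function(seed, n_chunks, n_per_chunk) {
  set.seed(seed, kind = "L'Ecuyer-CMRG")
  stream <- .Random.seed
  draws <- vector("list", n_chunks)
  for (i in seq_len(n_chunks)) {
    # Advance to the next stream and make it the active RNG state
    stream <- nextRNGStream(stream)
    assign(".Random.seed", stream, envir = globalenv())
    draws[[i]] <- runif(n_per_chunk)
  }
  draws
}

a <- chunk_draws(5, n_chunks = 2, n_per_chunk = 3)
b <- chunk_draws(5, n_chunks = 2, n_per_chunk = 3)
identical(a, b) # TRUE: same seed, same streams, same numbers
```

Because every chunk's stream is derived from the initial seed rather than from the order in which workers happen to run, the draws are the same whether chunks are processed sequentially or in parallel.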
An nflseedR_simulation object containing a list of 6 data frames with the results of all simulated games, the final standings in each simulated season, summary statistics across all simulated seasons, and the simulation parameters. For a full list, please see the package website.
See also the examples on the package website and the method summary.nflseedR_simulation(), which creates a pretty HTML summary table.
library(nflseedR)

# Activate progress updates
# progressr::handlers(global = TRUE)

# Parallel processing can be activated via the following line
# future::plan("multisession")

sim <- nflseedR::nfl_simulations(
  games = nflseedR::sims_games_example,
  simulations = 4,
  chunks = 2
)

# Overview output
str(sim, max.level = 3)
Compute NFL Standings
nfl_standings( games, ..., ranks = c("CONF", "DIV", "DRAFT", "NONE"), tiebreaker_depth = c("SOS", "PRE-SOV", "POINTS", "RANDOM"), playoff_seeds = NULL, verbosity = c("MIN", "MAX", "NONE") )
games |
A data frame containing real or simulated game scores. Outside of simulations, this is simply the output of nflreadr::load_schedules. The following variables are required as a minimum:
If tiebreakers beyond SOS are to be used, then the actual scores of the home (home_score) and away (away_score) teams must also be available. |
... |
currently not used |
ranks |
One of "CONF", "DIV", "DRAFT", or "NONE". |
tiebreaker_depth |
One of "SOS", "PRE-SOV", "POINTS", or "RANDOM". Controls which tiebreakers are applied. |
playoff_seeds |
If |
verbosity |
One of "MIN", "MAX", or "NONE". |
nflseedR does not support all levels of tie-breakers at the moment. The deepest tie-breaker currently is "best net points in all games". After that, the decision is made at random. However, the need for the last level ("best net touchdowns in all games") is extremely unlikely in practice. Deeper levels than strength of schedule have never actually been needed to resolve season-end standings since the NFL expanded to 32 teams.
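The final random step described above can be sketched as follows. This is an illustration under our own assumptions, not nflseedR's implementation: win_pct stands in for whichever deterministic tiebreakers were applied before, and rank_with_coin_flip() is a hypothetical helper.

```r
# Hypothetical sketch (not nflseedR's implementation): rank teams by a
# deterministic criterion and break any remaining ties at random.
rank_with_coin_flip <- function(team, win_pct) {
  # order() sorts by descending win_pct first; the random key only
  # decides the order among teams with exactly equal win_pct
  ord <- order(-win_pct, stats::runif(length(team)))
  data.frame(rank = seq_along(team), team = team[ord], win_pct = win_pct[ord])
}

set.seed(42)
rank_with_coin_flip(c("A", "B", "C", "D"), c(0.75, 0.5, 0.5, 0.25))
```

Teams A and D always end up first and last; only the order of the tied teams B and C depends on the random draw.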
A data.table of NFL standings including the ranks selected in the argument ranks.
For more information on the implemented tiebreakers, see https://nflseedr.com/articles/tiebreaker.html
try({ # to avoid CRAN test problems
  games <- nflreadr::load_schedules(2021:2022)
})
standings <- nflseedR::nfl_standings(games)
print(standings, digits = 3)
Example Games Data used in NFL Simulations
sims_games_example
A data frame with 284 rows and 9 variables containing NFL schedule information.
Please see data-raw/sim_examples.R for the code to create this data.
str(sims_games_example)
Example Teams Data used in NFL Simulations
sims_teams_example
A data frame with 64 rows and 5 variables containing team name and division information.
Please see data-raw/sim_examples.R for the code to create this data.
str(sims_teams_example)
This function simulates a given NFL season multiple times using custom functions to estimate and simulate game results, and computes the outcome of the given season, including playoffs and draft order. It is possible to run the function in parallel processes by calling the appropriate plan. Progress updates can be activated by calling handlers before the start of the simulations. Please see the section "Details" below for further information.
simulate_nfl( nfl_season = NULL, process_games = NULL, ..., playoff_seeds = ifelse(nfl_season >= 2020, 7, 6), if_ended_today = FALSE, fresh_season = FALSE, fresh_playoffs = FALSE, tiebreaker_depth = 3, test_week = NULL, simulations = 1000, sims_per_round = max(ceiling(simulations/future::availableCores() * 2), 100), .debug = FALSE, print_summary = FALSE, sim_include = c("DRAFT", "REG", "POST") )
nfl_season |
Season to simulate |
process_games |
A function to estimate and simulate the results of games. Uses team, schedule, and week number as arguments. |
... |
Additional parameters passed on to the function process_games. |
playoff_seeds |
Number of playoff teams per conference (increased in 2020 from 6 to 7). |
if_ended_today |
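Either TRUE or FALSE. If TRUE, ignores remaining regular season games and cuts to the playoffs based on current regular season data. |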
fresh_season |
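Either TRUE or FALSE. Whether to blank out all game results and simulate the season from scratch (TRUE) or take game results so far as a given and only simulate the rest (FALSE). |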
fresh_playoffs |
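Either TRUE or FALSE. Whether to blank out all playoff game results and simulate the postseason from scratch (TRUE) or take game results so far as a given and only simulate the rest (FALSE). |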
tiebreaker_depth |
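A single value equal to 1, 2, or 3. The default is 3. The value controls the depth of tiebreakers that shall be applied. The deepest currently implemented tiebreaker is strength of schedule. The following values are valid:
tiebreaker_depth = 1: Break all ties with a coin flip. Fastest variant.
tiebreaker_depth = 2: Apply head-to-head and division win percentage tiebreakers. Random if still tied.
tiebreaker_depth = 3: Apply all tiebreakers through strength of schedule. Random if still tied.
|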
test_week |
Aborts after the simulator reaches this week and returns the results from your process_games call. |
simulations |
Equals the number of times the given NFL season shall be simulated |
sims_per_round |
The number of simulations per round. The simulations are split into rounds that can be processed in parallel. |
.debug |
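Either TRUE or FALSE. Controls whether additional messages are printed to the console showing what the tie-breaking algorithms are currently performing. |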
print_summary |
If TRUE, prints the summary statistics to the console. |
sim_include |
One of "DRAFT", "REG", or "POST". |
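The two computed defaults in the usage of simulate_nfl can be evaluated by hand. The helper below is hypothetical: it copies the default expressions for playoff_seeds and sims_per_round, but takes the core count as an explicit cores argument instead of calling future::availableCores(), and it assumes the number of rounds is simulations divided by sims_per_round, rounded up (consistent with the example where 4 simulations with sims_per_round = 2 run in 2 rounds).

```r
# Hypothetical helper: evaluates simulate_nfl()'s default expressions by hand.
# 'cores' stands in for future::availableCores(); 'rounds' is our assumption.
simulate_nfl_defaults <- function(nfl_season, simulations = 1000, cores = 8) {
  playoff_seeds <- ifelse(nfl_season >= 2020, 7, 6)
  # Same expression as the default: (simulations / cores) * 2, at least 100
  sims_per_round <- max(ceiling(simulations / cores * 2), 100)
  rounds <- ceiling(simulations / sims_per_round)
  c(playoff_seeds = playoff_seeds, sims_per_round = sims_per_round, rounds = rounds)
}

simulate_nfl_defaults(2021, simulations = 1000, cores = 8)
# playoff_seeds = 7, sims_per_round = 250, rounds = 4
simulate_nfl_defaults(2019, simulations = 1000, cores = 64)
# playoff_seeds = 6, sims_per_round = 100, rounds = 10
```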
We recommend choosing a default parallel processing method and saving it as an environment variable in the R user profile to make sure all futures will be resolved with the chosen method by default. This can be done by following the steps below.
First, run the following line and the user profile should be opened automatically. If you haven't saved any environment variables yet, this will be an empty file.
usethis::edit_r_environ()
In the opened file add the next line, then save the file and restart your R session. Please note that this example sets "multisession" as default. For most users this should be the appropriate plan but please make sure it truly is.
R_FUTURE_PLAN="multisession"
After freshly restarting the session, please check whether the above method worked by running the next line. If the output is FALSE, you successfully set up a default non-sequential future::plan(). If the output is TRUE, all functions will behave as if they were called with purrr::map() and NOT in multisession.
inherits(future::plan(), "sequential")
For more information on possible plans please see the future package Readme.
nflseedR is able to show progress updates using progressr::progressor() if they are turned on before the function is called. There are at least two basic ways to do this: either activate progress updates globally (for the current session) with progressr::handlers(global = TRUE), or pipe the function call into progressr::with_progress():
simulate_nfl(2020, fresh_season = TRUE) %>% progressr::with_progress()
For more information on how to work with progress handlers, please see progressr::progressr.
An nflseedR_simulation object containing a list of 6 data frames with the results of all simulated games, the final standings in each simulated season (incl. playoffs and draft order), summary statistics across all simulated seasons, and the simulation parameters. For a full list, please see the package website.
See also the examples on the package website and the method summary.nflseedR_simulation(), which creates a pretty HTML summary table.
library(nflseedR)

# Activate progress updates
# progressr::handlers(global = TRUE)

# Parallel processing can be activated via the following line
# future::plan("multisession")

try({ # to avoid CRAN test problems
  # Simulate the season 4 times in 2 rounds
  sim <- nflseedR::simulate_nfl(
    nfl_season = 2020,
    fresh_season = TRUE,
    simulations = 4,
    sims_per_round = 2
  )

  # Overview output
  dplyr::glimpse(sim)
})
nflseedR supports custom functions to compute results in season simulations through the argument compute_results in the season simulation function nfl_simulations. To ensure that custom functions work as nflseedR expects them to, it is recommended to verify their behavior.
This function first checks the structure of the output and then whether game results are changed as expected. Whenever a problem is found, the function will error with a hint to the problem (this means that you might need to iterate over all problems until the function stops erroring).
See the Details section below for more information on expected behavior.
simulations_verify_fct( compute_results, ..., games = nflseedR::sims_games_example, teams = nflseedR::sims_teams_example )
compute_results |
A function to compute results of games. See the Details section below for more information on expected behavior. |
... |
Further arguments passed on to compute_results. |
games |
An NFL schedule where some results are missing. |
teams |
A list of teams by simulation number. This is usually calculated automatically and not user facing. It can be used to "transport" team information like elo ratings from one simulated week to the next. Defaults to sims_teams_example. Please see this example to understand the required data structure. |
The following sections detail the requirements for the compute_results function. If anything is unclear, please see the source code of nflseedR's default function nflseedR_compute_results.
Arguments of compute_results
The function passed to compute_results is required to support the arguments "teams", "games", and "week_num". The first two are described above. week_num is a factor of length 1 that identifies the current week. Regular season weeks are labeled "1", "2", etc. Playoff weeks are labeled "WC", "DIV", "CON", and "SB".
Output of compute_results
The function passed to compute_results is required to return a list of the two objects "teams" and "games" as passed to it in the arguments of the same name. The function must not remove rows or columns. So the last line of compute_results usually looks like
list("teams" = teams, "games" = games)
Behavior of compute_results when Computing Game Results
nflseedR calls compute_results for every week where a result is missing in games. The variable result is defined as the point differential between the home team and the away team. If the home team loses, the value is therefore negative; if it wins, the value is positive; and if the game ends in a tie, it is 0.
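As a quick check of the sign convention, the sketch below derives result from home and away scores. The teams and scores are made up for illustration.

```r
# Toy schedule with made-up teams and scores (illustrative only)
games <- data.frame(
  home_team = c("A", "C", "E"),
  away_team = c("B", "D", "F"),
  home_score = c(27, 20, 17),
  away_score = c(20, 24, 17)
)
# result is the point differential from the home team's perspective
games$result <- games$home_score - games$away_score
games$outcome <- ifelse(games$result > 0, "home win",
                        ifelse(games$result < 0, "away win", "tie"))
games[, c("home_team", "away_team", "result", "outcome")]
```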
To support elo-based simulations, this is done in a loop so that elo ratings
can be updated based on the results and "transported" from week to week. You
can "transport" ratings or other information by joining them to the "teams"
table.
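A minimal sketch of such "transport", assuming a hypothetical elo column in the teams table and the standard Elo update with an arbitrarily chosen K-factor of 20 (neither is taken from nflseedR). The updated teams table is what a compute_results function would return, so the ratings carry over to the next week.

```r
# Hypothetical Elo update (standard Elo formula, K = 20 chosen arbitrarily);
# 'elo' is an assumed column in the teams table, keyed by sim and team.
update_elo <- function(teams, games, week_label, k = 20) {
  played <- games[as.character(games$week) == week_label & !is.na(games$result), ]
  key <- paste(teams$sim, teams$team)
  for (i in seq_len(nrow(played))) {
    h <- match(paste(played$sim[i], played$home_team[i]), key)
    a <- match(paste(played$sim[i], played$away_team[i]), key)
    # Expected home score from the rating difference
    expected_home <- 1 / (1 + 10^((teams$elo[a] - teams$elo[h]) / 400))
    # Actual home score from the sign of result: 1 win, 0.5 tie, 0 loss
    actual_home <- (sign(played$result[i]) + 1) / 2
    shift <- k * (actual_home - expected_home)
    teams$elo[h] <- teams$elo[h] + shift
    teams$elo[a] <- teams$elo[a] - shift
  }
  teams
}

teams <- data.frame(sim = 1L, team = c("A", "B"), elo = c(1500, 1500))
games <- data.frame(sim = 1L, week = "1", home_team = "A", away_team = "B", result = 7)
updated <- update_elo(teams, games, week_label = "1")
updated$elo # 1510 1490
```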
This behavior requires that compute_results changes only the results of the current week, called week_num, and only for games that do not already have a result. So compute_results must only compute a result when
week == week_num & is.na(result)
For the playoffs, there is also the special case that games cannot end in a tie (result == 0). In most cases, ties are not simulated anyway because they occur so rarely. But if ties are simulated, they must not occur in playoff games.
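The requirements above can be combined into a minimal compute_results sketch. It is hypothetical and deliberately simple (not nflseedR_compute_results): results are drawn from a placeholder normal model, and it assumes that games$week uses the same labels as week_num ("1", "2", ..., "WC", "DIV", "CON", "SB").

```r
# Hypothetical compute_results sketch (not nflseedR_compute_results).
# Assumption: games$week uses the same labels as week_num.
sketch_compute_results <- function(teams, games, week_num, ...) {
  week_label <- as.character(week_num)
  is_playoff <- week_label %in% c("WC", "DIV", "CON", "SB")

  # Only games of the current week that do not have a result yet
  todo <- which(as.character(games$week) == week_label & is.na(games$result))

  # Placeholder model: home-minus-away point differential from a normal draw
  res <- round(stats::rnorm(length(todo), mean = 0, sd = 13))

  if (is_playoff) {
    # Playoff games must not end in a tie: replace 0 with +1 or -1 at random
    ties <- which(res == 0)
    res[ties] <- sample(c(-1, 1), length(ties), replace = TRUE)
  }

  games$result[todo] <- res
  # Return both tables without removing rows or columns
  list("teams" = teams, "games" = games)
}

# Usage on a toy schedule (made-up teams, one week-1 result already known)
teams <- data.frame(sim = 1L, team = c("A", "B", "C", "D"))
games <- data.frame(
  sim = 1L,
  week = c("1", "1", "2", "WC"),
  home_team = c("A", "C", "A", "B"),
  away_team = c("B", "D", "C", "D"),
  result = c(3, NA, NA, NA)
)
set.seed(1)
week1 <- sketch_compute_results(teams, games, week_num = factor("1"))
wc <- sketch_compute_results(teams, games, week_num = factor("WC"))
```

Only the missing week-1 result is filled in week1, and only the wildcard game in wc; the known result of 3 is left untouched.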
Returns TRUE invisibly if no problems are found.
simulations_verify_fct(nflseedR_compute_results)
Uses the R package gt to create a pretty html table of the nflseedR simulation summary data frame.
## S3 method for class 'nflseedR_simulation'
summary(object, ...)
object |
an object for which a summary is desired. |
... |
additional arguments passed on to the methods (currently not used). |
library(nflseedR)

# Set seed for recreation;
# internal parallelization requires a L'Ecuyer-CMRG random number generator
set.seed(19980310, kind = "L'Ecuyer-CMRG")

# Simulate the season 20 times in 1 round
sim <- nflseedR::simulate_nfl(
  nfl_season = 2021,
  fresh_season = TRUE,
  simulations = 20
)

# Create summary tables
tbl <- summary(sim)