Title: | Simulate Disease Outbreak Line List and Contacts Data |
---|---|
Description: | Tools to simulate realistic raw case data for an epidemic in the form of line lists and contacts using a branching process. Simulated outbreaks are parameterised with epidemiological parameters and can have age structured populations, age-stratified hospitalisation and death risk and time-varying case fatality risk. |
Authors: | Joshua W. Lambert [aut, cre, cph] , Carmen Tamayo [aut] , Hugo Gruson [ctb, rev] , Pratik R. Gupte [ctb, rev] , Adam Kucharski [rev] , Chris Hartgerink [rev] |
Maintainer: | Joshua W. Lambert <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4.0.9000 |
Built: | 2025-01-10 15:21:52 UTC |
Source: | https://github.com/epiverse-trace/simulist |
Create a list of configuration settings for some details of sim_linelist()
create_config(...)
... |
<dynamic-dots> Named elements to replace default settings. |
The config argument in sim_linelist() controls the small details of the simulation: the time windows around infections (time of first contact and last contact with the infector), the distribution of the Cycle threshold (Ct) value from a real-time PCR or quantitative PCR (qPCR) for confirmed cases, the network effect in the simulation, and whether there is a time-varying death risk.
Accepted arguments and their defaults are:
last_contact_distribution = "pois"
last_contact_distribution_params = c(lambda = 3)
first_contact_distribution = "pois"
first_contact_distribution_params = c(lambda = 3)
ct_distribution = "norm"
ct_distribution_params = c(mean = 25, sd = 2)
network = "adjusted"
time_varying_death_risk = NULL
These parameters do not warrant their own arguments in sim_linelist() as they rarely need to be changed from their default setting. Collecting them in the config argument avoids increasing the number of sim_linelist() arguments and keeps the function signature simpler and more readable.
The accepted distributions are:
last_contact_distribution = c("pois", "geom")
first_contact_distribution = c("pois", "geom")
ct_distribution = c("norm", "lnorm")
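How named settings replace the defaults can be illustrated with a short base R sketch. The helper make_config() below is hypothetical and is not the package's implementation of create_config(). It only mirrors the defaults and accepted distributions listed above. Its choice to raise an error for unrecognised names or distributions is an assumption.

```r
# Hypothetical stand-in for create_config(), for illustration only.
make_config <- function(...) {
  # defaults copied from the list of accepted arguments above
  config <- list(
    last_contact_distribution = "pois",
    last_contact_distribution_params = c(lambda = 3),
    first_contact_distribution = "pois",
    first_contact_distribution_params = c(lambda = 3),
    ct_distribution = "norm",
    ct_distribution_params = c(mean = 25, sd = 2),
    network = "adjusted",
    time_varying_death_risk = NULL
  )
  overrides <- list(...)
  # assumption: names that do not match a default are rejected
  unknown <- setdiff(names(overrides), names(config))
  if (length(unknown) > 0) {
    stop("Unknown setting(s): ", paste(unknown, collapse = ", "))
  }
  for (nm in names(overrides)) {
    # assigning a one-element list keeps NULL values in place
    config[nm] <- list(overrides[[nm]])
  }
  # accepted values copied from the text above
  accepted <- list(
    last_contact_distribution = c("pois", "geom"),
    first_contact_distribution = c("pois", "geom"),
    ct_distribution = c("norm", "lnorm"),
    network = c("adjusted", "unadjusted")
  )
  for (nm in names(accepted)) {
    if (!(config[[nm]] %in% accepted[[nm]])) {
      stop("Unsupported value for ", nm, ": ", config[[nm]])
    }
  }
  config
}

# override only the Ct distribution; all other settings keep their defaults
custom_config <- make_config(
  ct_distribution = "lnorm",
  ct_distribution_params = c(meanlog = 2, sdlog = 1)
)
```

Keeping every setting in one list means a caller only names the elements that differ from the defaults.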
The network option controls whether to sample contacts from an adjusted or unadjusted contact distribution. Adjusted (default) sampling uses $q(n) = \frac{n + 1}{\mu} p(n + 1)$, where $p(n)$ is the probability mass function of a distribution, e.g., Poisson or Negative binomial, and $\mu$ is its mean. Unadjusted (network = "unadjusted") instead samples contacts directly from a probability distribution $p(n)$.
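The difference between the two network options can be sketched in base R, assuming the adjusted distribution takes the excess degree form q(n) = (n + 1) p(n + 1) / mu, where mu is the mean of p. The helper adjust_contacts() and the truncation of the support at 1000 contacts are illustrative assumptions, not the package's internals.

```r
# Support of the contact distribution, truncated for illustration (assumption).
n <- 0:1000

# Hypothetical helper: weight p(n + 1) by (n + 1) and normalise.
# The normalising constant sum((n + 1) * p(n + 1)) approximates the mean mu.
adjust_contacts <- function(pmf, n) {
  weights <- (n + 1) * pmf(n + 1)
  weights / sum(weights)
}

pois_pmf <- function(x) stats::dpois(x = x, lambda = 2)
nbinom_pmf <- function(x) stats::dnbinom(x = x, mu = 2, size = 0.5)

q_pois <- adjust_contacts(pois_pmf, n)
q_nbinom <- adjust_contacts(nbinom_pmf, n)

# mean number of contacts under the unadjusted and adjusted distributions
mean_unadjusted <- sum(n * nbinom_pmf(n))
mean_adjusted <- sum(n * q_nbinom)
```

For a Poisson distribution the adjusted and unadjusted distributions coincide, so the option only changes the simulation for overdispersed distributions such as the Negative binomial, where the adjusted distribution has a larger mean.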
A list of settings for sim_linelist()
# example with default configuration
create_config()

# example with customised Ct distribution
create_config(
  ct_distribution = "lnorm",
  ct_distribution_params = c(meanlog = 2, sdlog = 1)
)
Simulate contacts for an infectious disease outbreak
sim_contacts(
  contact_distribution = function(x) stats::dpois(x = x, lambda = 2),
  infectious_period = function(x) stats::rlnorm(n = x, meanlog = 2, sdlog = 0.5),
  prob_infection = 0.5,
  outbreak_start_date = as.Date("2023-01-01"),
  anonymise = FALSE,
  outbreak_size = c(10, 10000),
  population_age = c(1, 90),
  contact_tracing_status_probs = c(
    under_followup = 0.7,
    lost_to_followup = 0.2,
    unknown = 0.1
  ),
  config = create_config()
)
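contact_distribution |
A <function> or an <epiparameter> object to generate the number of contacts per infection. The function can be defined or anonymous. The function must have a single argument in the form of an integer vector of numbers of contacts. An <epiparameter> can be provided instead. The default is an anonymous function with a Poisson probability mass function (dpois()) with a mean (lambda) of 2 contacts per infection. |
infectious_period |
A <function> or an <epiparameter> object for the infectious period. The function can be defined or anonymous. The function must return a vector of randomly generated real numbers representing sampled infectious periods. The function must have a single argument, the number of random infectious periods to generate. An <epiparameter> can be provided instead. The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2 and sdlog = 0.5. |
prob_infection |
A single numeric for the probability of a contact being infected. The default is 0.5. |
outbreak_start_date |
A date for the start of the outbreak. The default is as.Date("2023-01-01"). |
anonymise |
A logical for whether names should be anonymised. The default is FALSE. |
outbreak_size |
A numeric vector of length 2 defining the minimum and maximum outbreak size. The default is c(10, 10000). |
population_age |
Either a numeric vector with two elements giving the lower and upper bound of the age range of the population, or a <data.frame> with the age structure of the population. The default is c(1, 90). |
contact_tracing_status_probs |
A named numeric vector with the probability of each contact tracing status. The default is c(under_followup = 0.7, lost_to_followup = 0.2, unknown = 0.1). |
config |
A list of settings to adjust the randomly sampled delays and Ct values. See create_config() for more information. |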
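As a sketch of the function interface described for contact_distribution and infectious_period, the two functions below are written in base R. The Negative binomial and gamma parameter values are arbitrary illustrative choices, not recommendations, and the commented call at the end only indicates where such functions would be passed.

```r
# Probability mass function for the number of contacts:
# a single argument, a vector of contact counts (illustrative parameters).
contact_distribution <- function(x) stats::dnbinom(x = x, mu = 2, size = 0.5)

# Random number generator for infectious periods:
# a single argument, the number of periods to generate (illustrative parameters).
infectious_period <- function(x) stats::rgamma(n = x, shape = 2, scale = 3)

# check the shape of what each function returns
contact_probs <- contact_distribution(0:5)

set.seed(1)
sampled_periods <- infectious_period(5)

# These functions would then be supplied as arguments, for example:
# contacts <- sim_contacts(
#   contact_distribution = contact_distribution,
#   infectious_period = infectious_period
# )
```

Checking the functions on their own before passing them in makes it easier to spot a function that returns random draws where probabilities are expected, or the reverse.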
A contacts <data.frame>
Joshua W. Lambert, Carmen Tamayo
# quickly simulate contact tracing data using the function defaults
contacts <- sim_contacts()
head(contacts)

# to simulate more realistic contact tracing data load epiparameters from
# {epiparameter}
library(epiparameter)
contact_distribution <- epiparameter(
  disease = "COVID-19",
  epi_name = "contact distribution",
  prob_distribution = create_prob_distribution(
    prob_distribution = "pois",
    prob_distribution_params = c(mean = 2)
  )
)

infectious_period <- epiparameter(
  disease = "COVID-19",
  epi_name = "infectious period",
  prob_distribution = create_prob_distribution(
    prob_distribution = "gamma",
    prob_distribution_params = c(shape = 1, scale = 1)
  )
)

contacts <- sim_contacts(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5
)
Simulate a line list
The line list is simulated using a branching process and parameterised with epidemiological parameters.
sim_linelist(
  contact_distribution = function(x) stats::dpois(x = x, lambda = 2),
  infectious_period = function(x) stats::rlnorm(n = x, meanlog = 2, sdlog = 0.5),
  prob_infection = 0.5,
  onset_to_hosp = function(x) stats::rlnorm(n = x, meanlog = 1.5, sdlog = 0.5),
  onset_to_death = function(x) stats::rlnorm(n = x, meanlog = 2.5, sdlog = 0.5),
  onset_to_recovery = NULL,
  hosp_risk = 0.2,
  hosp_death_risk = 0.5,
  non_hosp_death_risk = 0.05,
  outbreak_start_date = as.Date("2023-01-01"),
  anonymise = FALSE,
  outbreak_size = c(10, 10000),
  population_age = c(1, 90),
  case_type_probs = c(suspected = 0.2, probable = 0.3, confirmed = 0.5),
  config = create_config()
)
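contact_distribution |
A <function> or an <epiparameter> object to generate the number of contacts per infection. The function can be defined or anonymous. The function must have a single argument in the form of an integer vector of numbers of contacts. An <epiparameter> can be provided instead. The default is an anonymous function with a Poisson probability mass function (dpois()) with a mean (lambda) of 2 contacts per infection. |
infectious_period |
A <function> or an <epiparameter> object for the infectious period. The function can be defined or anonymous. The function must return a vector of randomly generated real numbers representing sampled infectious periods. The function must have a single argument, the number of random infectious periods to generate. An <epiparameter> can be provided instead. The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2 and sdlog = 0.5. |
prob_infection |
A single numeric for the probability of a contact being infected. The default is 0.5. |
onset_to_hosp |
A <function> or an <epiparameter> object for the onset-to-hospitalisation delay distribution. The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-hospitalisation delay. An <epiparameter> can be provided instead. The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 1.5 and sdlog = 0.5. |
onset_to_death |
A <function> or an <epiparameter> object for the onset-to-death delay distribution. The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-death delay. An <epiparameter> can be provided instead. The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2.5 and sdlog = 0.5. |
onset_to_recovery |
A <function> or an <epiparameter> object for the onset-to-recovery delay distribution. The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-recovery delay. An <epiparameter> can be provided instead. The default is NULL. |
hosp_risk |
Either a single numeric for the hospitalisation risk of everyone in the population, or a <data.frame> with age-specific hospitalisation risks. The default is 0.2. See details and examples for the <data.frame> format. |
hosp_death_risk |
Either a single numeric for the death risk of hospitalised individuals across the population, or a <data.frame> with age-specific risks. The default is 0.5. See details and examples for the <data.frame> format. |
non_hosp_death_risk |
Either a single numeric for the death risk of non-hospitalised individuals across the population, or a <data.frame> with age-specific risks. The default is 0.05. |
outbreak_start_date |
A date for the start of the outbreak. The default is as.Date("2023-01-01"). |
anonymise |
A logical for whether names should be anonymised. The default is FALSE. |
outbreak_size |
A numeric vector of length 2 defining the minimum and maximum outbreak size. The default is c(10, 10000). |
population_age |
Either a numeric vector with two elements giving the lower and upper bound of the age range of the population, or a <data.frame> with the age structure of the population. The default is c(1, 90). See details for the <data.frame> format. |
case_type_probs |
A named numeric vector with the probability of each case type. The default is c(suspected = 0.2, probable = 0.3, confirmed = 0.5). |
config |
A list of settings to adjust the randomly sampled delays and Ct values. See create_config() for more information. |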
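A named probability vector such as case_type_probs can be read as the weights of a categorical draw. The base R sketch below uses sample() to illustrate that reading. It is not the package's internal code, and the check that the probabilities sum to one is an assumption about what a valid input looks like.

```r
# default probabilities taken from the usage above
case_type_probs <- c(suspected = 0.2, probable = 0.3, confirmed = 0.5)

# assumption: a valid probability vector sums to one
stopifnot(isTRUE(all.equal(sum(case_type_probs), 1)))

# draw a case type for each of 1000 hypothetical cases
set.seed(42)
case_type <- sample(
  names(case_type_probs),
  size = 1000,
  replace = TRUE,
  prob = case_type_probs
)

# observed share of each case type, which approaches the supplied probabilities
observed <- table(case_type) / length(case_type)
```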
For age-stratified hospitalisation and death risks, a <data.frame> needs to be passed to the hosp_risk and/or hosp_death_risk arguments. This <data.frame> should have two columns:
age_limit: a column with one numeric per cell for the lower bound (minimum) age of the age group (inclusive).
risk: a column with one numeric per cell for the proportion (or probability) of hospitalisation for that age group. Should be between 0 and 1.
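To make the age_limit convention concrete, the base R sketch below builds such a <data.frame> and looks up the risk that applies to a few example ages. The use of findInterval() is an illustrative choice for matching an age to the group whose lower bound it meets or exceeds, not a description of how the package performs the lookup.

```r
# age-stratified risk: lower bounds of the age groups and the risk per group
age_dep_hosp_risk <- data.frame(
  age_limit = c(1, 5, 80),
  risk = c(0.1, 0.05, 0.2)
)

# example ages to look up (hypothetical individuals)
ages <- c(3, 5, 42, 80, 90)

# index of the age group each age falls in (lower bounds are inclusive)
age_group <- findInterval(ages, age_dep_hosp_risk$age_limit)

# risk applied to each individual
individual_risk <- age_dep_hosp_risk$risk[age_group]
```

Because each group is defined only by its lower bound, a group extends up to the next age_limit, and the last group covers all older ages.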
For an age-structured population, a <data.frame> with two columns is needed:
age_range: a column with characters specifying the lower and upper bound of that age group, separated by a hyphen (-). Both bounds are inclusive (integers). For example, an age group of one to ten would be given as "1-10".
proportion: a column with the proportion of the population that are in that age group. Proportions must sum to one.
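The age structure format can be sketched in base R as follows. The age groups and proportions are made-up values, and the parsing and sampling steps only illustrate how the two columns can be interpreted. They are not the package's internal code.

```r
# hypothetical age structure: three groups covering ages 1 to 90
age_struct <- data.frame(
  age_range = c("1-10", "11-60", "61-90"),
  proportion = c(0.2, 0.5, 0.3),
  stringsAsFactors = FALSE
)

# proportions must sum to one
stopifnot(isTRUE(all.equal(sum(age_struct$proportion), 1)))

# split "lower-upper" into integer bounds (both inclusive)
bounds <- do.call(rbind, strsplit(age_struct$age_range, "-", fixed = TRUE))
lower <- as.integer(bounds[, 1])
upper <- as.integer(bounds[, 2])

# sample an age group per individual, then a uniform age within that group
set.seed(1)
group <- sample(
  seq_len(nrow(age_struct)),
  size = 500,
  replace = TRUE,
  prob = age_struct$proportion
)
ages <- mapply(
  function(lo, hi) lo + sample.int(hi - lo + 1L, size = 1L) - 1L,
  lower[group],
  upper[group]
)
```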
A line list <data.frame>
Joshua W. Lambert, Carmen Tamayo
# quickly simulate a line list using the function defaults
linelist <- sim_linelist()
head(linelist)

# to simulate a more realistic line list load epiparameters from
# {epiparameter}
library(epiparameter)
contact_distribution <- epiparameter(
  disease = "COVID-19",
  epi_name = "contact distribution",
  prob_distribution = create_prob_distribution(
    prob_distribution = "pois",
    prob_distribution_params = c(mean = 2)
  )
)

infectious_period <- epiparameter(
  disease = "COVID-19",
  epi_name = "infectious period",
  prob_distribution = create_prob_distribution(
    prob_distribution = "gamma",
    prob_distribution_params = c(shape = 1, scale = 1)
  )
)

# get onset to hospital admission from {epiparameter} database
onset_to_hosp <- epiparameter_db(
  disease = "COVID-19",
  epi_name = "onset to hospitalisation",
  single_epiparameter = TRUE
)

# get onset to death from {epiparameter} database
onset_to_death <- epiparameter_db(
  disease = "COVID-19",
  epi_name = "onset to death",
  single_epiparameter = TRUE
)

# example with single hospitalisation risk for entire population
linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  hosp_risk = 0.5
)
head(linelist)

# example with age-stratified hospitalisation risk
# 20% for over 80s
# 10% for under 5s
# 5% for the rest
age_dep_hosp_risk <- data.frame(
  age_limit = c(1, 5, 80),
  risk = c(0.1, 0.05, 0.2)
)
linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  hosp_risk = age_dep_hosp_risk
)
head(linelist)
Simulate a line list and a contacts table
The line list and contacts are simulated using a branching process and parameterised with epidemiological parameters.
sim_outbreak(
  contact_distribution = function(x) stats::dpois(x = x, lambda = 2),
  infectious_period = function(x) stats::rlnorm(n = x, meanlog = 2, sdlog = 0.5),
  prob_infection = 0.5,
  onset_to_hosp = function(x) stats::rlnorm(n = x, meanlog = 1.5, sdlog = 0.5),
  onset_to_death = function(x) stats::rlnorm(n = x, meanlog = 2.5, sdlog = 0.5),
  onset_to_recovery = NULL,
  hosp_risk = 0.2,
  hosp_death_risk = 0.5,
  non_hosp_death_risk = 0.05,
  outbreak_start_date = as.Date("2023-01-01"),
  anonymise = FALSE,
  outbreak_size = c(10, 10000),
  population_age = c(1, 90),
  case_type_probs = c(suspected = 0.2, probable = 0.3, confirmed = 0.5),
  contact_tracing_status_probs = c(
    under_followup = 0.7,
    lost_to_followup = 0.2,
    unknown = 0.1
  ),
  config = create_config()
)
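contact_distribution |
A <function> or an <epiparameter> object to generate the number of contacts per infection. The function can be defined or anonymous. The function must have a single argument in the form of an integer vector of numbers of contacts. An <epiparameter> can be provided instead. The default is an anonymous function with a Poisson probability mass function (dpois()) with a mean (lambda) of 2 contacts per infection. |
infectious_period |
A <function> or an <epiparameter> object for the infectious period. The function can be defined or anonymous. The function must return a vector of randomly generated real numbers representing sampled infectious periods. The function must have a single argument, the number of random infectious periods to generate. An <epiparameter> can be provided instead. The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2 and sdlog = 0.5. |
prob_infection |
A single numeric for the probability of a contact being infected. The default is 0.5. |
onset_to_hosp |
A <function> or an <epiparameter> object for the onset-to-hospitalisation delay distribution. The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-hospitalisation delay. An <epiparameter> can be provided instead. The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 1.5 and sdlog = 0.5. |
onset_to_death |
A <function> or an <epiparameter> object for the onset-to-death delay distribution. The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-death delay. An <epiparameter> can be provided instead. The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2.5 and sdlog = 0.5. |
onset_to_recovery |
A <function> or an <epiparameter> object for the onset-to-recovery delay distribution. The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-recovery delay. An <epiparameter> can be provided instead. The default is NULL. |
hosp_risk |
Either a single numeric for the hospitalisation risk of everyone in the population, or a <data.frame> with age-specific hospitalisation risks. The default is 0.2. See details for the <data.frame> format. |
hosp_death_risk |
Either a single numeric for the death risk of hospitalised individuals across the population, or a <data.frame> with age-specific risks. The default is 0.5. See details for the <data.frame> format. |
non_hosp_death_risk |
Either a single numeric for the death risk of non-hospitalised individuals across the population, or a <data.frame> with age-specific risks. The default is 0.05. |
outbreak_start_date |
A date for the start of the outbreak. The default is as.Date("2023-01-01"). |
anonymise |
A logical for whether names should be anonymised. The default is FALSE. |
outbreak_size |
A numeric vector of length 2 defining the minimum and maximum outbreak size. The default is c(10, 10000). |
population_age |
Either a numeric vector with two elements giving the lower and upper bound of the age range of the population, or a <data.frame> with the age structure of the population. The default is c(1, 90). See details for the <data.frame> format. |
case_type_probs |
A named numeric vector with the probability of each case type. The default is c(suspected = 0.2, probable = 0.3, confirmed = 0.5). |
contact_tracing_status_probs |
A named numeric vector with the probability of each contact tracing status. The default is c(under_followup = 0.7, lost_to_followup = 0.2, unknown = 0.1). |
config |
A list of settings to adjust the randomly sampled delays and Ct values. See create_config() for more information. |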
For age-stratified hospitalisation and death risks, a <data.frame> needs to be passed to the hosp_risk and/or hosp_death_risk arguments. This <data.frame> should have two columns:
age_limit: a column with one numeric per cell for the lower bound (minimum) age of the age group (inclusive).
risk: a column with one numeric per cell for the proportion (or probability) of hospitalisation for that age group. Should be between 0 and 1.
For an age-structured population, a <data.frame> with two columns is needed:
age_range: a column with characters specifying the lower and upper bound of that age group, separated by a hyphen (-). Both bounds are inclusive (integers). For example, an age group of one to ten would be given as "1-10".
proportion: a column with the proportion of the population that are in that age group. Proportions must sum to one.
A list with two elements:
A line list <data.frame>
A contacts <data.frame>
Joshua W. Lambert
# quickly simulate an outbreak using the function defaults
outbreak <- sim_outbreak()
head(outbreak$linelist)
head(outbreak$contacts)

# to simulate a more realistic outbreak load epiparameters from
# {epiparameter}
library(epiparameter)
contact_distribution <- epiparameter(
  disease = "COVID-19",
  epi_name = "contact distribution",
  prob_distribution = create_prob_distribution(
    prob_distribution = "pois",
    prob_distribution_params = c(mean = 2)
  )
)

infectious_period <- epiparameter(
  disease = "COVID-19",
  epi_name = "infectious period",
  prob_distribution = create_prob_distribution(
    prob_distribution = "gamma",
    prob_distribution_params = c(shape = 1, scale = 1)
  )
)

# get onset to hospital admission from {epiparameter} database
onset_to_hosp <- epiparameter_db(
  disease = "COVID-19",
  epi_name = "onset to hospitalisation",
  single_epiparameter = TRUE
)

# get onset to death from {epiparameter} database
onset_to_death <- epiparameter_db(
  disease = "COVID-19",
  epi_name = "onset to death",
  single_epiparameter = TRUE
)

outbreak <- sim_outbreak(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death
)