| Title: | Simulate Disease Outbreak Line List and Contacts Data |
|---|---|
| Description: | Tools to simulate realistic raw case data for an epidemic in the form of line lists and contacts using a branching process. Simulated outbreaks are parameterised with epidemiological parameters and can have age-structured populations, age-stratified hospitalisation and death risk and time-varying case fatality risk. |
| Authors: | Joshua W. Lambert [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-5218-3046>), Carmen Tamayo Cuartero [aut] (ORCID: <https://orcid.org/0000-0003-4184-2864>), Hugo Gruson [ctb, rev] (ORCID: <https://orcid.org/0000-0002-4094-1476>), Pratik R. Gupte [ctb, rev] (ORCID: <https://orcid.org/0000-0001-5294-7819>), Adam Kucharski [rev] (ORCID: <https://orcid.org/0000-0001-8814-9421>), Chris Hartgerink [rev] (ORCID: <https://orcid.org/0000-0003-1050-6809>), Sebastian Funk [ctb] (ORCID: <https://orcid.org/0000-0002-2842-3406>), London School of Hygiene and Tropical Medicine, LSHTM [cph] (ROR: <https://ror.org/00a0jsq62>) |
| Maintainer: | Joshua W. Lambert <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.7.1.9000 |
| Built: | 2026-05-31 06:12:09 UTC |
| Source: | https://github.com/epiverse-trace/simulist |
Censor <Date> columns in line list output from sim_linelist() to a
specified time interval.
This function is similar to incidence2::incidence() but does not aggregate
events into an <incidence2> object. Instead, it returns the same line list
<data.frame> as input, with modified event dates.
    censor_linelist(
      linelist,
      interval,
      reporting_artefact = c("none", "weekend_effects"),
      offset = min(linelist$date_onset, na.rm = TRUE)
    )
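| Argument | Description |
|---|---|
| linelist | Line list <data.frame> output from sim_linelist(). |
| interval | An integer or character string specifying the time interval to censor dates to. See details for information on the date/period objects that are returned for each interval type. |
| reporting_artefact | A character string specifying a reporting artefact to apply: "none" (default) or "weekend_effects" (no reporting of events on weekends). |
| offset | Date used to start counting from when interval is an integer (<grates_period>). The default is the earliest symptom onset date, min(linelist$date_onset, na.rm = TRUE). |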
The line list columns that contain <Date> objects are stored at double
precision by default. In other words, they are not integer values,
so they can represent a time part-way through a day. The exact numeric value
of a <Date> can be seen by calling unclass() on it.
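A minimal base R sketch of this behaviour, using a made-up date rather than simulated line list output:

```r
# a <Date> is a number of days since 1970-01-01 and can carry a fractional part
onset <- as.Date("2023-01-01") + 0.75 # three quarters of the way through the day

# printing hides the fraction, unclass() reveals the underlying double
print(onset)
unclass(onset)

# two dates that print identically can still differ numerically
same_day <- as.Date("2023-01-01") + 0.25
format(onset) == format(same_day)
onset == same_day
```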
Censoring line list dates reduces the time precision (window) of the event.
Often dates of events, such as symptom onset or hospital admission, are only
known to the nearest day, not hour or minute. Other events may be more
coarsely censored, for example to the nearest week or month.
censor_linelist() converts the exact double-precision event
<Date> to the time interval specified.
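The idea can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation (which returns {grates} objects for most intervals): daily censoring is shown as flooring the numeric date, and a 3-day period is counted from an offset taken to be the earliest date. The dates are made up.

```r
# hypothetical event dates stored with sub-day precision
dates <- as.Date("2023-01-01") + c(0.2, 1.9, 4.5, 7.1)

# daily censoring: drop the fractional part of each date
daily <- as.Date(floor(unclass(dates)), origin = "1970-01-01")

# 3-day periods counted from an offset (here the earliest censored date)
offset <- min(daily)
period_index <- floor(as.numeric(daily - offset) / 3)
period_start <- offset + 3 * period_index

data.frame(exact = unclass(dates), daily = daily, period_start = period_start)
```

Events on 2 January and 1 January fall in the same 3-day period, so after censoring they are no longer distinguishable.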
Depending on the interval specified, the date columns will be returned
as different objects. The valid interval inputs and the resulting class of
the date columns are:
integer -> <grates_period> (see grates::as_period())
"daily" -> <Date> (see Date)
"weekly" -> <grates_isoweek> (see grates::as_isoweek())
"epiweek" -> <grates_epiweek> (see grates::as_epiweek())
"monthly" -> <grates_yearmonth> (see grates::as_yearmonth())
"yearly" -> <grates_year> (see grates::as_year())
A line list <data.frame>.
    set.seed(1)
    linelist <- sim_linelist()
    linelist_cens <- censor_linelist(linelist, interval = "daily")

    # censor to a 3-day period
    linelist_cens <- censor_linelist(linelist, interval = 3)

    # no reporting of events on weekends
    linelist_cens <- censor_linelist(
      linelist,
      interval = "daily",
      reporting_artefact = "weekend_effects"
    )
sim_linelist()
Create a list of configuration settings that control some details of sim_linelist().
    create_config(...)
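| Argument | Description |
|---|---|
| ... | <dynamic-dots> Named elements to replace default settings. The accepted settings are described in the details below. |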
The config argument in sim_linelist() controls finer details of the
simulation: the time windows around infections (time of first contact and
last contact with the infector), the distribution of the Cycle threshold (Ct)
value from real-time PCR or quantitative PCR (qPCR) for confirmed
cases, the network effect in the simulation, whether there is a time-varying
death risk, and the probability of a case or contact being
male/female.
These parameters do not warrant their own arguments in
sim_linelist() as they rarely need to be changed from their default
settings. Grouping them in the config argument avoids increasing the number
of sim_linelist() arguments and keeps the function
signature simpler and more readable.
The last_contact_distribution and first_contact_distribution can accept
any function that generates positive integers (e.g. discrete probability
distribution, rpois() or rgeom()). The ct_distribution can accept
any function that generates real numbers (e.g. continuous or discrete
probability distribution, rnorm(), rlnorm()).
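As a rough illustration of what such functions look like, the sketch below defines candidate generators in base R and checks their output. The parameter values are arbitrary and `returns_whole_numbers()` is a hypothetical helper written for this example; it is not part of the package.

```r
# candidate generators for the contact-window settings: whole numbers
first_contact <- function(n) stats::rpois(n = n, lambda = 3)
last_contact <- function(n) stats::rgeom(n = n, prob = 0.3)

# candidate generator for Ct values: real numbers
ct_values <- function(n) stats::rlnorm(n = n, meanlog = 3, sdlog = 0.2)

# hypothetical helper: does a generator return n non-negative whole numbers?
returns_whole_numbers <- function(f, n = 100) {
  x <- f(n)
  length(x) == n && all(x == round(x)) && all(x >= 0)
}

set.seed(1)
returns_whole_numbers(first_contact)
returns_whole_numbers(last_contact)
returns_whole_numbers(ct_values)
```

Each generator takes a single argument, the number of values to draw, which matches the form of the default functions shown for sim_linelist().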
The network option controls whether to sample contacts from an adjusted or
unadjusted contact distribution. Adjusted (default) sampling uses
$q(n) = (n + 1)p(n + 1)$, where $p(n)$ is the probability
mass function of a distribution, e.g., Poisson or Negative binomial.
Unadjusted (network = "unadjusted") instead samples contacts directly from
a probability distribution $p(n)$.
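The difference between the two options can be sketched in base R. This assumes the adjusted distribution takes the excess-degree form, with weights proportional to (n + 1) p(n + 1); the truncation of the support at 50 contacts, the explicit normalisation and the negative binomial parameters are all choices made for this example, not taken from the package.

```r
# support truncated at 50 contacts (an assumption to keep the sketch finite)
n <- 0:50

# unadjusted: probabilities taken directly from a negative binomial p(n)
p <- stats::dnbinom(n, mu = 2, size = 0.5)

# adjusted: weight by (n + 1) * p(n + 1), then normalise to sum to one
q <- (n + 1) * stats::dnbinom(n + 1, mu = 2, size = 0.5)
q <- q / sum(q)

set.seed(1)
unadjusted_contacts <- sample(n, size = 1000, replace = TRUE, prob = p)
adjusted_contacts <- sample(n, size = 1000, replace = TRUE, prob = q)

# compare the mean number of contacts under each distribution
c(unadjusted = sum(n * p / sum(p)), adjusted = sum(n * q))
```

For an overdispersed distribution such as this one the adjusted mean is larger, whereas for a Poisson distribution the same weighting returns the Poisson distribution itself.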
A list of settings for sim_linelist().
    # example with default configuration
    create_config()

    # example with customised Ct distribution
    create_config(
      ct_distribution = function(n) rlnorm(n = n, meanlog = 2, sdlog = 1)
    )
Take line list output from sim_linelist() and replace elements of
the <data.frame> with missing values (e.g. NA), introduce spelling
mistakes and inconsistencies, as well as coerce date types.
    messy_linelist(linelist, ...)
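| Argument | Description |
|---|---|
| linelist | Line list <data.frame> output from sim_linelist(). |
| ... | <dynamic-dots> Named elements to replace default settings. The defaults and the accepted settings are described in the details below. |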
By default messy_linelist():
Sets 10% of values to missing, i.e. converts them to NA.
Introduces spelling mistakes in 10% of character columns.
Introduces inconsistency in the reporting of $sex.
Converts numeric columns (double & integer) to character.
Converts Date columns to character.
Converts 50% of integers to (English) words.
Duplicates 1% of rows.
Setting missing_value to something other than NA will likely cause
type coercion in the line list <data.frame> columns, most likely to
character.
When setting sex_as_numeric to TRUE, male is set to 0 and female
to 1. Only one of inconsistent_sex or sex_as_numeric can be TRUE,
otherwise the function will error.
If numeric_as_char = TRUE and sex_as_numeric = TRUE then the sex encoded
as 0 or 1 is converted to character. If prop_spelling_mistakes > 0 and
numeric_as_char = TRUE, the columns that are converted from numeric to
character do not have spelling mistakes introduced, because they are
numbers stored as character strings. If
prop_spelling_mistakes > 0 and date_as_char = TRUE, spelling mistakes are
not introduced into dates.
The Date columns can be converted into an inconsistent format by
setting inconsistent_dates = TRUE. This requires date_as_char = TRUE;
if the latter is FALSE the function will error.
If numeric_as_char = FALSE and prop_int_as_word > 0 then the integer
columns are converted to character string (either character numbers or
words) but the other numeric columns are not coerced. Spelling mistakes
are not introduced into integers converted to words when
prop_spelling_mistakes > 0 and prop_int_as_word > 0.
Rows are duplicated after other messy modifications so the duplicated row contains identical messy elements.
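The ordering of missingness and duplication can be sketched in base R on a toy data frame. This is an illustration only and not how the package implements these steps; the data, the 20% proportion and the cell-sampling scheme are made up for the example.

```r
set.seed(1)

# toy line list (made-up values, not output from sim_linelist())
toy <- data.frame(
  case_name = c("A", "B", "C", "D", "E"),
  age = c(34L, 51L, 8L, 67L, 22L),
  sex = c("m", "f", "f", "m", "f")
)

# step 1: set a proportion of cells to missing
prop_missing <- 0.2
cells <- expand.grid(row = seq_len(nrow(toy)), col = seq_len(ncol(toy)))
n_missing <- round(prop_missing * nrow(cells))
hit <- cells[sample(nrow(cells), n_missing), ]
messy <- toy
for (i in seq_len(nrow(hit))) {
  messy[hit$row[i], hit$col[i]] <- NA
}

# step 2: duplicate a row after the messy step, so the copy carries the same
# missing values as the row it was copied from
dup <- sample(nrow(messy), 1)
messy <- rbind(messy, messy[dup, ])
messy
```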
A messy line list <data.frame>.
The output <data.frame> has the same structure as the input <data.frame>
from sim_linelist(), with messy entries.
    linelist <- sim_linelist()
    messy_linelist <- messy_linelist(linelist)

    # increasing proportion of missingness to 30% with a missing value of -99
    messy_linelist <- messy_linelist(
      linelist,
      prop_missing = 0.3,
      missing_value = -99
    )

    # increasing proportion of spelling mistakes to 50%
    messy_linelist <- messy_linelist(linelist, prop_spelling_mistakes = 0.5)

    # encode `$sex` as `numeric`
    messy_linelist <- messy_linelist(
      linelist,
      sex_as_numeric = TRUE,
      inconsistent_sex = FALSE
    )

    # inconsistently formatted dates
    messy_linelist <- messy_linelist(linelist, inconsistent_dates = TRUE)
Simulate contacts for an infectious disease outbreak
    sim_contacts(
      contact_distribution = function(x) stats::dpois(x = x, lambda = 2),
      infectious_period = function(n) stats::rlnorm(n = n, meanlog = 2, sdlog = 0.5),
      prob_infection = 0.5,
      outbreak_start_date = as.Date("2023-01-01"),
      anonymise = FALSE,
      outbreak_size = c(10, 10000),
      population_age = c(0, 90),
      contact_tracing_status_probs = c(under_followup = 0.7, lost_to_followup = 0.2, unknown = 0.1),
      config = create_config()
    )
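| Argument | Description |
|---|---|
| contact_distribution | A function or an <epiparameter> object giving the distribution of the number of contacts. The function can be defined or anonymous. The function must have a single argument in the form of an integer vector of contact counts and return the probability of each count, as the default does. An <epiparameter> object can be provided instead, as in the examples. The default is an anonymous function with a Poisson probability mass function (stats::dpois()) with lambda = 2. |
| infectious_period | A function or an <epiparameter> object for the infectious period. The function can be defined or anonymous. The function must return a vector of randomly generated real numbers representing sampled infectious periods. The function must have a single argument, the number of random infectious periods to generate. An <epiparameter> object can be provided instead, as in the examples. The default is an anonymous function with a lognormal distribution random number generator (stats::rlnorm()) with meanlog = 2 and sdlog = 0.5. |
| prob_infection | A single numeric giving the probability that a contact is infected. The default is 0.5. |
| outbreak_start_date | A Date for the start of the outbreak. The default is as.Date("2023-01-01"). |
| anonymise | A logical indicating whether names are anonymised. The default is FALSE. |
| outbreak_size | A numeric vector of length 2 with the minimum and maximum outbreak size. The default is c(10, 10000). |
| population_age | Either a numeric vector of length 2 with the lower and upper age limits of the population (default c(0, 90)), or a <data.frame> describing an age-structured population. |
| contact_tracing_status_probs | A named numeric vector with the probability of each contact tracing status. The default is c(under_followup = 0.7, lost_to_followup = 0.2, unknown = 0.1). |
| config | A list of settings to adjust the randomly sampled delays and Ct values. See create_config(). |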
A contacts <data.frame>.
The structure of the output is:
from: character column with name of case.
to: character column with name of contacts of case.
age: integer column with age of infectee.
sex: character column with either "m" or "f" for the sex
of the contact.
date_first_contact: <Date> column for the first contact between
case and contacts.
date_last_contact: <Date> column for the last contact between
case and contacts.
was_case: logical column with either TRUE or FALSE
for whether the contact becomes a case.
status: character column with the status of each contact. By
default it is either "case", "under_followup", "lost_to_followup", or
"unknown".
Joshua W. Lambert, Carmen Tamayo
    # quickly simulate contact tracing data using the function defaults
    contacts <- sim_contacts()
    head(contacts)

    # to simulate more realistic contact tracing data load epiparameters from
    # {epiparameter}
    library(epiparameter)
    contact_distribution <- epiparameter(
      disease = "COVID-19",
      epi_name = "contact distribution",
      prob_distribution = create_prob_distribution(
        prob_distribution = "pois",
        prob_distribution_params = c(mean = 2)
      )
    )
    infectious_period <- epiparameter(
      disease = "COVID-19",
      epi_name = "infectious period",
      prob_distribution = create_prob_distribution(
        prob_distribution = "gamma",
        prob_distribution_params = c(shape = 1, scale = 1)
      )
    )
    contacts <- sim_contacts(
      contact_distribution = contact_distribution,
      infectious_period = infectious_period,
      prob_infection = 0.5
    )
The line list is simulated using a branching process and parameterised with epidemiological parameters.
    sim_linelist(
      contact_distribution = function(x) stats::dpois(x = x, lambda = 2),
      infectious_period = function(n) stats::rlnorm(n = n, meanlog = 2, sdlog = 0.5),
      prob_infection = 0.5,
      onset_to_hosp = function(n) stats::rlnorm(n = n, meanlog = 1.5, sdlog = 0.5),
      onset_to_death = function(n) stats::rlnorm(n = n, meanlog = 2.5, sdlog = 0.5),
      onset_to_recovery = NULL,
      reporting_delay = NULL,
      hosp_risk = 0.2,
      hosp_death_risk = 0.5,
      non_hosp_death_risk = 0.05,
      outbreak_start_date = as.Date("2023-01-01"),
      anonymise = FALSE,
      outbreak_size = c(10, 10000),
      population_age = c(0, 90),
      case_type_probs = c(suspected = 0.2, probable = 0.3, confirmed = 0.5),
      config = create_config()
    )
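| Argument | Description |
|---|---|
| contact_distribution | A function or an <epiparameter> object giving the distribution of the number of contacts. The function can be defined or anonymous. The function must have a single argument in the form of an integer vector of contact counts and return the probability of each count, as the default does. An <epiparameter> object can be provided instead, as in the examples. The default is an anonymous function with a Poisson probability mass function (stats::dpois()) with lambda = 2. |
| infectious_period | A function or an <epiparameter> object for the infectious period. The function can be defined or anonymous. The function must return a vector of randomly generated real numbers representing sampled infectious periods. The function must have a single argument, the number of random infectious periods to generate. An <epiparameter> object can be provided instead, as in the examples. The default is an anonymous function with a lognormal distribution random number generator (stats::rlnorm()) with meanlog = 2 and sdlog = 0.5. |
| prob_infection | A single numeric giving the probability that a contact is infected. The default is 0.5. |
| onset_to_hosp | A function or an <epiparameter> object for the onset-to-hospitalisation delay. The function can be defined or anonymous. The function must return a vector of numerics for the length of the delay. An <epiparameter> object can be provided instead, as in the examples. The default is an anonymous function with a lognormal distribution random number generator (stats::rlnorm()) with meanlog = 1.5 and sdlog = 0.5. |
| onset_to_death | A function or an <epiparameter> object for the onset-to-death delay. The function can be defined or anonymous. The function must return a vector of numerics for the length of the delay. An <epiparameter> object can be provided instead, as in the examples. The default is an anonymous function with a lognormal distribution random number generator (stats::rlnorm()) with meanlog = 2.5 and sdlog = 0.5. For hospitalised cases, the function ensures the onset-to-death time is greater than the onset-to-hospitalisation time, making many (1000) attempts to sample a suitable onset-to-death time from onset_to_death. |
| onset_to_recovery | A function or an <epiparameter> object for the onset-to-recovery delay. The function can be defined or anonymous. The function must return a vector of numerics for the length of the delay. The default is NULL. For hospitalised cases, the function ensures the onset-to-recovery time is greater than the onset-to-hospitalisation time, making many (1000) attempts to sample a suitable onset-to-recovery time from onset_to_recovery. |
| reporting_delay | A function for the reporting delay. The function can be defined or anonymous. The function must return a vector of numerics for the length of the delay. The default is NULL. |
| hosp_risk | Either a single numeric for the hospitalisation risk of everyone in the population, or a <data.frame> with age-specific hospitalisation risks. The default is 0.2. |
| hosp_death_risk | Either a single numeric for the death risk for hospitalised cases, or a <data.frame> with age-specific risks. The default is 0.5. |
| non_hosp_death_risk | Either a single numeric for the death risk for non-hospitalised cases, or a <data.frame> with age-specific risks. The default is 0.05. |
| outbreak_start_date | A Date for the start of the outbreak. The default is as.Date("2023-01-01"). |
| anonymise | A logical indicating whether names are anonymised. The default is FALSE. |
| outbreak_size | A numeric vector of length 2 with the minimum and maximum outbreak size. The default is c(10, 10000). |
| population_age | Either a numeric vector of length 2 with the lower and upper age limits of the population (default c(0, 90)), or a <data.frame> describing an age-structured population. |
| case_type_probs | A named numeric vector with the probability of each case type. The default is c(suspected = 0.2, probable = 0.3, confirmed = 0.5). |
| config | A list of settings to adjust the randomly sampled delays and Ct values. See create_config(). |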
For age-stratified hospitalisation and death risks, a <data.frame>
needs to be passed to the hosp_risk and/or hosp_death_risk
arguments. This <data.frame> should have two columns:
age_limit: a column with one numeric per cell for the lower bound
(minimum) age of the age group (inclusive).
risk: a column with one numeric per cell for the proportion
(or probability) of hospitalisation for that age group. Should be between
0 and 1.
For an age-structured population, a <data.frame> with two columns:
age_limit: a column with one numeric per cell for the lower bound
(minimum) age of the age group (inclusive), except the last element which is
the upper bound (maximum) of the population.
proportion: a column with the proportion of the population that are in
that age group. Proportions must sum to one.
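A short base R sketch of the age-stratified risk format described above. The risk values are the ones used in the examples for this function; the use of findInterval() to look up an age group is an illustration of how lower-bound age limits can be read, not a description of the package internals.

```r
# age-stratified hospitalisation risk: lower bound of each age group and its risk
age_dep_hosp_risk <- data.frame(
  age_limit = c(0, 5, 80),
  risk = c(0.1, 0.05, 0.2)
)

# basic checks implied by the column descriptions
stopifnot(
  !is.unsorted(age_dep_hosp_risk$age_limit),
  all(age_dep_hosp_risk$risk >= 0 & age_dep_hosp_risk$risk <= 1)
)

# each age falls in the group whose lower bound is the largest age_limit
# that does not exceed it (lower bounds are inclusive)
ages <- c(3, 5, 42, 80, 89)
group <- findInterval(ages, age_dep_hosp_risk$age_limit)
data.frame(age = ages, risk = age_dep_hosp_risk$risk[group])
```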
A line list <data.frame>
The structure of the output is:
case_name: character column with name of case.
case_type: character column with type of case. By default it is
either "confirmed", "probable", or "suspected".
sex: character column with either "m" or "f" for the sex
of the case.
age: integer column with age of case.
date_onset: <Date> column for date of symptom onset.
date_reporting: <Date> column for the date of reporting
(i.e. entry into line list).
date_admission: <Date> column for date of hospital admission.
outcome: character column with the outcome status of each case.
Either "recovered" or "died".
date_outcome: <Date> column for the date of outcome.
date_first_contact: <Date> column for the first contact between
infector and infectee (case).
date_last_contact: <Date> column for the last contact between
infector and infectee (case).
ct_value: numeric column with the Cycle threshold (Ct) value
from qPCR for confirmed cases.
Joshua W. Lambert, Carmen Tamayo
    # quickly simulate a line list using the function defaults
    linelist <- sim_linelist()
    head(linelist)

    # to simulate a more realistic line list load epiparameters from
    # {epiparameter}
    library(epiparameter)
    contact_distribution <- epiparameter(
      disease = "COVID-19",
      epi_name = "contact distribution",
      prob_distribution = create_prob_distribution(
        prob_distribution = "pois",
        prob_distribution_params = c(mean = 2)
      )
    )
    infectious_period <- epiparameter(
      disease = "COVID-19",
      epi_name = "infectious period",
      prob_distribution = create_prob_distribution(
        prob_distribution = "gamma",
        prob_distribution_params = c(shape = 1, scale = 1)
      )
    )
    onset_to_hosp <- epiparameter(
      disease = "COVID-19",
      epi_name = "onset to hospitalisation",
      prob_distribution = create_prob_distribution(
        prob_distribution = "lnorm",
        prob_distribution_params = c(meanlog = 1, sdlog = 0.5)
      )
    )

    # get onset to death from {epiparameter} database
    onset_to_death <- epiparameter_db(
      disease = "COVID-19",
      epi_name = "onset to death",
      single_epiparameter = TRUE
    )

    # example with single hospitalisation risk for entire population
    linelist <- sim_linelist(
      contact_distribution = contact_distribution,
      infectious_period = infectious_period,
      prob_infection = 0.5,
      onset_to_hosp = onset_to_hosp,
      onset_to_death = onset_to_death,
      hosp_risk = 0.5
    )
    head(linelist)

    # example with age-stratified hospitalisation risk
    # 20% for over 80s
    # 10% for under 5s
    # 5% for the rest
    age_dep_hosp_risk <- data.frame(
      age_limit = c(0, 5, 80),
      risk = c(0.1, 0.05, 0.2)
    )
    linelist <- sim_linelist(
      contact_distribution = contact_distribution,
      infectious_period = infectious_period,
      prob_infection = 0.5,
      onset_to_hosp = onset_to_hosp,
      onset_to_death = onset_to_death,
      hosp_risk = age_dep_hosp_risk
    )
    head(linelist)
The line list and contacts are simulated using a branching process and parameterised with epidemiological parameters.
    sim_outbreak(
      contact_distribution = function(x) stats::dpois(x = x, lambda = 2),
      infectious_period = function(n) stats::rlnorm(n = n, meanlog = 2, sdlog = 0.5),
      prob_infection = 0.5,
      onset_to_hosp = function(n) stats::rlnorm(n = n, meanlog = 1.5, sdlog = 0.5),
      onset_to_death = function(n) stats::rlnorm(n = n, meanlog = 2.5, sdlog = 0.5),
      onset_to_recovery = NULL,
      reporting_delay = NULL,
      hosp_risk = 0.2,
      hosp_death_risk = 0.5,
      non_hosp_death_risk = 0.05,
      outbreak_start_date = as.Date("2023-01-01"),
      anonymise = FALSE,
      outbreak_size = c(10, 10000),
      population_age = c(0, 90),
      case_type_probs = c(suspected = 0.2, probable = 0.3, confirmed = 0.5),
      contact_tracing_status_probs = c(under_followup = 0.7, lost_to_followup = 0.2, unknown = 0.1),
      config = create_config()
    )
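| Argument | Description |
|---|---|
| contact_distribution | A function or an <epiparameter> object giving the distribution of the number of contacts. The function can be defined or anonymous. The function must have a single argument in the form of an integer vector of contact counts and return the probability of each count, as the default does. An <epiparameter> object can be provided instead, as in the examples. The default is an anonymous function with a Poisson probability mass function (stats::dpois()) with lambda = 2. |
| infectious_period | A function or an <epiparameter> object for the infectious period. The function can be defined or anonymous. The function must return a vector of randomly generated real numbers representing sampled infectious periods. The function must have a single argument, the number of random infectious periods to generate. An <epiparameter> object can be provided instead, as in the examples. The default is an anonymous function with a lognormal distribution random number generator (stats::rlnorm()) with meanlog = 2 and sdlog = 0.5. |
| prob_infection | A single numeric giving the probability that a contact is infected. The default is 0.5. |
| onset_to_hosp | A function or an <epiparameter> object for the onset-to-hospitalisation delay. The function can be defined or anonymous. The function must return a vector of numerics for the length of the delay. An <epiparameter> object can be provided instead, as in the examples. The default is an anonymous function with a lognormal distribution random number generator (stats::rlnorm()) with meanlog = 1.5 and sdlog = 0.5. |
| onset_to_death | A function or an <epiparameter> object for the onset-to-death delay. The function can be defined or anonymous. The function must return a vector of numerics for the length of the delay. An <epiparameter> object can be provided instead, as in the examples. The default is an anonymous function with a lognormal distribution random number generator (stats::rlnorm()) with meanlog = 2.5 and sdlog = 0.5. For hospitalised cases, the function ensures the onset-to-death time is greater than the onset-to-hospitalisation time, making many (1000) attempts to sample a suitable onset-to-death time from onset_to_death. |
| onset_to_recovery | A function or an <epiparameter> object for the onset-to-recovery delay. The function can be defined or anonymous. The function must return a vector of numerics for the length of the delay. The default is NULL. For hospitalised cases, the function ensures the onset-to-recovery time is greater than the onset-to-hospitalisation time, making many (1000) attempts to sample a suitable onset-to-recovery time from onset_to_recovery. |
| reporting_delay | A function for the reporting delay. The function can be defined or anonymous. The function must return a vector of numerics for the length of the delay. The default is NULL. |
| hosp_risk | Either a single numeric for the hospitalisation risk of everyone in the population, or a <data.frame> with age-specific hospitalisation risks. The default is 0.2. |
| hosp_death_risk | Either a single numeric for the death risk for hospitalised cases, or a <data.frame> with age-specific risks. The default is 0.5. |
| non_hosp_death_risk | Either a single numeric for the death risk for non-hospitalised cases, or a <data.frame> with age-specific risks. The default is 0.05. |
| outbreak_start_date | A Date for the start of the outbreak. The default is as.Date("2023-01-01"). |
| anonymise | A logical indicating whether names are anonymised. The default is FALSE. |
| outbreak_size | A numeric vector of length 2 with the minimum and maximum outbreak size. The default is c(10, 10000). |
| population_age | Either a numeric vector of length 2 with the lower and upper age limits of the population (default c(0, 90)), or a <data.frame> describing an age-structured population. |
| case_type_probs | A named numeric vector with the probability of each case type. The default is c(suspected = 0.2, probable = 0.3, confirmed = 0.5). |
| contact_tracing_status_probs | A named numeric vector with the probability of each contact tracing status. The default is c(under_followup = 0.7, lost_to_followup = 0.2, unknown = 0.1). |
| config | A list of settings to adjust the randomly sampled delays and Ct values. See create_config(). |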
For age-stratified hospitalisation and death risks, a <data.frame>
needs to be passed to the hosp_risk and/or hosp_death_risk
arguments. This <data.frame> should have two columns:
age_limit: a column with one numeric per cell for the lower bound
(minimum) age of the age group (inclusive).
risk: a column with one numeric per cell for the proportion
(or probability) of hospitalisation for that age group. Should be between
0 and 1.
For an age-structured population, a <data.frame> with two columns:
age_limit: a column with one numeric per cell for the lower bound
(minimum) age of the age group (inclusive), except the last element which is
the upper bound (maximum) of the population.
proportion: a column with the proportion of the population that are in
that age group. Proportions must sum to one.
A list with two elements:
linelist: a line list <data.frame> (see sim_linelist() for <data.frame>
structure).
contacts: a contacts <data.frame> (see sim_contacts() for <data.frame>
structure).
Joshua W. Lambert
    # quickly simulate an outbreak using the function defaults
    outbreak <- sim_outbreak()
    head(outbreak$linelist)
    head(outbreak$contacts)

    # to simulate a more realistic outbreak load epiparameters from
    # {epiparameter}
    library(epiparameter)
    contact_distribution <- epiparameter(
      disease = "COVID-19",
      epi_name = "contact distribution",
      prob_distribution = create_prob_distribution(
        prob_distribution = "pois",
        prob_distribution_params = c(mean = 2)
      )
    )
    infectious_period <- epiparameter(
      disease = "COVID-19",
      epi_name = "infectious period",
      prob_distribution = create_prob_distribution(
        prob_distribution = "gamma",
        prob_distribution_params = c(shape = 1, scale = 1)
      )
    )
    onset_to_hosp <- epiparameter(
      disease = "COVID-19",
      epi_name = "onset to hospitalisation",
      prob_distribution = create_prob_distribution(
        prob_distribution = "lnorm",
        prob_distribution_params = c(meanlog = 1, sdlog = 0.5)
      )
    )

    # get onset to death from {epiparameter} database
    onset_to_death <- epiparameter_db(
      disease = "COVID-19",
      epi_name = "onset to death",
      single_epiparameter = TRUE
    )
    outbreak <- sim_outbreak(
      contact_distribution = contact_distribution,
      infectious_period = infectious_period,
      prob_infection = 0.5,
      onset_to_hosp = onset_to_hosp,
      onset_to_death = onset_to_death
    )
Adjust or subset the line list <data.frame> by removing cases that
have not been reported by the truncation time and setting hospital
admission or outcome dates that are after the truncation point to NA.
This is to replicate real-time outbreak data where recent cases or outcomes are not yet observed or reported (right truncation). It implies an assumption that symptom onsets are reported with a delay but hospitalisations are reported instantly.
    truncate_linelist(
      linelist,
      truncation_day = 14,
      unit = c("days", "weeks", "months", "years"),
      direction = c("backwards", "forwards")
    )
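| Argument | Description |
|---|---|
| linelist | Line list <data.frame> output from sim_linelist(). |
| truncation_day | A single numeric giving the number of time units (see unit) at which to truncate the line list. The default is 14. Alternatively, a Date can be supplied, as in the examples. |
| unit | A character string, one of "days" (default), "weeks", "months" or "years". Years are assumed to be 365.25 days and months are assumed to be 365.25 / 12 days (same as lubridate). |
| direction | A character string, either "backwards" (default), which counts back from the end of the outbreak, or "forwards", which counts from the start of the outbreak. |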
The day on which the line list is truncated is the same for
all individuals in the line list, and is specified by the
truncation_day and unit arguments.
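How truncation_day, unit and direction combine into a single truncation date can be sketched in base R. `truncation_date()` is a hypothetical helper written for this illustration and is not part of the package; it assumes that "backwards" counts from a given outbreak end date and "forwards" from a given start date, and it uses the unit lengths stated for the unit argument. The start and end dates are made up.

```r
# hypothetical helper: convert a number of units into a truncation date
truncation_date <- function(start, end, truncation_day = 14,
                            unit = c("days", "weeks", "months", "years"),
                            direction = c("backwards", "forwards")) {
  unit <- match.arg(unit)
  direction <- match.arg(direction)
  # years are 365.25 days and months are 365.25 / 12 days
  days_per_unit <- c(days = 1, weeks = 7, months = 365.25 / 12, years = 365.25)
  n_days <- truncation_day * days_per_unit[[unit]]
  if (direction == "backwards") end - n_days else start + n_days
}

start <- as.Date("2023-01-01")
end <- as.Date("2023-06-30")

# default: 14 days before the end
truncation_date(start, end)

# 3 weeks before the end
truncation_date(start, end, truncation_day = 3, unit = "weeks")

# 2 months after the start
truncation_date(start, end, truncation_day = 2, unit = "months", direction = "forwards")
```

Because a month is a fractional number of days here, the resulting date can fall part-way through a day.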
A line list <data.frame>.
The output <data.frame> has the same structure as the input <data.frame>
from sim_linelist(), but may contain fewer rows, and dates after the
truncation point are set to NA.
    set.seed(1)
    linelist <- sim_linelist()
    linelist_trunc <- truncate_linelist(linelist)

    # set truncation point 3 weeks before the end of outbreak
    linelist_trunc <- truncate_linelist(
      linelist,
      truncation_day = 3,
      unit = "weeks"
    )

    # set truncation point to 2 months since the start of outbreak
    linelist_trunc <- truncate_linelist(
      linelist,
      truncation_day = 2,
      unit = "months",
      direction = "forwards"
    )

    # set truncation point to 2023-03-01
    linelist_trunc <- truncate_linelist(
      linelist,
      truncation_day = as.Date("2023-03-01")
    )