--- title: "Getting Started with {epiparameter}" output: bookdown::html_vignette2: code_folding: show vignette: > %\VignetteIndexEntry{Getting Started with {epiparameter}} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 8, fig.height = 5 ) ``` It is often the case that in an infectious disease outbreak epidemiological parameters are required in order to characterise and model the dynamics of disease transmission and evaluate control strategies. In those scenarios, epidemiological parameters are commonly retrieved from the literature, and there is currently no library of parameters in order to contrast and compare different reported parameters for a range of infectious diseases and pathogens, from different published studies over time, of which some may be meta-analyses. The {epiparameter} R package is a library of epidemiological parameters, with classes to handle this data and a set of functions to manipulate and use epidemiological parameters and distributions. The package also contains functionality for converting and extracting distribution parameters from summary statistics. ::: {.alert .alert-primary} ### Use case An outbreak of a known or potentially novel pathogen is detected and key parameters such as delay distributions (e.g. incubation period or serial interval) are required to interpret early data. {epiparameter} can provide these distributions from a selection of published sources, such as past analysis of the same or similar pathogen, in order to provide relevant epidemiological parameters for new analysis. ::: This vignette will provide a introduction to the data stored within {epiparameter}, how to read it into R, manipulate the data, and the functions (and methods) implemented in the package to facilitate easy application of parameters into epidemiological pipelines. ```{r setup} library(epiparameter) ``` ### Library of epidemiological parameters First, we will introduce the library, or database, of epidemiological parameters available from {epiparameter}. The library is stored internally and can be read into R using the `epiparameter_db()` function. By default all entries in the library are returned. ```{r read-in-library} db <- epiparameter_db() db ``` The output is a list of `` objects, where each element in the list corresponds to an entry in the parameter database. To see a full list of the diseases and distributions stored in the library use the `parameter_tbl()` function. Here we show the first six rows of the output. ```{r, parameter-tbl} parameter_tbl(multi_epiparameter = db) ``` `parameter_tbl()` can also subset the database supplied to the function. ```{r, parameter-tbl-subset} parameter_tbl(multi_epiparameter = db, disease = "Ebola") ``` More details on the data collation and the library of parameters can be found in the [Data Collation and Synthesis Protocol vignette](https://epiverse-trace.github.io/epiparameter/articles/data_protocol.html). ### Single set of epidemiological parameters {epiparameter} introduces a new class for working with epidemiological parameters in R: ``, contains the name of the disease, the name of the epidemiological distribution, parameters (if available) and citation information of parameter source, as well as other information. This is the core data structure in the {epiparameter} package and holds a single set of epidemiological parameters. An `` object can be: 1. Pulled from database (`epiparameter_db()`) ```{r epiparameter-db} # from database # fetch for COVID-19 incubation period from database # return only a single covid_incubation <- epiparameter_db( disease = "COVID-19", epi_dist = "incubation period", single_epiparameter = TRUE ) covid_incubation ``` 2. Created manually (using the class constructor function: `epiparameter()`) ```{r epiparameter-constructor} # using constructor function covid_incubation <- epiparameter( disease = "COVID-19", pathogen = "SARS-CoV-2", epi_dist = "incubation period", prob_distribution = create_prob_distribution( prob_distribution = "gamma", prob_distribution_params = c(shape = 2, scale = 1) ), summary_stats = create_summary_stats(mean = 2), citation = create_citation( author = person( given = list("John", "Amy"), family = list("Smith", "Jones") ), year = 2022, title = "COVID Incubation Period", journal = "Epi Journal", doi = "10.27861182.x" ) ) covid_incubation ``` Not all arguments are specified in the example using the class constructor (`epiparameter()`) above, for example the `metadata` or parameter uncertainty (`uncertainty`) is not provided. See the help documentation for the `epiparameter()` function using `?epiparameter` to see each argument. Also see documentation for `` helper functions, e.g., `?create_citation()`. Manually creating `` objects can be especially useful if new parameter estimates become available but are not yet incorporated into the {epiparameter} library. As seen in the examples in this vignette, the `` class has a custom printing method which shows the disease, pathogen (if known), the epidemiological distribution, a citation of the study the parameters are from and the probability distribution and parameter of that distribution (if available). ::: {.alert .alert-success} ### Benefit of `` By providing a consistent and robust object to store epidemiological parameters, `` objects can be applied in epidemiological pipelines, for example [{episoap}](https://github.com/epiverse-trace/episoap). The data contained within the object (e.g. parameter values, pathogen type, etc.) can be modified but the pipeline will continue to operate because the class is unchanged. ::: The probability distribution (`prob_distribution`) argument requires the distribution specified in the standard R naming. In some cases these are the same as the distribution's name, e.g., `gamma` and `weibull`. Examples of where the distribution name and R name differ are lognormal and `lnorm`, negative binomial and `nbinom`, geometric and `geom`, and poisson and `pois`. ## Subsetting database The database can be subset directly by `epiparameter_db()`. Here the results can be subset by author. It is recommended to use the family name of the first author instead of the full name. Only the first author will be matched when the entry is from a source with multiple authors. ```{r epiparameter-db-subset-author} epiparameter_db( disease = "COVID-19", epi_dist = "incubation period", author = "Linton" ) ``` The results can be further subset using the `subset` argument, for example `subset = sample_size > 100` will return entries with a sample size greater than 100. See `?epiparameter_db()` for details on how to use this argument to subset which database entries get returned. If a single `` is required then the `single_epiparameter` argument can be set to `TRUE` and this will return a single set of epidemiological parameters (i.e. one delay distribution), if available. If multiple entries in the parameter library match the search criteria (e.g. disease type) then the entries that are parameterised (i.e. distribution parameters are known), account for right truncation when inferred, and were estimated from the largest sample size are preferentially selected (in that order). ```{r epiparameter-db-single-epiparameter} epiparameter_db(disease = "SARS", single_epiparameter = TRUE) ``` ## Distribution functions `` objects store distributions, and mathematical functions of these distribution can easily be extracted directly from them. It is often useful to access the probability density function, cumulative distribution function, quantiles of the distribution, or generate random numbers from the distribution in the `` object. The distribution functions in {epiparameter} allow users to easily use these. ```{r epiparameter-dist-methods} ebola_incubation <- epiparameter_db( disease = "Ebola", epi_dist = "incubation period", single_epiparameter = TRUE ) density(ebola_incubation, at = 0.5) cdf(ebola_incubation, q = 0.5) quantile(ebola_incubation, p = 0.5) generate(ebola_incubation, times = 10) ``` ## Plotting epidemiological distributions `` objects can easily be plotted to see the PDF and CDF of distribution. ```{r plot-epiparameter} plot(ebola_incubation) ``` The default plotting range for time since infection is from zero to the 99th quantile of the distribution. This can be altered by specifying the `xlim` argument when plotting an `` object. ```{r plot-epiparameter-dayrange} plot(ebola_incubation, xlim = c(1, 25)) ``` This plotting function can be useful for visually comparing epidemiological distributions from different publications on the same disease. In addition, plotting the distribution after manually creating an `` help to check that the parameters are sensible and produce the expected distribution. ### Accessors The `` class also has accessor functions that can help access elements from the object in a standardised format. ```{r epiparameter-accessors} get_parameters(ebola_incubation) get_citation(ebola_incubation) ``` ## Parameter conversion and extraction ### Conversion Parameters are often reported in the literature as mean and standard deviation (or variance). These summary statistics can often be (analytically) converted to the parameters of the distribution using the conversion function in the package (`convert_summary_stats_to_params()`). We also provide conversion functions in the opposite direction, parameters to summary statistics (`convert_params_to_summary_stats()`). ### Extraction The functions `extract_param()` handles all the extraction of parameter estimates from summary statistics. The two extractions currently supported in {epiparameter} are from percentiles and from median and range. ## Adding library entries and contributing to {epiparameter} If a set of epidemiological parameter has been inferred and is known to the user but has not yet been incorporated into the {epiparameter} database, these parameters can be manually added to the library. ```{r add-to-library} # wrap in list to append to database new_db <- append(db, covid_incubation) tail(new_db, n = 3) ``` Note that this only adds the parameters to the library in the environment, and does not save to the database file in the package. Hence, if you restart your R session, you will lose the changes. The library of epidemiological parameters is a living database, so as new studies are published we hope to incorporate these. Searching for and recording parameters in the database is extremely time-consuming, so we welcome contributions of new parameters by either making a pull request to the package or adding information to the contributing spreadsheet. These will be incorporated into the database by the package maintainers and your contributions will be acknowledged. See the [Data Collation and Synthesis Protocol](https://epiverse-trace.github.io/epiparameter/articles/data_protocol.html) vignette on information about contributing to the library of epidemiological parameters.