The {epiparameter}
R package contains a library of epidemiological parameter data and
functions that read and handle this data. The delay distributions
describe the time between two events in epidemiology, for example
incubation period, serial interval and onset-to-death; while the
offspring distributions describe the number of secondary infections from
a primary infection in disease transmission. The library is compiled by
a process of collecting, reviewing and extracting data from
peer-reviewed literature1, including research articles, systematic
reviews and meta-analyses.
The {epiparameter}
package will act as a ‘living
systematic review’ (sensu Elliott et al.
(2014))
which will be actively updated and maintained to provide a reliable
source of data on epidemiological distributions. To prevent bias in the
collection or assessment of the data, a well-defined methodology of
searching and refining is required. This document aims to provide
transparency on the methodology used by the {epiparameter}
maintainers by outlining the steps taken at each stage of the data
handling. It can also serve as a guide to contributors wanting to search
and provide epidemiological parameters currently missing from the
library. This protocol should also facilitate reproducibility in the
searches, results and appraisal steps.
There is a large body of work on the methods to best conduct literature searches and data collection as part of systematic reviews and meta-analyses2, which we use as the basis for our protocol. These sources are:
{epiparameter}
As defined by the PRISMA guidelines, having a clearly stated
objective helps to refine the goal of the project.
{epiparameter}
’s objective is to provide information for a
collection of distributions for a range of infectious diseases that is
as accurate, unbiased and as comprehensive as possible. Such
distributions will enable outbreak analysts to easily access these
distributions for routine analysis. For example, delay distributions are
necessary for: calculating case fatality rates adjusting for delay to
outcome, quantifying implications of different screening measures and
quarantine periods, estimating reproduction numbers, and scenario
modelling using transmission dynamic models.
To contribute to the {epiparameter}
library of
epidemiological parameter information, added your data to this
google sheet. This will then be integrated into the
{epiparameter}
library by the package maintainers, and the
information will then be accessible to all {epiparameter}
package users.
The {epiparameter}
package spans a range of infectious
diseases, including several distributions for each disease when
available. The pathogens and diseases that are currently systematically
searched for and included in the package library are:
#> Returning 125 results that match the criteria (100 are parameterised).
#> Use subset to filter by entry variables or single_epiparameter to return a single entry.
#> To retrieve the citation for each use the 'get_citation' function
Disease | Pathogen |
---|---|
Adenovirus | Adenovirus |
Human Coronavirus | Human_Cov |
SARS | SARS-Cov-1 |
Influenza | Influenza-A |
Influenza | Influenza-A |
Influenza | Influenza-B |
Measles | Measles Virus |
Parainfluenza | Parainfluenza Virus |
RSV | RSV |
Rhinovirus | Rhinovirus |
Influenza | Influenza-A |
Influenza | Influenza-A |
RSV | RSV |
RSV | RSV |
Influenza | Influenza-A-H1N1 |
Influenza | Influenza-A-H1N1 |
Influenza | Influenza-A-H7N9 |
Influenza | Influenza-A-H7N9 |
Influenza | Influenza-A-H7N9 |
Influenza | Influenza-A-H7N9 |
Influenza | Influenza-A-H7N9 |
Influenza | Influenza-A-H1N1 |
Influenza | Influenza-A-H1N1Pdm |
Influenza | Influenza-A-H1N1Pdm |
Influenza | Influenza-A-H1N1 |
Influenza | Influenza-A-H1N1 |
Marburg Virus Disease | Marburg Virus |
Marburg Virus Disease | Marburg Virus |
Marburg Virus Disease | Marburg Virus |
Marburg Virus Disease | Marburg Virus |
Marburg Virus Disease | Marburg Virus |
SARS | SARS-Cov-1 |
SARS | SARS-Cov-1 |
Smallpox | Smallpox-Variola-Major |
Smallpox | Smallpox-Variola-Major |
Smallpox | Smallpox-Variola-Minor |
Smallpox | Smallpox-Variola-Minor |
Mpox | Monkeypox Virus |
Pneumonic Plague | Yersinia Pestis |
Hantavirus Pulmonary Syndrome | Hantavirus (Andes Virus) |
Ebola Virus Disease | Ebola Virus |
Dengue | Dengue Virus |
Dengue | Dengue Virus |
Dengue | Dengue Virus |
Zika Virus Disease | Zika Virus |
Chikungunya | Chikungunya Virus |
Dengue | Dengue Virus |
Dengue | Dengue Virus |
Japanese Encephalitis | Japanese Encephalitis Virus |
Rift Valley Fever | Rift Valley Fever Virus |
West Nile Fever | West Nile Virus |
West Nile Fever | West Nile Virus |
West Nile Fever | West Nile Virus |
Yellow Fever | Yellow Fever Viruses |
Yellow Fever | Yellow Fever Viruses |
Mpox | Mpox Virus |
Mpox | Mpox Virus |
Mpox | Mpox Virus |
Mpox | Mpox Virus |
Mpox | Mpox Virus |
Mpox | Mpox Virus |
Mpox | Mpox Virus |
Ebola Virus Disease | Ebola Virus-Zaire Subtype |
Ebola Virus Disease | Ebola Virus-Zaire Subtype |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
Ebola Virus Disease | Ebola Virus |
MERS | MERS-Cov |
MERS | MERS-Cov |
MERS | MERS-Cov |
MERS | MERS-Cov |
MERS | MERS-Cov |
MERS | MERS-Cov |
MERS | MERS-Cov |
MERS | MERS-Cov |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
COVID-19 | SARS-CoV-2 |
Mpox | Mpox Virus |
Mpox | Mpox Virus Clade I |
Mpox | Mpox Virus |
Mpox | Mpox Virus Clade I |
Mpox | Mpox Virus Clade IIa |
Mpox | Mpox Virus Clade IIb |
Mpox | Mpox Virus |
Mpox | Mpox Virus |
Mpox | Mpox Virus |
Chikungunya | Chikungunya Virus |
Chikungunya | Chikungunya Virus |
Chikungunya | Chikungunya Virus |
The distributions currently included in the literature search for each pathogen/disease are:
#> Returning 125 results that match the criteria (100 are parameterised).
#> Use subset to filter by entry variables or single_epiparameter to return a single entry.
#> To retrieve the citation for each use the 'get_citation' function
Epidemiological Parameter |
---|
incubation period |
serial interval |
generation time |
onset to death |
offspring distribution |
hospitalisation to death |
hospitalisation to discharge |
notification to death |
notification to discharge |
onset to discharge |
onset to hospitalisation |
onset to ventilation |
case fatality risk |
However, these simple search phrases can return a large number of irrelevant papers. Using a more specific search schema depending on the search engine used. For example, if using Google Scholar a schema like:
Or if Web of Science is being used:
This should refine the results to a more suitable set of literature.
Literature search engines: using a selection of search engines to prevent one source potentially omitting papers. Suggested search sites are: Google Scholar, Web of Science, PubMed, and Scopus.
Adding papers: in addition to the database
entries from papers that have been identified in a literature search,
entries can be supplemented by recommendations (i.e. from the community)
or through being cited by a paper in the literature search. Papers may
be recommended by experts in research or public health communities. We
plan to use two methods of community engagement. Firstly a open-access
Google sheet allows people to add their distribution data which will
then be reviewed by one of the {epiparameter}
maintainers
and incorporated if it meets quality checks. The second method - not yet
implemented - involves community members uploading their data to zenodo, which can then be read and loaded
into R using {epiparameter}
once checked.
Language restrictions: papers in English or
Spanish are currently supported in {epiparameter}
. Papers
written in another language but verified by an expert can also be
included in the database. However, these are not evaluated with the
review process described below and as a result are flagged to the user
when loaded in {epiparameter}
.
Removing duplicates: the library of parameters does not contain duplicates of studies, but multiple entries per study can be included if a paper reports multiple results (e.g. from the full data set and then a subset of the data). Studies that use the same data, subsets or supersets of data used in other papers in the library are included.
Abstract and methods screening: once a number of
unique sources have been identified, each should be reviewed for its
suitability by reviewing the abstract and searching for words or phrases
in the paper that indicate it reports the parameters or summary
statistics of a distribution, this can include searching the methods
section for words on types of distributions (e.g. lognormal), fitting
procedures (e.g. maximum likelihood or bayesian), or searching the
results for parameter estimates. The {epiparameter}
library
includes entries where parameters or summary statistics are reported but
a distribution is not specified, and entries where the distribution is
specified but the parameters are not reported.
A database of unsuitable papers will be kept to remind maintainers which papers have not been included and aids in the updating of the database (see below) by preventing redundant reviewing a previously rejected paper.
Stopping criteria: for many searches, the number of results is far larger than could be reasonably evaluated outside a full systematic review. After refining which papers contain the required information (abstract and methods screening), around 10 papers per pathogen are screened for each search (per search round, see updating section below for details). If the number of papers that pass the abstract and methods screening is fewer than 10, all suitable papers are reviewed.
Full paper screening: after the abstract and methods screening, those papers not excluded should be reviewed in full to verify they indeed contain the required information on distribution parameters and information on methodology used. It is acceptable to include a secondary source that contains information on the delay distribution when the primary source is unavailable or does not report the distribution. The inference of the delay distribution does not have to be a primary subject of the research article, for example if it was inferred to be used in estimation of R0 it can still be included in the database. Additionally, distribution parameters based on illustrative values for use in simulations - rather than inferred from data - are considered unsuitable and should be excluded.
Again, any papers excluded at this stage should be recorded in the database of unsuitable sources with reasoning to prevent having to reassess when updating the database.
Post hoc removal: if any
{epiparameter}
parameters are later identified as being
inappropriate then they can be removed from the database. In most cases
this is unlikely as limitations can be appended onto data entries to
make users aware of limitations (e.g. around assumptions used to infer
the distribution), only in the most extreme cases will data be
completely removed from the database.
Note: systematic reviews focusing on effect sizes can be subject to publication bias (e.g. more positive or significant results in the literature). However, distribution inference does not focus on significance testing or effect sizes, so this bias is not considered in the collection process.
Extracting parameters: for any underlying
distributions (e.g. gamma, lognormal), parameters (e.g. shape/scale,
meanlog/sdlog), and summary statistics (e.g. mean, standard deviation,
median, range or quantiles) given in the paper, values should be
recorded verbatim from the paper into the database. When these are read
into R using the {epiparameter}
package, other aspects of
the distribution are automatically calculated and available. For example
if the mean and standard deviation of a gamma distribution is reported
for a serial interval these values will be stored in the database. But
once in R, the shape and scale parameters of the gamma distribution will
be automatically reconstructed and the resulting distribution available
for use.
The {epiparameter}
library exactly reflects the
literature. By which we mean that information not present in the paper
should not be imputed from prior knowledge (e.g. vector of a disease is
known but not stated), or by performing calculating with reported
values. This prevents the issue of not having clear provenance for the
data in the library.
The requirements of each entry in the database is defined by the data
dictionary. Here we outline the minimal dataset that is required to be
included in the {epiparameter}
library is:
All other information for each database entry is non-essential.
See the data dictionary included in {epiparameter}
for
all database fields with a description of each and the range of possible
values each field can take.
{epiparameter}
The inference of parameters for a delay distribution often requires
methodological adjustments to correct for factors that would otherwise
bias the estimates. These includes accounting for interval-censoring of
the data when the timing of an event (e.g. exposure to a pathogen) is
not know with certainty, but rather within a time window. Or adjusting
for phase bias when the distribution is estimated during a growing or
shrinking stage of an epidemic. The aim of {epiparameter}
is not to make a judgement on which parameters are ‘better’ than others,
but to notify and warn the user of the potential limitations of the
data. The aspects assessed are: 1) whether the method includes single or
double interval-censoring when the exposure or onset times are not known
with certainty (i.e. on a single day); 2) does the method adjust for
phase bias when the outbreak is in an ascending or descending phase.
These are indicated by boolean values to indicate whether they are
reported in the paper and users are recommended to refer back to the
paper to determine whether estimates are biased.
{epiparameter}
review processFor a set of parameters to be included in the database it must pass
the abstract and methods screening and full screening and subsequently a
review by one of the {epiparameter}
maintainers. This
process involves running diagnostic checks and cross-referencing the
reported parameters with those in the paper to ensure they match exactly
and that the results plot of the PDF/CDF/PMF matches anything plotted in
the paper, if available. This prevents a possible misinterpretation
(e.g. serial interval for incubation period). This check also includes
making sure the unique identifiers for the paper match the author’s
name, publication year and other data recorded in the database.
Because search and review stages are time consuming and cannot be
continuously carried out, we aim to keep the {epiparameter}
library up-to-date as a living data library by conducting regular
searches (i.e. every 3-4 months) to fill in any missing papers or new
publication since the last search. The epidemiological literature can
expand rapidly, especially during a new outbreak. Therefore we can
optionally include new studies that are of use to the epidemiological
community in between regular updates. These small additions will still
be subject to the data quality assessment and diagnostics to ensure
accuracy, and will likely be picked up in the subsequent literature
searches. It is likely that for existing pathogens that have not had a
major increase in incidence since the last update few new papers will be
reporting delay distributions. In these cases papers that were not
previously reviewed due to limited reviewing time for each round of
updates are now checked.
We particularly value community contributions to the database, so everyone can benefit from analysis that has already been conducted, and duplicated of effort is reduced.
All papers that are returned in the search results but are not suitable, either at the stage of abstract screening, or after reviewing the entirety of the paper, are recorded in a database by the following information:
We also include distribution parameters that are published on pre-print servers such as arxiv, bioRxiv and medRxiv. But will update the citation reference once the paper is accepted and published by a journal.↩︎
{epiparameter}
is itself not a
meta-analytic tool, but the references stored within may be useful as
part of a meta-analysis.↩︎
{epiparameter}
{epiparameter}
{epiparameter}
review process