This vignette outlines the design decisions taken during the development of the {superspreading} R package, and gives the reasoning behind each decision along with its possible pros and cons.
This document is primarily intended for those interested in understanding the code within the package, and for potential package contributors.
The {superspreading} package aims to provide a range of summary
metrics to characterise individual-level variation in disease
transmission and its impact on the growth or decline of an epidemic.
These include calculating the probability an outbreak becomes an
epidemic (probability_epidemic()), or conversely goes extinct
(probability_extinct()), the probability an outbreak can be contained
(probability_contain()), the proportion of cases in clusters of a given
size (proportion_cluster_size()), and the proportion of cases that
cause a given proportion of transmission (proportion_transmission()).
The other aspect of the package is to provide probability density
functions and cumulative distribution functions to compute the
likelihood of distribution models used to estimate heterogeneity in
individual-level disease transmission that are not available in base R.
At present we include two models: the Poisson-lognormal (dpoislnorm() &
ppoislnorm()) and Poisson-Weibull (dpoisweibull() & ppoisweibull())
distributions.
The package implements a branching process simulation based on bpmodels::chain_sim()
to enable the numerical calculation of the probability of containment
within an outbreak time and outbreak duration threshold. In the future
this function could be removed in favour of using a package implementing
branching process models as a dependency. The package is mostly focused
on analytical functions that are derived from branching process models.
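To make the simulation idea concrete, here is a minimal base R sketch of a branching process used to estimate a containment probability numerically. It is not the package's implementation and does not use bpmodels::chain_sim(). The function name sketch_sim_contain(), the negative binomial offspring distribution, and the use of a cumulative case threshold as the stopping rule are all assumptions made for illustration.

```r
# Hypothetical sketch (not the package's implementation): Monte Carlo estimate
# of the probability that an outbreak goes extinct before reaching
# case_threshold cumulative cases
sketch_sim_contain <- function(R, k, num_init_infect = 1,
                               case_threshold = 100, nsim = 1000) {
  contained <- logical(nsim)
  for (i in seq_len(nsim)) {
    infectious <- num_init_infect
    total_cases <- num_init_infect
    # Step through generations until extinction or the threshold is reached
    while (infectious > 0 && total_cases < case_threshold) {
      # Each infectious case draws its number of secondary cases from a
      # negative binomial with mean R and dispersion k
      infectious <- sum(rnbinom(infectious, size = k, mu = R))
      total_cases <- total_cases + infectious
    }
    contained[i] <- infectious == 0
  }
  mean(contained)
}

set.seed(1)
sketch_sim_contain(R = 1.5, k = 0.5)
```

With a large case threshold the estimate should be close to, and slightly below, the analytical extinction probability for the same R and k, because a few outbreaks pass the threshold before dying out.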
The package provides functions to calculate variation in
individual-level transmission but does not provide functions for
inference, and currently relies on {fitdistrplus} for fitting
models.
Functions with the name probability_*() return a single numeric.
Functions with the name proportion_*() return a <data.frame> with as
many rows as combinations of input values (see expand.grid()).
Consistently returning simple, well-known data structures makes it easy
for users to apply these functions in different scenarios.
The distribution functions return a numeric vector of equal length to
the input vector. This is the same behaviour as the base R distribution
functions.
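The row-per-combination shape can be sketched in base R as follows. The function sketch_proportion() and its placeholder calculation are hypothetical; they only illustrate how expand.grid() yields one row for every combination of input values.

```r
# Hypothetical stand-in for a proportion_*() function: one output row per
# combination of the input vectors, built with expand.grid()
sketch_proportion <- function(R, k) {
  out <- expand.grid(R = R, k = k)
  # Placeholder calculation standing in for a real summary metric:
  # the negative binomial probability of zero secondary cases
  out$prop_zero <- dnbinom(0, size = out$k, mu = out$R)
  out
}

res <- sketch_proportion(R = c(0.5, 1.5), k = c(0.1, 0.5, 1))
nrow(res) # 2 values of R x 3 values of k = 6 rows
```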
proportion_*() functions return a <data.frame> with the proportion
column(s) containing character strings, formatted with a percentage
sign (%) by default. It was reasoned that {superspreading} is most
likely used either as a stand-alone package, or at the terminus of an
epidemiological analysis pipeline, and thus the outputs of
{superspreading} functions would not be passed into other functions.
Where these proportions need to be passed to another calculation or
used for plotting, the format_prop argument can be set to FALSE and a
numeric column of proportions will be returned.
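The trade-off between display-ready strings and numeric output can be sketched as below. The helper sketch_format() is hypothetical, and the rounding to three significant figures is an assumption for illustration rather than the package's formatting rule.

```r
# Hypothetical helper: return proportions as percentage strings by default,
# or as plain numerics when format_prop = FALSE
sketch_format <- function(prop, format_prop = TRUE) {
  stopifnot(is.numeric(prop), is.logical(format_prop))
  if (format_prop) {
    # Character output for display, e.g. 0.256 becomes "25.6%"
    return(paste0(signif(prop * 100, digits = 3), "%"))
  }
  prop
}

sketch_format(c(0.256, 0.5))                      # character vector
sketch_format(c(0.256, 0.5), format_prop = FALSE) # numeric vector
```

Character output reads well in a printed table but cannot be used in arithmetic or on a continuous plot axis, which is why the numeric form is kept available.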
The distribution functions are vectorised (i.e. wrapped in
Vectorize()). This enables them to be used identically to base R
distribution functions.
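The sketch below shows the general pattern of wrapping a scalar density in Vectorize(). It writes a Poisson-lognormal density as a Poisson whose rate is integrated over a lognormal distribution. The names .sketch_dpoislnorm_scalar() and sketch_dpoislnorm(), and the use of integrate() over an infinite range, are assumptions for illustration and may differ from how the package computes the density.

```r
# Hypothetical sketch of a Poisson-lognormal density for a single count x:
# a Poisson whose rate is integrated over a lognormal distribution
.sketch_dpoislnorm_scalar <- function(x, meanlog, sdlog) {
  integrand <- function(lambda) {
    dpois(x, lambda) * dlnorm(lambda, meanlog = meanlog, sdlog = sdlog)
  }
  integrate(integrand, lower = 0, upper = Inf)$value
}

# integrate() handles one x at a time, so Vectorize() is used to let the
# function accept a vector of counts, as the base R d*() functions do
sketch_dpoislnorm <- Vectorize(.sketch_dpoislnorm_scalar)

sketch_dpoislnorm(0:3, meanlog = 0, sdlog = 0.5)
```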
Native interoperability with <epiparameter> objects from the
{epiparameter} package is enabled for probability_*() and
proportion_*() via the offspring_dist argument. This allows users to
pass in a single object, from which the parameters required by the
{superspreading} function are extracted. If these are not available
within the <epiparameter> object, the function returns an informative
error. The offspring_dist argument is placed after ... to ensure users
specify the argument name in full and do not accidentally provide data
to this argument.
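The effect of placing an argument after ... can be seen in a small base R sketch. Here sketch_probability() is hypothetical, and a plain list with elements R and k stands in for an <epiparameter> object, whose real structure and extraction method are not reproduced.

```r
# Hypothetical function with offspring_dist placed after the dots
sketch_probability <- function(R, k, ..., offspring_dist = NULL) {
  if (!is.null(offspring_dist)) {
    # Assumed structure for illustration: a list holding R and k
    R <- offspring_dist$R
    k <- offspring_dist$k
  }
  if (missing(R) || missing(k) || is.null(R) || is.null(k)) {
    stop("Both R and k must be supplied, directly or via offspring_dist.")
  }
  c(R = R, k = k)
}

# Naming the argument in full uses the parameters from the object
sketch_probability(offspring_dist = list(R = 2, k = 1))

# A third positional argument is absorbed by ... and never reaches
# offspring_dist, so R and k are taken from the first two arguments
sketch_probability(1.5, 0.5, list(R = 2, k = 1))
```

Arguments that follow ... are matched only by their exact name, never by position or partial name, so data cannot slip into offspring_dist by accident.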
Internal functions have a dot (.) prefix; exported functions do not.
Several functions use constants that are internally defined (e.g. NSIM
and FINITE_INF) to avoid apparently arbitrary magic numbers. Constants
are all uppercase to make clear they are internal constants (following
the MDN and PEP8 styles). These constants should not be exported
(i.e. should not appear in the NAMESPACE), as they should only be used
by functions and not by package users.
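The naming conventions for internal helpers and constants can be sketched together. The value assigned to FINITE_INF here, and the functions .sketch_cap() and sketch_upper_bound(), are assumptions for illustration and are not taken from the package source.

```r
# Internal constant (uppercase, not exported): a large finite stand-in for
# infinity. The value here is an assumption for illustration.
FINITE_INF <- 1e10

# Internal helper (dot prefix, not exported)
.sketch_cap <- function(x) pmin(x, FINITE_INF)

# Exported-style function (no dot prefix) that relies on both
sketch_upper_bound <- function(x) .sketch_cap(x)

sketch_upper_bound(c(1, Inf))
```

Naming the constant once means the value can be changed in a single place, and a reader of .sketch_cap() can see what the number represents.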
The aim is to restrict the number of dependencies to a minimal required set for ease of maintenance. The current hard dependencies are {stats} and {checkmate}.
{stats} is distributed with the R language, so it is viewed as a lightweight dependency that should already be installed on a user’s machine if they have R. {checkmate} is an input checking package widely used across Epiverse-TRACE packages.
Suggested dependencies, excluding those used for package documentation ({knitr}, {rmarkdown}), testing ({spelling}, {testthat}), and plotting ({ggplot2}), are {epiparameter}, used to easily access epidemiological parameters from the package’s library, and {fitdistrplus}, used for model fitting methods.
There are no special requirements for contributing to {superspreading}; please follow the package contributing guide.