Understanding disease severity,
and especially the case fatality risk (CFR), is key to outbreak
response. During an outbreak there is often a delay between cases being
reported, and the outcomes (for CFR, deaths) of those cases being known,
and accounting for these leads to better estimates of CFR.
cfr_static()
can be used to calculate a static estimate of
the severity of an outbreak using methods from Nishiura et al. (2009) while accounting for the
distribution of reporting delays.
New to calculating disease severity using cfr? You might want to see the “Get started” vignette first.
We want a static estimate of the severity of an outbreak in the form of the case fatality risk (CFR) while correcting for the delay in reporting the outcomes of cases.
First load cfr and packages to access and plot data.
This example data comes from the 1976 Ebola virus disease (EVD, or Ebola) outbreak in the Democratic Republic of the Congo (Camacho et al. 2014).
We focus on the roughly the first half of this dataset, by subsetting the data so that we only include days before 30th September, 1976.
data("ebola1976")
# view the first few rows
head(ebola1976)
#> date cases deaths
#> 1 1976-08-25 1 0
#> 2 1976-08-26 0 0
#> 3 1976-08-27 0 0
#> 4 1976-08-28 0 0
#> 5 1976-08-29 0 0
#> 6 1976-08-30 0 0
df_ebola_subset <- filter(ebola1976, date <= "1976-09-30")
We retrieve the parameters of the distribution of durations (also called delays) between the onset of EVD symptoms and death from the literature (Barry et al. 2018). This is a Gamma distribution with shape k = 2.40 and scale θ = 3.33.
Note that while we shall use a continuous distribution here, it is more appropriate to use a discrete distribution instead as we are working with daily data. See the vignette on delay distributions for more on when using a continuous instead of discrete distribution is acceptable, and on using discrete distributions with cfr.
Note also that we use the central estimates for each distribution parameter, and by ignoring uncertainty in these parameters the uncertainty in the resulting CFR is likely to be underestimated.
The function estimate_outcomes()
estimates the number of
cases whose outcomes are expected to be known by each day i of an outbreak, given a
time-series of case onsets and the distribution of delays between
symptom onset and case outcome. In the context of CFR estimation, the
delay distribution is usually an ‘onset-to-death’ distribution.
The resulting data frame contains two new columns, “estimated_outcomes”, for the number of cases whose outcomes are expected to be known on each day, and “u_t” for the ratio of cumulative number of cases estimated to have known outcomes and the cumulative number of cases reported until each date specified in data.
# calculate known death outcomes
df_estimated_outcomes_ebola <- estimate_outcomes(
data = df_ebola_subset,
delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)
# print head of data frame
head(df_estimated_outcomes_ebola)
#> date cases deaths estimated_outcomes u_t
#> 1 1976-08-25 1 0 0.00000000 0.00000000
#> 2 1976-08-26 0 0 0.03323030 0.03323030
#> 3 1976-08-27 0 0 0.06494676 0.09817706
#> 4 1976-08-28 0 0 0.08485286 0.18302992
#> 5 1976-08-29 0 0 0.09400738 0.27703731
#> 6 1976-08-30 0 0 0.09515185 0.37218916
# print tail of data frame
tail(df_estimated_outcomes_ebola)
#> date cases deaths estimated_outcomes u_t
#> 32 1976-09-25 11 8 7.787856 0.5283224
#> 33 1976-09-26 7 11 8.301252 0.5563894
#> 34 1976-09-27 7 7 8.625233 0.5840532
#> 35 1976-09-28 4 13 8.761617 0.6207698
#> 36 1976-09-29 4 12 8.659704 0.6552761
#> 37 1976-09-30 3 9 8.371787 0.6904737
The estimated outcomes are lower than the number of cases at the
beginning of an outbreak as the case outcomes are only likely to become
known some days later; this also means that u_t
is likely
to be lower than 1.0 early in the outbreak. The u_t
for an
outbreak that has ended will be much closer to 1.0, as the outcomes of
all reported cases are expected to be known (depending on the quality of
outbreak monitoring). This depends on the distribution of delays between
onset and outcome, and the ratio u_t
will ‘catch up’ to 1.0
faster when the onset-to-outcome delay is short.
Note that the period between onset and death may be shorter than the period between onset and full recovery, and should not be considered equivalent. For CFR estimation, we are primarily interested in the former as the goal is estimating severity in the form of a fatality risk (or ratio for past outbreaks).
Note that estimate_outcomes()
is
exported but is primarily intended for internal use.
The function cfr_static()
wraps the internal function
estimate_outcomes()
to provide a static CFR by
automatically correcting for reporting delays if a delay density
function is provided.
# calculating the naive CFR
cfr_static(
data = df_ebola_subset
)
#> severity_estimate severity_low severity_high
#> 1 0.7197802 0.6485503 0.7836984
# calculating the corrected CFR
cfr_static(
df_ebola_subset,
delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)
#> severity_estimate severity_low severity_high
#> 1 NA NA NA
cfr_static()
automatically chooses a method of
calculating the severity estimate, based on whether delay correction is
applied, and further when delay correction is applied, depending on the
total number of cases and an initial rough estimate of the severity.
Naive estimate: When delay correction is not
applied, the CFR is the ratio of total deaths to total cases, and the
confidence interval is given by a Binomial test using
stats::binom.test()
where total deaths are successes, total
cases are trials, and the hypothesised success is 1.0
Delay correction, small outbreaks: For outbreaks
where the total cases are below the user-specified ‘Poisson threshold’
(poisson_threshold
, default = 100), the CFR and uncertainty
around it is taken from a profile likelihood generated from a Binomial
model of deaths (successes) and estimated known outcomes
(trials).
Delay correction, large outbreaks with low severity: For outbreaks with total cases greater than the Poisson threshold (default = 100) and with initial severity estimates < 0.05, the CFR and uncertainty are taken from a Poisson approximation of the Binomial profile likelihood (taking λ = np for n estimated outcomes and p as the severity estimate).
Delay correction, large outbreaks with higher severity: For outbreaks with total cases greater than the Poisson threshold (default = 100) and with initial severity estimates ≥ 0.05, the CFR and uncertainty are taken from a Normal approximation of the Binomial profile likelihood.
This example shows static severity estimation using cfr and data from the Covid-19 pandemic in the United Kingdom.
We load example Covid-19 daily case and death data provided with the
cfr package as covid_data
, and subset for the
first year of U.K. data.
# get Covid data loaded with the package
data("covid_data")
# filter for the U.K
df_covid_uk <- filter(
covid_data,
country == "United Kingdom", date <= "2020-12-31"
)
# View the first few rows and recall necessary columns: date, cases, deaths
head(df_covid_uk)
#> date country cases deaths
#> 1 2020-01-03 United Kingdom 0 0
#> 2 2020-01-04 United Kingdom 0 0
#> 3 2020-01-05 United Kingdom 0 0
#> 4 2020-01-06 United Kingdom 0 0
#> 5 2020-01-07 United Kingdom 0 0
#> 6 2020-01-08 United Kingdom 0 0
We retrieve the appropriate distribution for Covid-19 from Linton et al. (2020); this is a lognormal distribution with μ = 2.577 and σ = 0.440.
Note that Linton et al. (2020) fitted a discrete lognormal distribution and we use a continuous distribution, and that we are ignoring uncertainty in the distribution parameters and hence likely under-estimating uncertainty in the CFR.
Finally, we calculate the naive and corrected CFRs for the Covid-19 pandemic in the U.K.
# calculating the naive CFR
cfr_static(
df_covid_uk
)
#> severity_estimate severity_low severity_high
#> 1 0.03640132 0.03617238 0.0366313
# calculating the corrected CFR
cfr_static(
df_covid_uk,
delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440)
)
#> severity_estimate severity_low severity_high
#> 1 0.0465 0.0462 0.0467
cfr_static()
follows Nishiura et
al. (2009) to calculate a
quantity ut for each day
within the input data which represents the proportion of cases with a
known adverse outcome (usually death) on day t.
$$ u_t = \dfrac{\sum_{i = 0}^t \sum_{j = 0}^\infty c_{i - j} f_{j}}{\sum_{i = 0} c_i}, $$
where ft is the value of the probability mass function at time t, and ct, dt are the number of new cases and new deaths at time t (respectively). We then use ut in the following likelihood function to estimate severity.
$$ {\sf L}(\theta | C_{t},D_{t},u_{t}) = \log{\dbinom{u_{t}C_{t}}{D_{t}}} + D_{t} \log{\theta} + (u_{t}C_{t} - D_{t})\log{(1 - \theta)}, $$
Ct and Dt are the cumulative number of cases and deaths (respectively) until time t.
Lastly θ (severity) is estimated θ using simple maximum-likelihood methods, allowing the functions within this package to be quick and easy tools to use.
The precise severity measure — CFR, IFR, HFR, etc — that θ represents depends upon the input data given by the user.
Case fatality risk (CFR) requires case and death incidence data, with a case-to-death delay distribution (or close approximation, such as symptom onset-to-death).
Infection fatality risk (IFR) requires infection and death incidence data, with an exposure-to-death delay distribution (or close approximation).
Hospitalisation Fatality Risk (HFR) requires hospitalisation and death incidence data, and the appropriate delay distribution (or close approximation).