Understanding disease severity, and especially the case fatality risk (CFR), is key to outbreak response. During an outbreak there is often a delay between cases being reported and the outcomes of those cases (for CFR, deaths) being known. Simply dividing total deaths to date by total cases to date may underestimate the CFR in real time, because many cases have outcomes that are not yet known.
Knowing the distribution of these delays from previous outbreaks of the same (or similar) diseases, and accounting for them, can therefore help ensure less biased estimates of disease severity. See the Concept section at the end of this vignette for more on how reporting delays bias CFR estimates.
The severity of a disease can be estimated while correcting for delays in reporting using the methods outlined in Nishiura et al. (2009), which are implemented in the cfr package.
A disease outbreak is underway. We want to know how severe the disease is in terms of the case fatality risk (CFR), but there is a delay between cases being reported and the outcomes of those cases (recovery or death) being known. This reporting delay can be accounted for using the distribution of such delays from past outbreaks.
First we load the cfr package.
Data on cases and deaths may be obtained from a number of publicly accessible sources, such as the global Covid-19 dataset curated by Our World in Data, a similar dataset made available through the R package covidregionaldata (Palmer et al. 2021), or data on outbreaks of other infections made available in outbreaks.
In an outbreak response scenario, such data may also be compiled and shared locally. See the vignette on working with data from incidence2 for a common format of incidence data, which can help interoperability with other formats.
The cfr package requires only a data frame with three columns, “date”, “cases”, and “deaths”, giving the daily number of reported cases and deaths.
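As a minimal sketch of the expected input, the data frame below uses made-up counts (not real outbreak data) with the three required columns. The object name `outbreak` and all values are assumptions for illustration; the checks simply confirm the column names and that there is one row per day, as "daily" suggests.

```r
# Hypothetical daily counts, for illustration only (not real outbreak data)
outbreak <- data.frame(
  date = seq(as.Date("2000-01-01"), by = "day", length.out = 5),
  cases = c(2L, 5L, 9L, 12L, 7L),
  deaths = c(0L, 0L, 1L, 2L, 3L)
)

# Check that the three columns are present and that dates are consecutive days
stopifnot(
  all(c("date", "cases", "deaths") %in% colnames(outbreak)),
  all(diff(outbreak$date) == 1)
)

outbreak
```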
Here, we use some data from the first Ebola outbreak, in the Democratic Republic of the Congo in 1976, that is included with this package (Camacho et al. 2014).
We obtain the disease’s onset-to-death distribution from a more recent Ebola outbreak, reported in Barry et al. (2018). The onset-to-death distribution is considered to be Gamma distributed, with a shape k = 2.40 and a scale of θ = 3.33.
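To get a feel for what these parameters imply, the base R sketch below computes the mean and standard deviation of a Gamma distribution with this shape and scale, plus a few quantiles. The variable names are chosen here for illustration and are not part of the cfr package.

```r
shape <- 2.40
scale <- 3.33

# Mean and standard deviation of a Gamma(shape, scale) distribution
delay_mean <- shape * scale       # 7.992, i.e. about 8 days
delay_sd <- sqrt(shape) * scale   # about 5.2 days

# Median and central 95% interval of the onset-to-death delay, in days
delay_quantiles <- qgamma(c(0.025, 0.5, 0.975), shape = shape, scale = scale)

delay_mean
delay_sd
delay_quantiles
```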
Note that while we use a continuous distribution here, a discrete distribution is more appropriate because we are working with daily data.
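One common way to discretise a continuous delay distribution is to take differences of its cumulative distribution function over whole days. The sketch below does this in base R for the Gamma parameters above; `delay_pmf` is a hypothetical helper name, and this is only one possible discretisation, not necessarily the one the package authors have in mind.

```r
# Probability that the delay falls in the interval [x, x + 1) days
delay_pmf <- function(x, shape = 2.40, scale = 3.33) {
  pgamma(x + 1, shape = shape, scale = scale) -
    pgamma(x, shape = shape, scale = scale)
}

days <- 0:60
pmf <- delay_pmf(days)

# The masses are non-negative and sum to very nearly 1 over this range
sum(pmf)
```

A function like this has the same single-argument form as the `dgamma()` wrapper used elsewhere in this vignette.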
Note also that we use the central estimates for each distribution parameter; by ignoring uncertainty in these parameters, the uncertainty in the resulting CFR is likely to be underestimated.
The forthcoming epiparameter package aims to be a library of epidemiological delay distributions, which can be accessed easily from within workflows. See the vignette on using delay distributions for more information on how to use this and other distribution objects supported by R to prepare delay density functions.
We use the function cfr_static() to calculate overall disease severity at the latest date of the outbreak.
cfr_static(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)
#> severity_estimate severity_low severity_high
#> 1 0.9592 0.9295 0.9793
The cfr_static() function is well suited to small outbreaks where there are relatively few events and the time period under consideration is relatively brief, so the severity is unlikely to have changed over time.
To understand how severity has changed over time (e.g. following vaccination or pathogen evolution), use the function cfr_time_varying(). This function is, however, not well suited to small outbreaks because it requires sufficiently many cases over time to estimate how CFR changes. More on this can be found in the vignette on estimating how disease severity varies over the course of an outbreak.
It is important to know what proportion of cases in an outbreak are being ascertained to muster the appropriate response, and to estimate the overall burden of the outbreak.
Note that the ascertainment ratio may be affected by a number of factors. When the main factor in low ascertainment is the lack of (access to) testing capacity, we refer to this as reporting or under-reporting.
The estimate_ascertainment() function estimates the ascertainment ratio using daily case and death data, the known severity of the disease from previous outbreaks, and optionally a delay distribution of onset-to-death.
Here, we estimate reporting in the 1976 Ebola outbreak in the Congo, assuming that Ebola virus disease (at that time) had a baseline severity of about 0.7 (70% of cases result in deaths), based on CFR values estimated in later, larger datasets. We use the onset-to-death distribution from Barry et al. (2018).
# estimate reporting with a baseline severity of 70%
estimate_ascertainment(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33),
  severity_baseline = 0.7
)
#> ascertainment_estimate ascertainment_low ascertainment_high
#> 1 0.7297748 0.7147963 0.7530931
This analysis suggests that between about 71% and 75% of cases were reported in this outbreak.
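The printed ascertainment values are consistent with dividing the baseline severity by the delay-adjusted severity values printed by cfr_static() earlier. The short check below reproduces that arithmetic in base R; it is an observation about the printed numbers, not a description of the package's internals, and the variable names are chosen here for illustration.

```r
severity_baseline <- 0.7

# Severity estimate and interval bounds as printed by cfr_static() above
severity <- c(estimate = 0.9592, low = 0.9295, high = 0.9793)

# A higher estimated severity implies a lower ascertainment ratio,
# so the lower and upper bounds swap places
ratio <- severity_baseline / severity
ascertainment <- c(
  estimate = ratio[["estimate"]],
  low = ratio[["high"]],
  high = ratio[["low"]]
)

round(ascertainment, 4)
```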
More details can be found in the vignette on estimating the proportion of cases that are reported during an outbreak.
Simply dividing the number of deaths by the number of cases yields a naive estimate of the true CFR.
Suppose 10 people start showing symptoms of a disease on a given day, and at the end of that day all remain alive. Suppose that over the next 5 days the number of new cases continues to rise, reaching 100 new cases on day 5, and that by day 5 all infected individuals remain alive.
The naive estimate of the CFR calculated at the end of the first 5 days would be zero, because there would have been zero deaths in total at that point. That is to say, the outcomes of those cases (deaths) would not yet be known.
Even after deaths begin to occur, this lag between the ascertainment of a case or hospitalisation and outcome leads to a consistently biased estimate. Hence, adjusting for such delays using an appropriate delay distribution is essential for accurate estimates of severity.
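The scenario above can be sketched numerically in base R. Only the 10 cases on the first day and the 100 cases on day 5 come from the text; the intermediate case counts and the later death count are assumptions made purely for illustration. The adjustment follows the general idea of replacing the denominator with the number of cases whose outcome would be expected to be known by now. It is a simplified sketch, not the cfr package's implementation.

```r
# Daily new cases on days 0 to 5 (intermediate values are assumed)
cases <- c(10, 20, 40, 60, 80, 100)
days <- seq_along(cases) - 1
elapsed <- max(days) - days  # whole days elapsed since each day's cases

# Probability that a case's outcome is known within the elapsed time, using
# the onset-to-death Gamma distribution with daily discretisation
p_known <- pgamma(elapsed + 1, shape = 2.40, scale = 3.33)
expected_known <- sum(cases * p_known)

# By day 5 no deaths have occurred, so the naive estimate is zero
naive_cfr_day5 <- 0 / sum(cases)

# Only a small fraction of cases have had time for outcomes to be known
prop_known <- expected_known / sum(cases)

# Hypothetical: suppose 12 deaths had been recorded among these cases
deaths_hypothetical <- 12
naive_cfr_later <- deaths_hypothetical / sum(cases)
adjusted_cfr_later <- deaths_hypothetical / expected_known

c(naive = naive_cfr_later, adjusted = adjusted_cfr_later)
```

Under these assumed numbers the naive estimate is several times smaller than the adjusted one, because its denominator counts recent cases whose outcomes cannot yet be known.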