This vignette shows how to prepare <incidence2> objects from the incidence2 package for use with cfr, using the prepare_data() method for the <incidence2> class. If detailed individual-level data that include deaths and recoveries are available, then alternative methods for severity estimation could be used (e.g. directly calculating the CFR from the subset of cases with a known outcome). However, there may be situations where deaths are recorded but recoveries are not, in which case the methods described here provide an option for CFR calculation.
We first load the libraries we require: cfr, incidence2, and outbreaks, the last of which provides linelist data from a simulated Ebola outbreak.
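A minimal version of this setup step is shown below. It assumes that all three packages are already installed from CRAN.

```r
# load the packages used in this vignette
library(cfr)        # severity estimation, including prepare_data()
library(incidence2) # <incidence2> objects and the covidregionaldataUK dataset
library(outbreaks)  # linelist data, including a simulated Ebola outbreak
```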
Aggregated case data, such as the COVID-19 dataset provided by incidence2, can be converted into an <incidence2> object using incidence2::incidence() and then handled by prepare_data().
# get data bundled with the {incidence2} package
covid_uk <- covidregionaldataUK
# view the data
head(covid_uk)
#> date region region_code cases_new cases_total deaths_new
#> 1 2020-01-30 East Midlands E12000004 NA NA NA
#> 2 2020-01-30 East of England E12000006 NA NA NA
#> 3 2020-01-30 England E92000001 2 2 NA
#> 4 2020-01-30 London E12000007 NA NA NA
#> 5 2020-01-30 North East E12000001 NA NA NA
#> 6 2020-01-30 North West E12000002 NA NA NA
#> deaths_total recovered_new recovered_total hosp_new hosp_total tested_new
#> 1 NA NA NA NA NA NA
#> 2 NA NA NA NA NA NA
#> 3 NA NA NA NA NA NA
#> 4 NA NA NA NA NA NA
#> 5 NA NA NA NA NA NA
#> 6 NA NA NA NA NA NA
#> tested_total
#> 1 NA
#> 2 NA
#> 3 NA
#> 4 NA
#> 5 NA
#> 6 NA
Note that the grouping structure of this dataset, given by the “region” variable, is carried over into the <incidence2> object when “region” is passed to the groups argument of incidence(). prepare_data() respects grouping structure when present, and returns a dataset with one additional column for each grouping variable.
# convert to incidence2 object
covid_uk_incidence <- incidence(
covid_uk,
date_index = "date",
groups = "region",
counts = c("cases_new", "deaths_new"),
count_names_to = "count_variable"
)
#> Warning in incidence(): `cases_new` contains NA values. Consider imputing these
#> and calling `incidence()` again.
# View head of prepared data; by default (fill_NA = TRUE), NAs are replaced with 0s
# Setting fill_NA = FALSE retains NAs, which will cause issues with CFR functions such as cfr_static()
head(
prepare_data(
covid_uk_incidence,
cases_variable = "cases_new",
deaths_variable = "deaths_new"
)
)
#> NAs in cases and deaths are being replaced with 0s: Set `fill_NA = FALSE` to prevent this.
#> date region cases deaths
#> 1 2020-01-30 East Midlands 0 0
#> 2 2020-01-30 East of England 0 0
#> 3 2020-01-30 England 2 0
#> 4 2020-01-30 London 0 0
#> 5 2020-01-30 North East 0 0
#> 6 2020-01-30 North West 0 0
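The message above points to the fill_NA argument. A minimal sketch of the alternative, which keeps NAs in the prepared data, is shown below. It re-creates covid_uk_incidence so that it runs on its own. Because retained NAs are expected to cause problems for cfr_static(), this form is mainly useful for inspecting where data are missing before deciding how to handle them.

```r
library(cfr)
library(incidence2)

# convert the bundled COVID-19 data to an <incidence2> object, as above
# (incidence() is expected to warn about NA values again)
covid_uk_incidence <- incidence(
  covidregionaldataUK,
  date_index = "date",
  groups = "region",
  counts = c("cases_new", "deaths_new"),
  count_names_to = "count_variable"
)

# prepare the data but keep NAs instead of replacing them with 0s
covid_uk_na <- prepare_data(
  covid_uk_incidence,
  cases_variable = "cases_new",
  deaths_variable = "deaths_new",
  fill_NA = FALSE
)

# count the rows with a missing value in cases or deaths
sum(is.na(covid_uk_na$cases) | is.na(covid_uk_na$deaths))
```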
In this example, the “region” column is added to the data, allowing for disease severity to be calculated separately for each region if needed.
Users who wish to override grouping variables in their data are advised to do this when converting their data into an <incidence2> object, and to be aware of how incidence2 aggregates case and death counts, including how it deals with NAs; see incidence2::incidence() for more details.
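For example, one way to drop the regional grouping is to filter the data to a single region and omit the groups argument when calling incidence(). The sketch below assumes that the “England” rows in covidregionaldataUK are an aggregate of the English regions (as the code E92000001 suggests), so summing over every region would count English cases more than once. The choice of region is purely illustrative.

```r
library(cfr)
library(incidence2)

# keep a single region; %in% avoids keeping rows where region is NA
covid_england <- covidregionaldataUK[covidregionaldataUK$region %in% "England", ]

# convert without `groups`, so the resulting <incidence2> object is ungrouped
england_incidence <- incidence(
  covid_england,
  date_index = "date",
  counts = c("cases_new", "deaths_new"),
  count_names_to = "count_variable"
)

# with no grouping variables, prepare_data() returns only date, cases, deaths
england_data <- prepare_data(
  england_incidence,
  cases_variable = "cases_new",
  deaths_variable = "deaths_new"
)
head(england_data)
```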
Users who prepare data while maintaining grouping structure should take care to apply cfr_*() functions to their data by group, as these functions cannot currently handle grouped data.
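Group-wise application can be sketched with base R's split() and lapply(). To keep the example small and self-contained, it uses a made-up dataset (`counts_df`, the regions “A” and “B”, and all counts are hypothetical) rather than the COVID-19 data. It also assumes that the naive, delay-free form of cfr_static(), called without a delay_density, is adequate for illustration.

```r
library(cfr)
library(incidence2)

# hypothetical daily counts for two regions over five days
counts_df <- data.frame(
  date = rep(as.Date("2023-01-01") + 0:4, times = 2),
  region = rep(c("A", "B"), each = 5),
  cases = c(10L, 12L, 15L, 14L, 20L, 30L, 28L, 35L, 40L, 38L),
  deaths = c(0L, 1L, 0L, 2L, 1L, 1L, 2L, 3L, 2L, 4L)
)

# convert to a grouped <incidence2> object and prepare it for cfr
inc <- incidence(
  counts_df,
  date_index = "date",
  groups = "region",
  counts = c("cases", "deaths"),
  count_names_to = "count_variable"
)
prepared <- prepare_data(inc, cases_variable = "cases", deaths_variable = "deaths")

# cfr_*() functions expect a single time series, so split the data by region
by_region <- split(prepared, prepared$region)

# apply cfr_static() to each region, passing only the date-ordered columns it needs
estimates <- lapply(by_region, function(d) {
  d <- d[order(d$date), c("date", "cases", "deaths")]
  cfr_static(d)
})

# combine the per-region estimates into one data frame
estimates_df <- cbind(region = names(estimates), do.call(rbind, estimates))
estimates_df
```

Any other way of iterating over groups would work equally well; the key point is that each call to cfr_static() sees the time series of exactly one group.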