The current version of serofoi supports three different models for estimating the Force-of-Infection (FoI), including constant and time-varying trajectories. To fit the models to seroprevalence data, we use a suite of Bayesian models that include prior and upper prior distributions.
The force of infection, also known as the hazard rate or the infection pressure, is a key concept in mathematical modelling of infectious diseases. It represents the rate at which susceptible individuals become infected, given their exposure to a pathogen. In simple terms, the force of infection quantifies the risk of a susceptible individual becoming infected over a period of time. It is usually expressed as a rate per unit of time (e.g., per day or per year).
The FoI is one of the most important parameters in epidemiology, but it is often incorrectly assumed to be constant over time. Identifying whether the FoI follows a constant or a time-varying trend can be important for identifying and characterising the spread of disease. Table 1 summarises the models currently supported by serofoi.
Model Option | Description and usage |
---|---|
`constant` | Constant FoI |
`tv_normal` | Time-varying normal FoI: slow change in FoI |
`tv_normal_log` | Time-varying normal-log FoI: fast epidemic change in FoI |
Table 1. Model options and descriptions.
The endemic constant model is a simple mathematical model used in epidemiology to describe the seroprevalence of an infectious disease within a population as the result of long-term transmission.
For a constant FoI endemic model, the rate of infection acquisition λ is constant over time for each trajectory, and the seroprevalence P behaves as a cumulative process increasing monotonically with age. For the seroprevalence at age a and time t, we have:
$$ P(a, t) = 1 - \exp(-\lambda a) $$
The number of positive cases follows a binomial distribution, where n is the number of trials (size of the age group) and P is the probability of success (seroprevalence) for a certain age group:
$$ p(a, t) \sim binom(n(a, t), P(a, t)) $$
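The two equations above can be sketched in a few lines of base R. This is only an illustration and does not use serofoi: the FoI value, the age range, the number of trials and all variable names are assumptions chosen for the example.

```r
# Minimal sketch of the constant-FoI model (base R only; names are illustrative)
set.seed(123)

foi <- 0.2        # assumed constant force of infection (per year)
ages <- 2:47      # assumed age groups, in years
n_sample <- 5     # assumed number of individuals tested per age group

# Seroprevalence under a constant FoI: P(a) = 1 - exp(-lambda * a)
seroprev <- 1 - exp(-foi * ages)

# Number of seropositive individuals per age group: binom(n, P(a))
n_positive <- rbinom(length(ages), size = n_sample, prob = seroprev)

head(data.frame(age = ages, seroprev = round(seroprev, 3), n_positive))
```

Because the exponent grows linearly with age, the seroprevalence rises monotonically towards 1, which is the cumulative behaviour described above.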
In serofoi, for the constant model, the FoI (λ) is modelled within a Bayesian framework using a uniform prior distribution, λ ∼ U(0, 2). Future versions of the package may allow users to choose distributions other than the default. This model can be fitted to the previously prepared dataset `data_test` by calling the `fit_seromodel` function with `foi_model = "constant"`.
The object `simdata_constant` contains a minimal simulated dataset that emulates a hypothetical endemic situation where the FoI is constant with value 0.2. It includes data for 250 samples of individuals between 2 and 47 years old with a number of trials n = 5. The following code shows how to fit the constant model to this simulated serosurvey:
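# Load the simulated serosurvey and prepare it for modelling
data("simdata_constant")
serodata_constant <- prepare_serodata(simdata_constant)
# Fit the constant FoI model
model_1 <- fit_seromodel(
  serodata = serodata_constant,
  foi_model = "constant",
  iter = 800
)
# Plot the fitted model together with the data
plot_seromodel(
  model_1,
  serodata = serodata_constant,
  size_text = 6
)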
Figure 1. Constant serofoi model plot. Simulated (red) vs modelled (blue) FoI.
In this case, 800 iterations are enough to ensure convergence. The `plot_seromodel` function provides a visualisation of the results, including a summary where the expected log pointwise predictive density (`elpd`) and its standard error (`se`) are shown. We say that a model converges if all the R-hat estimates are below 1.1.
For the time-varying FoI models, the number of positive cases at age a and time t also follows a binomial distribution, as described above. However, the seroprevalence is obtained from the cumulative sum of the yearly FoI values over time:
$$ P(a,t) = 1 - \exp\left(-\sum_{i=t-a+1}^{t}\lambda_i\right) $$
A serosurvey completed at time $t_{sur}$ is therefore informative for the interval $[t_{sur} - a_{max}, t_{sur}]$, where $a_{max}$ is the maximum age sampled.
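As a rough illustration of the sum in the exponent, the base R sketch below computes the seroprevalence by age for a made-up yearly FoI trajectory. The survey year, the maximum age, the FoI values and the variable names are all assumptions for the example, and serofoi is not involved.

```r
# Minimal sketch: seroprevalence from a yearly-varying FoI (base R only)
survey_year <- 2050   # assumed survey year
max_age <- 10         # assumed oldest age class
years <- (survey_year - max_age + 1):survey_year

# Hypothetical FoI trajectory: 0.02 per year, except 0.3 in 2045 and 2046
foi <- ifelse(years %in% 2045:2046, 0.3, 0.02)

# An individual aged a in the survey year has been exposed to the last a years,
# i.e. to lambda_i for i = t - a + 1, ..., t
ages <- 1:max_age
cum_foi <- cumsum(rev(foi))[ages]   # sum over the most recent a years
seroprev <- 1 - exp(-cum_foi)

data.frame(
  age = ages,
  first_exposure_year = survey_year - ages + 1,
  seroprev = round(seroprev, 3)
)
```

In this toy trajectory the seroprevalence jumps between ages 4 and 6, because those are the youngest age classes that lived through the two high-FoI years.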
The time-varying slow normal model relies on the following prior distributions for the FoI to describe the spread of a given infectious disease within a population over time:
$$
\lambda(t)\sim normal(\lambda(t-1), \sigma) \\
\lambda(t=1) \sim normal(0, 1)
$$
The object `simdata_sw_dec` contains a minimal simulated dataset that emulates a situation where the FoI follows a stepwise decreasing trend (FoI panel in Fig. 2). The simulated dataset contains information about 250 samples of individuals between 2 and 47 years old with a number of trials n = 5. The following code shows how to fit the slow time-varying normal model to this simulated serosurvey:
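# Load the simulated serosurvey and prepare it for modelling
data("simdata_sw_dec")
serodata_sw_dec <- prepare_serodata(simdata_sw_dec)
# Fit the slow time-varying normal FoI model
model_2 <- fit_seromodel(
  serodata = serodata_sw_dec,
  foi_model = "tv_normal",
  iter = 1500
)
# Plot the fitted model together with the data
plot_seromodel(
  model_2,
  serodata = serodata_sw_dec,
  size_text = 6
)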
Figure 2. Slow time-varying serofoi model plot. Simulated (red) vs modelled (blue) FoI.
The number of iterations required may depend on the number of years for which the FoI is estimated, that is, the span between the year of the serosurvey and the year obtained by subtracting the maximum age class sampled from it.
The time-varying fast epidemic model relies on normal prior distributions for the FoI on the logarithmic scale, i.e.:
$$
\lambda(t)\sim normal(\log(\lambda(t-1)), \sigma) \\
\lambda(t=1) \sim normal(-6, 4)
$$
This is done in order to capture fast changes in the FoI trend. Importantly, the standard deviation parameter σ of this normal distribution is set using an upper prior that follows a Cauchy distribution.
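To give an intuition for why a random walk on the logarithmic scale can follow fast changes, the base R sketch below draws one trajectory from a random walk on the natural scale and one on the log scale, using the same innovations. It is a forward simulation only, not the fitted model: it reads the prior as a random walk on log(λ), fixes σ at an arbitrary value instead of sampling it from its Cauchy upper prior, and ignores any constraints the package applies. All names and values are assumptions.

```r
# Minimal sketch contrasting the two random-walk priors (base R only)
set.seed(42)

n_years <- 30
sigma <- 0.3                  # fixed here for illustration only
foi_start <- 0.05             # assumed FoI in the first year
steps <- rnorm(n_years - 1)   # shared standard-normal innovations

# Random walk on the natural scale: each year adds sigma * step
foi_natural <- foi_start + c(0, cumsum(sigma * steps))

# Random walk on the log scale: each year multiplies by exp(sigma * step)
foi_log <- exp(log(foi_start) + c(0, cumsum(sigma * steps)))

range(foi_natural)
range(foi_log)
max(foi_log[-1] / foi_log[-n_years])   # largest year-on-year fold change
```

On the log scale every step is multiplicative, so the simulated FoI stays positive and can change by a large factor within a few years, which is consistent with the stated aim of capturing fast epidemic changes.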
To test this model we use the minimal simulated dataset contained in the `simdata_large_epi` object. This dataset emulates a hypothetical situation where a three-year epidemic occurs between 2032 and 2035. The simulated serosurvey tests 250 individuals from 0 to 50 years of age in the year 2050. The fast epidemic model can be fitted by running the following lines of code:
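# Load the simulated serosurvey and prepare it for modelling
data("simdata_large_epi")
serodata_large_epi <- prepare_serodata(simdata_large_epi)
# Fit the fast epidemic (normal-log) FoI model
model_3 <- fit_seromodel(
  serodata = serodata_large_epi,
  foi_model = "tv_normal_log",
  iter = 1500
)
# Store the plot so it can be reused later, then display it
model_3_plot <- plot_seromodel(
  model_3,
  serodata = serodata_large_epi,
  size_text = 6
)
plot(model_3_plot)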
Figure 3. Time-varying fast epidemic serofoi model plot. Simulated (red) vs modelled (blue) FoI.
In Fig. 3 we can see that the fast epidemic serofoi model is able to identify the large epidemic simulated in the `simdata_large_epi` dataset.
The statistical details of the three models are described in Table 2.
Model Option | Number of positive cases at age a | Prior distribution | Upper priors |
---|---|---|---|
`constant` | ∼ binom(n(a, t), P(a, t)) | λ ∼ uniform(0, 2) | |
`tv_normal` | ∼ binom(n(a, t), P(a, t)) | $\lambda(t)\sim normal(\lambda(t-1),\sigma)$, $\lambda(t=1)\sim normal(0,1)$ | σ ∼ Cauchy(0, 1) |
`tv_normal_log` | ∼ binom(n(a, t), P(a, t)) | $\lambda(t)\sim normal(\log(\lambda(t-1)),\sigma)$, $\lambda(t=1)\sim normal(-6,4)$ | σ ∼ Cauchy(0, 1) |
Table 2. Statistical characteristics of serofoi’s currently supported models for the FoI (λ). Here n is the size of an age group a at time-step t and P is its corresponding seroprevalence.
Above we showed that the fast epidemic model (`tv_normal_log`) is able to identify the large epidemic outbreak described by the `simdata_large_epi` dataset, which was simulated with a three-year epidemic between 2032 and 2035 (red line in Fig. 3).
Now, we would like to know whether this model actually fits this dataset better than the other available models in serofoi. For this, we also fit both the endemic model (`constant`) and the slow time-varying normal model (`tv_normal`) to the same dataset.
Using the function `cowplot::plot_grid` we can visualise the results of the three models simultaneously:
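A possible way to run this comparison is sketched below. It reuses only the calls already shown in this section (`prepare_serodata`, `fit_seromodel`, `plot_seromodel`) together with `cowplot::plot_grid`. The iteration counts are copied from the earlier examples and are an assumption for this dataset, and `iterations` and `comparison_plots` are illustrative names.

```r
library(serofoi)

# Prepare the large-epidemic serosurvey, as in the previous example
data("simdata_large_epi")
serodata_large_epi <- prepare_serodata(simdata_large_epi)

# Iterations per model: assumed values, copied from the examples above
iterations <- c(constant = 800, tv_normal = 1500, tv_normal_log = 1500)

# Fit each model to the same serosurvey and keep its plot
comparison_plots <- lapply(names(iterations), function(foi_model) {
  model <- fit_seromodel(
    serodata = serodata_large_epi,
    foi_model = foi_model,
    iter = iterations[[foi_model]]
  )
  plot_seromodel(model, serodata = serodata_large_epi, size_text = 6)
})

# Arrange the three plots side by side, labelled A, B and C
cowplot::plot_grid(
  plotlist = comparison_plots,
  nrow = 1,
  ncol = 3,
  labels = "AUTO"
)
```

With this ordering, panel A corresponds to the constant model, panel B to the slow time-varying model and panel C to the fast epidemic model.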
Figure 4. Model comparison between the three serofoi models for a large-epidemic simulated dataset.
A common criterion to decide which model fits the data best is to choose the one with the largest `elpd`. According to this criterion, in this case the best model is the fast epidemic model, which is the only one that manages to identify the large epidemic (see the second row of panel C in Figure 4).
NOTE: Running the serofoi models for the first time on your local computer may take a few minutes for the rstan code to compile locally. However, once the initial compilation is complete, there is no further need for local compilation.