--- title: "Outbreaks in heterogeneous networks" output: rmarkdown::html_vignette bibliography: references.json link-citations: true vignette: > %\VignetteIndexEntry{Outbreaks in heterogeneous networks} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(superspreading) library(ggplot2) library(scales) ``` Determining if an outbreak will grow and spread through a susceptible population is quantified by the basic reproduction number ($R_0$). When there is individual-level variability in the connectedness of different individuals in a network (i.e. higher variance in the degree of each node) it can lead to heterogeneity in transmission dynamics. Such scenarios are common in persistent sexually transmitted infections (STIs), where partners can change during the infectious period, which means highly connected individuals can be both more likely to acquire and pass on the infection. Under the basic assumption of homogeneous contact patterns (i.e. no network effects), we have the following expression for the basic reproduction number: $$ R_0 = \frac{\beta M}{\gamma} $$ where $\beta$ is the probability of transmission per contact, $1/\gamma$ is the duration of infectiousness ($\gamma$ is the rate of loss of infectiousness) and $M$ is the mean number of contacts (or partners) per unit time (e.g., per year). In contrast, @mayTransmissionDynamicsHuman1988 showed that the transmissibility of an infectious disease in a heterogeneous network can be defined as follows: $$ R_0 = \frac{\beta}{\gamma} \frac{M^2 + V}{M} $$ where $V$ is the variance of the number of contacts per unit time. This formulation can be appropriate if heterogeneity is predictable over time (i.e. highly connected individuals typically remain highly connected), the duration of infectiousness is similar or longer to the frequency of partner change among highly connected individuals, and the disease has the potential to cause a substantial outbreak (i.e. larger value of $\beta$ and/or $1/\gamma$). The {superspreading} package provides the `calc_network_R()` function to calculate the reproduction number using the unadjusted formula (first equation) and the adjusted formula (second equation). For example, the possibility of a sexually transmitted Zika virus outbreak was a particular concern when the infection spread globally in 2015-16. @yakobLowRiskSexuallytransmitted2016 used the above heterogeneous network model to determine the risk that sexual transmission could be a common mode of transmission for Zika virus, leading to sustained human-to-human transmission in the absence of vectors. Here we replicate the analysis of @yakobLowRiskSexuallytransmitted2016 to show the low likelihood of Zika causing an outbreak from sexual transmission. Following @yakobLowRiskSexuallytransmitted2016, we use data from the National Survey of Sexual Attitude and Lifestyles (Natsal) on the mean and variance in the number of sexual partners and age range [@mercerChangesSexualAttitudes2013]. ```{r, calc-R-zika} infect_duration <- exp(seq(log(0.01), log(10), length.out = 100)) prob_transmission <- exp(seq(log(0.01), log(1), length.out = 100)) params <- expand.grid( infect_duration = infect_duration, prob_transmission = prob_transmission ) R <- t(apply( params, MARGIN = 1, function(x) { calc_network_R( mean_num_contact = 14.1, sd_num_contact = 69.6, infect_duration = x[["infect_duration"]], prob_transmission = x[["prob_transmission"]], age_range = c(16, 74) ) } )) res <- cbind(params, R) res <- reshape( data = res, varying = c("R", "R_net"), v.names = "R", direction = "long", times = c("R", "R_net"), timevar = "group" ) ``` ```{r, plot-zika-r, fig.cap="The reproduction number using the unadjusted and adjusted calculation -- calculated using `calc_network_R()` -- with mean duration of infection on the x-axis and transmission probability per sexual partner on the y-axis. The line shows the points that $R_0$ is equal to one. Both axes are plotted on a natural log scale. This plot is similar to Figure 1 from @yakobLowRiskSexuallytransmitted2016, but is plotted as a heatmap and without annotation.", fig.width = 8, fig.height = 5} ggplot(data = res) + geom_tile( mapping = aes( x = infect_duration, y = prob_transmission, fill = R ) ) + geom_contour( mapping = aes( x = infect_duration, y = prob_transmission, z = R ), breaks = 1, # Set the contour line at R = 1 colour = "black" ) + scale_x_continuous( name = "Mean duration of infectiousness", trans = "log", breaks = breaks_log() ) + scale_y_continuous( name = "Zika virus transmission probability per sexual partners", trans = "log", breaks = breaks_log() ) + facet_wrap( vars(group), labeller = as_labeller(c(R = "R", R_net = "Adjusted R")) ) + scale_fill_viridis_c( trans = "log", breaks = breaks_log(n = 8)(min(res$R):max(res$R)), labels = label_comma() ) + theme_bw() ``` Another study that showed the network effects on transmission of an STI was @endoHeavytailedSexualContact2022, who estimated that Mpox (or monkeypox) could spread throughout a network on men who have sex with men (MSM), but would have lower transmissibility in the wider population. Using the Natsal UK data [@mercerChangesSexualAttitudes2013] on same- and opposite-sex sexual partnerships for the age range of 18 to 44, they show that the disease transmission for MSM is greater than for a non-MSM transmission network. Because Mpox has a relatively short infectious period, this study assumed that contacts remained fixed during the period of infection, and hence used a next generation matrix approach more akin to the group-specific transmission defined in the [finalsize package](https://epiverse-trace.github.io/finalsize/). For comparison, we produce a figure similar to @endoHeavytailedSexualContact2022, using `calc_network_R()` instead to show how highly connected individuals -- who are more likely to acquire and pass on infection -- alter the estimated $R_0$ compared to the simpler assumption of $R_0 = SAR \times contacts$, under the assumptions described above. ```{r, calc-r-mpox} beta <- seq(0.001, 1, length.out = 1000) duration_years <- 21 / 365 res <- lapply( beta, calc_network_R, mean_num_contact = 10, sd_num_contact = 50, infect_duration = duration_years, age_range = c(18, 44) ) res <- do.call(rbind, res) res <- as.data.frame(cbind(beta, res)) res <- reshape( data = res, varying = c("R", "R_net"), v.names = "R", direction = "long", times = c("R", "R_net"), timevar = "group" ) ``` ```{r, plot-mpox, fig.cap="The reproduction number using the unadjusted and adjusted calculation -- calculated using `calc_network_R()` -- with secondary attack rate on the x-axis and reproduction number ($R_0$) on the y-axis. This plot is similar to Figure 2A from @endoHeavytailedSexualContact2022.", fig.width = 8, fig.height = 5} ggplot(data = res) + geom_line(mapping = aes(x = beta, y = R, colour = group)) + geom_hline(mapping = aes(yintercept = 1)) + scale_y_continuous( name = "Reproduction Number (R)", trans = "log", breaks = breaks_log(), labels = label_comma() ) + scale_x_continuous(name = "Secondary Attack Rate (SAR)") + scale_colour_brewer( name = "Reproduction Number", labels = c("R", "Adjusted R"), palette = "Set1" ) + theme_bw() + theme(legend.position = "top") ``` ::: {.alert .alert-warning} _Methodological caveat_: There is a link with the theory for the main negative binomial models from the [superspreading package](https://epiverse-trace.github.io/superspreading/) here (which have a Gamma distributed mean), because the above Anderson and May formulation requires the 'true' mean and variance of the underlying static contact distribution, rather than the observed mean and variance. To give the superspreading version: in a simple branching process with fixed $R_0$ (i.e. zero variance in the individual-level reproduction number, perhaps because everyone has an identical number of contacts), the Poisson distributed number of transmissions per year measured exhibits more variation than the 'true' $R_0$ (which has zero variance). For the above STI model, taking the lifetime average deals with this problem some extent, because if we look at the sum of contacts over a large number of years, then calculate the mean per year, the observed distribution will converge as the number of years gets very large. ::: ## References