The effective reproductive number (Re) is first estimated through time on the original incidence data. The same estimation is then performed on a number of bootstrap samples built from the original incidence data. The function outputs the estimate on the original data, along with confidence interval boundaries built from the distribution of bootstrapped estimates.
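As an illustration of how bootstrap samples can be built from an incidence series, the R sketch below resamples blocks of log-scale residuals around a LOESS trend and adds them back. The helper block_bootstrap_sketch() and its block_size and span parameters are hypothetical, and this page does not specify the exact resampling scheme the package uses; treat it as one plausible construction, not the package's implementation.

```r
# Hypothetical helper, NOT part of the package: moving-block bootstrap of
# log-scale residuals around a LOESS trend. The package's own
# "non-parametric block boostrap" may differ in its details.
block_bootstrap_sketch <- function(incidence, block_size = 5, span = 0.75) {
  n <- length(incidence)
  time_index <- seq_len(n)
  log_incidence <- log(incidence + 1)  # +1 guards against log(0)
  trend <- predict(loess(log_incidence ~ time_index, span = span, degree = 1))
  resid_log <- log_incidence - trend
  # Draw overlapping blocks of consecutive residuals to keep short-range correlation
  n_blocks <- ceiling(n / block_size)
  starts <- sample.int(n - block_size + 1, n_blocks, replace = TRUE)
  resampled <- unlist(lapply(starts, function(s) resid_log[s:(s + block_size - 1)]))
  # Trim to the original length, add the trend back and undo the log transform
  pmax(exp(trend + resampled[seq_len(n)]) - 1, 0)
}

set.seed(1)
incidence <- c(3, 5, 8, 12, 18, 25, 31, 40, 46, 52, 55, 58, 60, 57, 53)
# One column per bootstrap replicate
bootstrap_samples <- replicate(10, block_bootstrap_sketch(incidence))
```

Re would then be estimated on the original series and on each column of bootstrap_samples.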

get_bootstrapped_estimates_from_combined_observations(
  partially_delayed_incidence,
  fully_delayed_incidence,
  smoothing_method = "LOESS",
  deconvolution_method = "Richardson-Lucy delay distribution",
  estimation_method = "EpiEstim sliding window",
  bootstrapping_method = "non-parametric block boostrap",
  uncertainty_summary_method = "original estimate - CI from bootstrap estimates",
  combine_bootstrap_and_estimation_uncertainties = FALSE,
  N_bootstrap_replicates = 100,
  delay_until_partial,
  delay_until_final_report,
  partial_observation_requires_full_observation = TRUE,
  ref_date = NULL,
  time_step = "day",
  output_Re_only = TRUE,
  ...
)

Arguments

partially_delayed_incidence

An object containing incidence data through time. It can be:

  • A list with two elements:

    1. A numeric vector named values: the incidence recorded on consecutive time steps.

    2. An integer named index_offset: the offset, counted in number of time steps, by which the first value in values is shifted compared to a reference time step. This parameter allows one to keep track of the date of the first value in values without needing to carry a date column around. A positive offset means values are shifted into the future compared to the reference time step. A negative offset means they are shifted into the past.

  • A numeric vector. The vector corresponds to the values element described above, and index_offset is implicitly zero. This means that the first value in the vector is associated with the reference time step (no shift towards the future or past).
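The two accepted input forms can be normalized into a single list representation. This is a minimal sketch with hypothetical helpers as_incidence_input() and time_indices() (not functions of the package), assuming only the values/index_offset convention described above.

```r
# Hypothetical helper: normalize either accepted form into
# list(values = <numeric>, index_offset = <integer>).
as_incidence_input <- function(x) {
  if (is.list(x)) {
    stopifnot(all(c("values", "index_offset") %in% names(x)))
    return(list(values = as.numeric(x$values),
                index_offset = as.integer(x$index_offset)))
  }
  # A bare numeric vector has an implicit offset of zero
  list(values = as.numeric(x), index_offset = 0L)
}

# Hypothetical helper: time-step index of each value, relative to the
# reference time step (index 0)
time_indices <- function(input) {
  input$index_offset + seq_along(input$values) - 1L
}

onset <- as_incidence_input(c(2, 4, 7))
report <- as_incidence_input(list(values = c(1, 3, 6), index_offset = 3))
time_indices(onset)   # 0 1 2
time_indices(report)  # 3 4 5
```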

fully_delayed_incidence

An object containing incidence data through time. It can be:

  • A list with two elements:

    1. A numeric vector named values: the incidence recorded on consecutive time steps.

    2. An integer named index_offset: the offset, counted in number of time steps, by which the first value in values is shifted compared to a reference time step. This parameter allows one to keep track of the date of the first value in values without needing to carry a date column around. A positive offset means values are shifted into the future compared to the reference time step. A negative offset means they are shifted into the past.

  • A numeric vector. The vector corresponds to the values element described above, and index_offset is implicitly zero. This means that the first value in the vector is associated with the reference time step (no shift towards the future or past).

smoothing_method

string. Method used to smooth the original incidence data. Available options are:

  • 'LOESS', implemented in .smooth_LOESS

deconvolution_method

string. Method used to infer the timings of infection events from the original incidence data (the deconvolution step). Available options are:

  • 'Richardson-Lucy delay distribution', implemented in .deconvolve_incidence_Richardson_Lucy

estimation_method

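string. Method used to estimate reproductive number values through time from the reconstructed infection timings. Available options are:

  • 'EpiEstim sliding window', implemented in .estimate_Re_EpiEstim_sliding_window

  • 'EpiEstim piecewise constant', implemented in .estimate_Re_EpiEstim_piecewise_constant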

bootstrapping_method

string. Method used to perform bootstrapping of the original incidence data. Available options are:

  • 'non-parametric block boostrap'

uncertainty_summary_method

string. One of the following options:

  • 'NONE' if no summary of bootstrap estimates is required

  • 'original estimate - CI from bootstrap estimates'. The confidence interval is built using bootstrapped estimates and centered around the original estimates.

  • 'bagged mean - CI from bootstrap estimates'. The confidence interval is built using bootstrapped estimates and centered around the mean of bootstrapped estimates and original estimates.
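The two summary options differ only in where the interval is centered. The sketch below assumes a normal approximation (half-width of 1.96 times the standard deviation of the bootstrapped estimates) purely for illustration; summarise_bootstrap_sketch() is hypothetical, and this page does not state how the package sets the interval width.

```r
# Hypothetical helper; the normal-approximation width is an assumption.
summarise_bootstrap_sketch <- function(original, bootstrap_matrix,
                                       center = c("original estimate", "bagged mean"),
                                       z = qnorm(0.975)) {
  center <- match.arg(center)
  # bootstrap_matrix: one row per time step, one column per bootstrap replicate
  boot_sd <- apply(bootstrap_matrix, 1, sd)
  point <- if (center == "original estimate") {
    original
  } else {
    # Mean of the bootstrapped estimates together with the original estimate
    rowMeans(cbind(original, bootstrap_matrix))
  }
  data.frame(Re_estimate = point,
             CI_down = point - z * boot_sd,
             CI_up = point + z * boot_sd)
}

original <- c(1.2, 1.1, 0.9)
bootstrap_matrix <- matrix(c(1.1, 1.3, 1.2,
                             1.0, 1.2, 1.1,
                             0.8, 1.0, 0.9), nrow = 3, byrow = TRUE)
centered_on_original <- summarise_bootstrap_sketch(original, bootstrap_matrix)
centered_on_bagged <- summarise_bootstrap_sketch(original, bootstrap_matrix, "bagged mean")
```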

combine_bootstrap_and_estimation_uncertainties

boolean. If TRUE, the reported uncertainty interval is the union of the highest posterior density interval from the Re estimation and the confidence interval from bootstrapping the time series of observations.
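Taking the union of the two intervals at each time step can be sketched with pmin() and pmax(). This assumes the union is reported as the smallest interval covering both; when the two intervals do not overlap, their set union is not a single interval. combine_intervals_sketch() is a hypothetical name.

```r
# Hypothetical helper: element-wise (per time step) smallest interval
# containing both the HPD interval and the bootstrap confidence interval.
combine_intervals_sketch <- function(hpd_lower, hpd_upper, boot_lower, boot_upper) {
  data.frame(lower = pmin(hpd_lower, boot_lower),
             upper = pmax(hpd_upper, boot_upper))
}

combined <- combine_intervals_sketch(hpd_lower = c(0.9, 1.0), hpd_upper = c(1.3, 1.4),
                                     boot_lower = c(1.0, 0.8), boot_upper = c(1.5, 1.2))
# lower: 0.9 0.8; upper: 1.5 1.4
```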

N_bootstrap_replicates

integer. Number of bootstrap samples.

delay_until_partial

Single delay or list of delays. Each delay can be one of:

  • a list representing a distribution object

  • a discretized delay distribution vector

  • a discretized delay distribution matrix

  • a dataframe containing empirical delay data
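A distribution object such as list(name = "gamma", shape = 3.2, scale = 1.3), as used in the Examples, can be turned into a discretized delay vector. The sketch below assumes that the mass on step k is P(k <= X < k + 1) and truncates a small tail; discretize_gamma_sketch() and tail_mass are hypothetical, and the package may discretize differently.

```r
# Hypothetical helper; the discretization rule is an assumption.
# Returns probs with probs[k + 1] = P(k <= X < k + 1), for k = 0, 1, 2, ...
discretize_gamma_sketch <- function(shape, scale, tail_mass = 1e-3) {
  max_step <- ceiling(qgamma(1 - tail_mass, shape = shape, scale = scale))
  cdf <- pgamma(0:(max_step + 1), shape = shape, scale = scale)
  probs <- diff(cdf)
  probs / sum(probs)  # renormalize after truncating the tail
}

delay_incubation <- list(name = "gamma", shape = 3.2, scale = 1.3)
incubation_vector <- discretize_gamma_sketch(delay_incubation$shape,
                                             delay_incubation$scale)
```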

delay_until_final_report

Single delay or list of delays. Each delay can be one of:

  • a list representing a distribution object

  • a discretized delay distribution vector

  • a discretized delay distribution matrix

  • a dataframe containing empirical delay data

partial_observation_requires_full_observation

boolean. Set to TRUE if partially_delayed_incidence represents delayed observations of infection events that themselves rely on further-delayed observations. See Details.

ref_date

Date. Optional. Date of the first data entry in the incidence data.

time_step

string. Time between two consecutive incidence data points, for example "day", "2 days", "week" or "year" (see seq.Date for details).
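Given ref_date and time_step, the date of each value can be recovered from its index_offset with seq.Date(). This sketch assumes that ref_date is the date of the reference time step (index 0) and only handles non-negative offsets; dates_for_values() is a hypothetical helper.

```r
# Hypothetical helper; assumes ref_date is the date of time step 0.
dates_for_values <- function(ref_date, time_step, index_offset, n_values) {
  stopifnot(index_offset >= 0)  # this sketch only handles non-negative offsets
  # Dates of time steps 0 .. index_offset + n_values - 1
  all_dates <- seq.Date(from = as.Date(ref_date), by = time_step,
                        length.out = index_offset + n_values)
  tail(all_dates, n_values)
}

report_dates <- dates_for_values("2020-01-18", "day", index_offset = 3, n_values = 4)
every_two_days <- dates_for_values("2020-01-18", "2 days", index_offset = 1, n_values = 2)
```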

output_Re_only

boolean. Should the output only contain Re estimates? (as opposed to containing results for each intermediate step)

...

Arguments passed on to .smooth_LOESS, .deconvolve_incidence_Richardson_Lucy, .estimate_Re_EpiEstim_sliding_window, .estimate_Re_EpiEstim_piecewise_constant, merge_outputs, nowcast

data_points_incl

integer. Size of the window used in the LOESS algorithm. The span parameter passed to loess is computed as the ratio of data_points_incl and the number of time steps in the input data.
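The span computation described above can be sketched with stats::loess(). smooth_loess_sketch() is a hypothetical wrapper; any additional processing performed by .smooth_LOESS (for example, the back-padding controlled by initial_Re_estimate_window) is left out.

```r
# Hypothetical wrapper around stats::loess(); span = data_points_incl / n.
smooth_loess_sketch <- function(values, data_points_incl = 21, degree = 1) {
  time_index <- seq_along(values)
  span <- data_points_incl / length(values)  # fraction of the series used in each local fit
  fit <- loess(values ~ time_index, span = span, degree = degree)
  predict(fit)  # fitted values at the observed time steps
}

set.seed(42)
days <- 1:60
values <- rpois(60, lambda = 5 + 50 * exp(-((days - 30) / 12)^2))
smoothed <- smooth_loess_sketch(values, data_points_incl = 21, degree = 1)
```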

degree

integer. LOESS degree. Must be 0, 1 or 2.

initial_Re_estimate_window

integer. To help with the smoothing, the function extends the data back in time, padding it with values obtained by assuming a constant Re. This parameter is the number of time steps at the beginning of incidence_input taken into account when computing the average initial Re.

delay_distribution

numeric square matrix or vector.

threshold_chi_squared

numeric scalar. Threshold on the chi-squared value below which the Richardson-Lucy (R-L) algorithm stops.

max_iterations

integer. Maximum number of iterations of the R-L algorithm.
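The Richardson-Lucy iteration with its two stopping rules (chi-squared threshold and maximum number of iterations) can be sketched as follows for a single delay vector. This is a simplified illustration, not .deconvolve_incidence_Richardson_Lucy itself: infections and observations share the same time axis here (no back-extension), the initial guess is simply the observed series, and richardson_lucy_sketch() is a hypothetical name.

```r
# Hypothetical, simplified Richardson-Lucy deconvolution.
# delay[d + 1] = P(delay = d steps); infections and observations share one time axis.
richardson_lucy_sketch <- function(observed, delay, threshold_chi_squared = 1,
                                   max_iterations = 100) {
  n <- length(observed)
  # Convolution matrix: M[j, i] = P(an infection at step i is observed at step j)
  M <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (d in seq_along(delay) - 1) {
      if (i + d <= n) M[i + d, i] <- delay[d + 1]
    }
  }
  obs_prob <- colSums(M)  # probability that an infection is observed within the data
  x <- observed           # initial guess for the infection series
  for (iter in seq_len(max_iterations)) {
    expected <- as.vector(M %*% x)
    ratio <- ifelse(expected > 0, observed / expected, 0)
    # Multiplicative R-L update, normalized by the observation probability
    x <- ifelse(obs_prob > 0, x * as.vector(t(M) %*% ratio) / obs_prob, 0)
    expected <- as.vector(M %*% x)
    keep <- expected > 0
    chi_sq <- mean((expected[keep] - observed[keep])^2 / expected[keep])
    if (isTRUE(chi_sq < threshold_chi_squared)) break
  }
  x
}

true_infections <- c(10, 10, 10, 10, 10, 20, 30, 40, 50, 40, 30, 20, 10, 10, 10)
delay <- c(0.1, 0.3, 0.4, 0.2)
# Synthetic observations: observed[j] = sum_d delay[d + 1] * true_infections[j - d]
observed <- vapply(seq_along(true_infections), function(j) {
  d <- 0:min(j - 1, length(delay) - 1)
  sum(delay[d + 1] * true_infections[j - d])
}, numeric(1))
reconstructed <- richardson_lucy_sketch(observed, delay,
                                        threshold_chi_squared = 0.01, max_iterations = 200)
```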

verbose

Boolean. Print verbose output?

estimation_window

Use with estimation_method = "EpiEstim sliding window". Positive integer value. Number of data points over which Re is assumed to be constant.
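For reference, the sliding-window estimator of Cori et al. (2013), on which EpiEstim's estimate_R is based, has a closed-form gamma posterior: with a gamma prior of shape a and scale b, the posterior mean over a window is (a + sum of incidence) / (1/b + sum of total infectiousness). The sketch below computes this for each window ending at time t; sliding_window_Re_sketch(), si_distribution and the prior arguments are illustrative assumptions, not the package's internals.

```r
# Hypothetical sketch of the Cori et al. (2013) sliding-window posterior mean.
# si_distribution[k] = probability that the serial interval is k time steps (k >= 1).
sliding_window_Re_sketch <- function(incidence, si_distribution, estimation_window = 3,
                                     mean_prior = 5, std_prior = 5) {
  prior_shape <- (mean_prior / std_prior)^2
  prior_scale <- std_prior^2 / mean_prior
  n <- length(incidence)
  # Total infectiousness: lambda[s] = sum_k incidence[s - k] * si_distribution[k]
  lambda <- vapply(seq_len(n), function(s) {
    k <- seq_len(min(s - 1, length(si_distribution)))
    sum(incidence[s - k] * si_distribution[k])
  }, numeric(1))
  posterior_mean <- rep(NA_real_, n)
  for (t_end in estimation_window:n) {
    window <- (t_end - estimation_window + 1):t_end
    posterior_mean[t_end] <- (prior_shape + sum(incidence[window])) /
      (1 / prior_scale + sum(lambda[window]))
  }
  posterior_mean
}

re <- sliding_window_Re_sketch(rep(10, 10), si_distribution = c(0.5, 0.5))
```

With constant incidence, the estimate at the last time step is 31 / 30.2, slightly above 1 because of the prior.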

import_incidence_input

NULL or module input object. List with two elements:

  1. A numeric vector named values: the incidence recorded on consecutive time steps.

  2. An integer named index_offset: the offset, counted in number of time steps, by which the first value in values is shifted compared to a reference time step. This parameter allows one to keep track of the date of the first value in values without needing to carry a date column around. A positive offset means values are shifted into the future compared to the reference time step. A negative offset means they are shifted into the past.

If not NULL, this data represents recorded imported cases, and incidence_input then represents only local cases.
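When imported cases are supplied, a common convention is that they contribute to the total infectiousness but are not counted as offspring of local transmission. A minimal sketch under that assumption, with the hypothetical helpers infectiousness_with_imports() and window_Re_mean():

```r
# Hypothetical helper: total infectiousness from local AND imported cases.
infectiousness_with_imports <- function(local, imported, si_distribution) {
  total <- local + imported  # both local and imported cases can infect others
  vapply(seq_along(total), function(s) {
    k <- seq_len(min(s - 1, length(si_distribution)))
    sum(total[s - k] * si_distribution[k])
  }, numeric(1))
}

# Hypothetical helper: gamma-posterior mean of Re over a window,
# counting only local cases as newly generated infections.
window_Re_mean <- function(local, lambda, window, prior_shape = 1, prior_scale = 5) {
  (prior_shape + sum(local[window])) / (1 / prior_scale + sum(lambda[window]))
}

local_cases <- c(0, 2, 4, 4)
imported_cases <- c(5, 0, 0, 0)
lambda <- infectiousness_with_imports(local_cases, imported_cases, si_distribution = 1)
re_mean <- window_Re_mean(local_cases, lambda, window = 2:4)
```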

minimum_cumul_incidence

Numeric value. Minimum number of cumulative infections before starting the Re estimation. Default is 12, as recommended in Cori et al., 2013.

mean_serial_interval

Numeric positive value. mean_si for estimate_R

std_serial_interval

Numeric positive value. std_si for estimate_R

mean_Re_prior

Numeric positive value. Mean of the prior on Re (mean_prior for estimate_R).

output_HPD

Boolean. If TRUE, return the highest posterior density interval with the output.

interval_ends

Use with estimation_method = "EpiEstim piecewise constant". Integer vector. Optional argument. If provided, interval_ends overrides the interval_length argument. Each element of interval_ends specifies the right boundary of an interval over which Re is assumed to be constant for the calculation. Values in interval_ends must be integers that follow the same numbering of time steps as incidence_input. In other words, interval_ends and incidence_input use the same time step as the zeroth time step.

interval_length

Use with estimation_method = "EpiEstim piecewise constant". Positive integer value. Re is assumed constant over steps of size interval_length.
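The relationship between interval_length and interval_ends can be sketched as below. The handling of a trailing, shorter interval is an assumption of this sketch, and interval_ends_from_length() and assign_intervals() are hypothetical helpers.

```r
# Hypothetical helper: right boundaries of consecutive intervals of length
# interval_length, covering time steps first_step .. last_step.
interval_ends_from_length <- function(first_step, last_step, interval_length) {
  first_end <- first_step + interval_length - 1
  if (first_end >= last_step) return(last_step)
  ends <- seq(first_end, last_step, by = interval_length)
  # Assumption: a trailing partial interval ends at the last time step
  if (tail(ends, 1) < last_step) ends <- c(ends, last_step)
  ends
}

# Hypothetical helper: index of the interval each time step belongs to
assign_intervals <- function(steps, ends) {
  vapply(steps, function(s) which(ends >= s)[1], integer(1))
}

ends <- interval_ends_from_length(0, 15, 7)  # 6 13 15
interval_ids <- assign_intervals(0:15, ends)
```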

cutoff_observation_probability

Value between 0 and 1. Only data points for time steps with a probability of observing an event higher than cutoff_observation_probability are kept. The few data points with a lower probability of being observed are trimmed off the tail of the time series.
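Since the probability that an event at a given time step has already been observed grows with the time elapsed until the last data point, the trimming can be sketched from the cumulative delay distribution. trim_by_observation_probability() is hypothetical, and it assumes a single discretized delay vector.

```r
# Hypothetical helper. delay[d + 1] = P(delay = d steps).
trim_by_observation_probability <- function(values, delay, cutoff = 0.5) {
  n <- length(values)
  cdf <- cumsum(delay)       # cdf[d + 1] = P(delay <= d)
  lag <- n - seq_len(n)      # steps remaining until the last data point
  # Probability that an event at each time step is observed by the last data point
  prob_observed <- ifelse(lag + 1 <= length(cdf), cdf[pmin(lag + 1, length(cdf))], 1)
  # prob_observed is non-increasing in time, so this removes a right tail
  values[prob_observed > cutoff]
}

delay <- c(0.2, 0.3, 0.5)
trim_by_observation_probability(1:6, delay, cutoff = 0.3)  # 1 2 3 4 5
trim_by_observation_probability(1:6, delay, cutoff = 0.6)  # 1 2 3 4
```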

gap_to_present

Integer. Default value: 0. Number of time steps truncated off from the right tail of the raw incidence data. See Details for more details.

Value

Effective reproductive number estimates through time, with confidence interval boundaries. If output_Re_only is FALSE, the transformations applied to the input observations during the calculations are output as well.

Details

This function allows for combining two different incidence time series. The two time series can represent events that are delayed differently from the original infection events. The two data sources must not have any overlap in the events recorded. The function can account for one of the two types of events requiring the future observation of the other type. For instance, one type can be symptom onset events and the other case confirmations. Typically, a symptom onset event is only recorded if the case is confirmed later on. If so, the partial_observation_requires_full_observation flag should be set to TRUE.
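When a partially delayed event (for example, symptom onset) must be followed by a fully delayed one (case confirmation), the delay from infection to the final report is the sum of the two delays. Assuming the two delays are independent, its discretized distribution is the convolution of the two delay vectors, as sketched with the hypothetical helper convolve_delays().

```r
# Hypothetical helper, assuming independent delays.
# delay_k[d + 1] = P(delay_k = d); result[d + 1] = P(delay_1 + delay_2 = d)
convolve_delays <- function(delay_1, delay_2) {
  out <- numeric(length(delay_1) + length(delay_2) - 1)
  for (i in seq_along(delay_1)) {
    for (j in seq_along(delay_2)) {
      out[i + j - 1] <- out[i + j - 1] + delay_1[i] * delay_2[j]
    }
  }
  out
}

infection_to_report <- convolve_delays(c(0.5, 0.5), c(0.5, 0.5))  # 0.25 0.50 0.25
```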

Examples

## Basic usage of get_bootstrapped_estimates_from_combined_observations
# (Only 10 bootstrap replicates are generated to keep the code fast. In practice,
# use more.)
shape_incubation <- 3.2
scale_incubation <- 1.3
delay_incubation <- list(name = "gamma", shape = shape_incubation, scale = scale_incubation)

shape_onset_to_report <- 2.7
scale_onset_to_report <- 1.6
delay_onset_to_report <- list(name = "gamma", shape = shape_onset_to_report, scale = scale_onset_to_report)

Re_estimate_1 <- get_bootstrapped_estimates_from_combined_observations(
  partially_delayed_incidence = HK_incidence_data$onset_incidence,
  fully_delayed_incidence = HK_incidence_data$report_incidence,
  partial_observation_requires_full_observation = TRUE,
  delay_until_partial = delay_incubation,
  delay_until_final_report = delay_onset_to_report,
  N_bootstrap_replicates = 10
)

## Advanced usage of get_bootstrapped_estimates_from_combined_observations
# Incorporating prior knowledge about Re. Here, Re is assumed constant over a time
# frame of one week, with a prior mean of 1.25.
Re_estimate_2 <- get_bootstrapped_estimates_from_combined_observations(
  partially_delayed_incidence = HK_incidence_data$onset_incidence,
  fully_delayed_incidence = HK_incidence_data$report_incidence,
  partial_observation_requires_full_observation = TRUE,
  delay_until_partial = delay_incubation,
  delay_until_final_report = delay_onset_to_report,
  N_bootstrap_replicates = 10,
  estimation_method = "EpiEstim piecewise constant",
  interval_length = 7,
  mean_Re_prior = 1.25,
  ref_date = HK_incidence_data$date[1]
)

# Incorporating prior knowledge about the disease. Here, the mean of the serial
# interval is assumed to be 5 days and its standard deviation 2.5 days.
Re_estimate_3 <- get_bootstrapped_estimates_from_combined_observations(
  partially_delayed_incidence = HK_incidence_data$onset_incidence,
  fully_delayed_incidence = HK_incidence_data$report_incidence,
  partial_observation_requires_full_observation = TRUE,
  delay_until_partial = delay_incubation,
  delay_until_final_report = delay_onset_to_report,
  N_bootstrap_replicates = 10,
  mean_serial_interval = 5,
  std_serial_interval = 2.5
)