Estimate Re from delayed observations of infection events. — estimate_from_combined

This function combines two different incidence time series (see Details). The two time series can represent events that are differently delayed from the original infection events. The two data sources must not overlap in the events they record. The function can account for one of the two types of events requiring the future observation of the other type of event. For instance, one type can be symptom onset events and the other case confirmation events. Typically, recording a symptom onset event requires a future case confirmation. If so, the partial_observation_requires_full_observation flag should be set to TRUE.

estimate_from_combined_observations(
  partially_delayed_incidence,
  fully_delayed_incidence,
  smoothing_method = "LOESS",
  deconvolution_method = "Richardson-Lucy delay distribution",
  estimation_method = "EpiEstim sliding window",
  delay_until_partial,
  delay_until_final_report,
  partial_observation_requires_full_observation = TRUE,
  ref_date = NULL,
  time_step = "day",
  output_Re_only = TRUE,
  ...
)

Arguments

partially_delayed_incidence	An object containing incidence data through time. It can be either: (1) a list with two elements: a numeric vector named `values`, the incidence recorded on consecutive time steps; and an integer named `index_offset`, the offset, counted in number of time steps, by which the first value in `values` is shifted compared to a reference time step. This parameter allows one to keep track of the date of the first value in `values` without needing to carry a `date` column around. A positive offset means `values` are delayed in the future compared to the reference values. A negative offset means the opposite. Or (2) a numeric vector. The vector corresponds to the `values` element described above, and `index_offset` is implicitly zero. This means that the first value in the vector is associated with the reference time step (no shift towards the future or past).
fully_delayed_incidence	An object containing incidence data through time. It can be either: (1) a list with two elements: a numeric vector named `values`, the incidence recorded on consecutive time steps; and an integer named `index_offset`, the offset, counted in number of time steps, by which the first value in `values` is shifted compared to a reference time step. This parameter allows one to keep track of the date of the first value in `values` without needing to carry a `date` column around. A positive offset means `values` are delayed in the future compared to the reference values. A negative offset means the opposite. Or (2) a numeric vector. The vector corresponds to the `values` element described above, and `index_offset` is implicitly zero. This means that the first value in the vector is associated with the reference time step (no shift towards the future or past).
smoothing_method	string. Method used to smooth the original incidence data. Available options are: 'LOESS', implemented in `.smooth_LOESS`
deconvolution_method	string. Method used to infer timings of infection events from the original incidence data (aka deconvolution step). Available options are: 'Richardson-Lucy delay distribution', implemented in `.deconvolve_incidence_Richardson_Lucy`
estimation_method	string. Method used to estimate reproductive number values through time from the reconstructed infection timings. Available options are: 'EpiEstim sliding window', implemented in `.estimate_Re_EpiEstim_sliding_window`; 'EpiEstim piecewise constant', implemented in `.estimate_Re_EpiEstim_piecewise_constant`
delay_until_partial	Single delay or list of delays. Each delay can be one of: a list representing a distribution object; a discretized delay distribution vector; a discretized delay distribution matrix; a dataframe containing empirical delay data
delay_until_final_report	Single delay or list of delays. Each delay can be one of: a list representing a distribution object; a discretized delay distribution vector; a discretized delay distribution matrix; a dataframe containing empirical delay data
partial_observation_requires_full_observation	boolean. Set to `TRUE` if `partially_delayed_incidence` represents delayed observations of infection events that themselves rely on further-delayed observations. See Details.
ref_date	Date. Optional. Date of the first data entry in `incidence_data`
time_step	string. Time between two consecutive incidence datapoints. "day", "2 days", "week", "year"... (see `seq.Date` for details)
output_Re_only	boolean. Should the output only contain Re estimates? (as opposed to containing results for each intermediate step)
...	Arguments passed on to `.smooth_LOESS`, `.deconvolve_incidence_Richardson_Lucy`, `.estimate_Re_EpiEstim_sliding_window`, `.estimate_Re_EpiEstim_piecewise_constant`, `merge_outputs`, `nowcast`:
  `data_points_incl`	integer. Size of the window used in the LOESS algorithm. The `span` parameter passed to `loess` is computed as the ratio of `data_points_incl` and the number of time steps in the input data.
  `degree`	integer. LOESS degree. Must be 0, 1 or 2.
  `initial_Re_estimate_window`	integer. In order to help with the smoothing, the function extends the data back in time, padding with values obtained by assuming a constant Re. This parameter represents the number of time steps at the beginning of `incidence_input` to take into account when computing the average initial Re.
  `delay_distribution`	numeric square matrix or vector.
  `threshold_chi_squared`	numeric scalar. Threshold for chi-squared values under which the R-L algorithm stops.
  `max_iterations`	integer. Maximum number of iterations in the R-L algorithm.
  `verbose`	Boolean. Print verbose output?
  `estimation_window`	Use with `estimation_method = "EpiEstim sliding window"`. Positive integer value. Number of data points over which to assume Re to be constant.
  `import_incidence_input`	NULL or module input object. List with two elements: a numeric vector named `values`, the incidence recorded on consecutive time steps; and an integer named `index_offset`, the offset, counted in number of time steps, by which the first value in `values` is shifted compared to a reference time step. This parameter allows one to keep track of the date of the first value in `values` without needing to carry a `date` column around. A positive offset means `values` are delayed in the future compared to the reference values. A negative offset means the opposite. If not NULL, this data represents recorded imported cases, and `incidence_input` then represents only local cases.
  `minimum_cumul_incidence`	Numeric value. Minimum number of cumulated infections before starting the Re estimation. Default is `12` as recommended in Cori et al., 2013.
  `mean_serial_interval`	Numeric positive value. `mean_si` for `estimate_R`
  `std_serial_interval`	Numeric positive value. `std_si` for `estimate_R`
  `mean_Re_prior`	Numeric positive value. `mean_prior` for `estimate_R`
  `output_HPD`	Boolean. If TRUE, return the highest posterior density interval with the output.
  `interval_ends`	Use with `estimation_method = "EpiEstim piecewise constant"`. Integer vector. Optional argument. If provided, `interval_ends` overrides the `interval_length` argument. Each element of `interval_ends` specifies the right boundary of an interval over which Re is assumed to be constant for the calculation. Values in `interval_ends` must be integer values that follow the same numbering of time steps as `incidence_input`. In other words, `interval_ends` and `incidence_input` use the same time step as the zero-th time step.
  `interval_length`	Use with `estimation_method = "EpiEstim piecewise constant"`. Positive integer value. Re is assumed constant over steps of size `interval_length`.
  `cutoff_observation_probability`	value between 0 and 1. Only data points for time steps that have a probability of observing an event higher than `cutoff_observation_probability` are kept. The few data points with a lower probability of being observed are trimmed off the tail of the time series.
  `gap_to_present`	Integer. Default value: 0. Number of time steps truncated off from the right tail of the raw incidence data. See Details.
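
To make the accepted input formats more concrete, here is a minimal base-R sketch of the two forms of incidence object and two of the delay forms described in the arguments. All numbers are made up, and the variable names are hypothetical. The discretization shown is only one simple way to build a delay vector; it assumes that element i of the vector holds the probability of a delay of i - 1 time steps, and it is not necessarily how the package discretizes distributions internally.

```r
# Form 1: a plain numeric vector. `index_offset` is implicitly zero, so the
# first value is associated with the reference time step.
incidence_as_vector <- c(2, 5, 9, 14, 20, 17, 11)

# Form 2: a list with `values` and `index_offset`. Here the first value is
# recorded 3 time steps after the reference time step.
incidence_as_list <- list(
  values = c(2, 5, 9, 14, 20, 17, 11),
  index_offset = 3L
)

# Time-step index of each value, relative to the reference time step.
time_step_index <- incidence_as_list$index_offset +
  seq_along(incidence_as_list$values) - 1L

# A delay given as a list representing a distribution object
# (same form as in the Examples section).
incubation_delay <- list(name = "gamma", shape = 3.2, scale = 1.3)

# A delay given as a discretized vector (ASSUMPTION: element i is the
# probability of a delay of i - 1 time steps; truncated at max_delay).
max_delay <- 30
cumulative <- pgamma(0:(max_delay + 1),
                     shape = incubation_delay$shape,
                     scale = incubation_delay$scale)
discretized_delay <- diff(cumulative)
discretized_delay <- discretized_delay / sum(discretized_delay)
```

Either incidence form could then be supplied as `partially_delayed_incidence` or `fully_delayed_incidence`.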

Value

Effective reproductive number estimates through time. If output_Re_only is FALSE, the transformations applied to the input observations during the calculations are output as well.

Details

With this function, one can specify two types of delayed observations of infection events (in the same epidemic). The two incidence records are passed with the partially_delayed_incidence and fully_delayed_incidence arguments. These two types of delayed observations must not overlap with one another: a particular infection event should not be recorded in both time series.

If the two sets of observations are completely independent of one another, meaning that they represent two different ways infection events can be observed, with two different delays, then set partial_observation_requires_full_observation to FALSE. Note that a particular infection event should NOT be recorded twice: it cannot be recorded both in partially_delayed_incidence and in fully_delayed_incidence.

An alternative use case is when the two sets of observations are not independent of one another, for instance when recording a "partially-delayed" event requires waiting until the same case has been recorded as a "fully-delayed" event. A typical example of this occurs when recording symptom onset events: in most cases, you must first wait until a case is confirmed via a positive test result to learn about the symptom onset event (assuming the case was symptomatic in the first place). But you typically do not have the date of onset of symptoms for all confirmed cases (even assuming they were all symptomatic). In such a case, we set the partial_observation_requires_full_observation flag to TRUE, we call the incidence constructed from symptom onset events partially_delayed_incidence, and we call the incidence constructed from case confirmation events fully_delayed_incidence. The delay from infection to symptom onset is specified with the delay_until_partial argument. The delay from symptom onset to positive test in this example is specified with the delay_until_final_report argument. Note that, for a particular patient, if the date of symptom onset is known, the patient must not be counted again in the incidence of case confirmation. Otherwise, the infection event would be counted twice.
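
The no-double-counting rule above can be illustrated with a small base-R sketch that splits a line list into the two incidence series. The line list, its column names (`report_date`, `onset_date`) and the helper `count_per_day` are hypothetical and only serve the illustration: cases with a known onset date are counted by onset date, and only the remaining cases are counted by report date.

```r
# Hypothetical line list: one row per confirmed case (made-up data).
# `onset_date` is NA when the date of symptom onset is unknown.
line_list <- data.frame(
  report_date = as.Date(c("2020-03-05", "2020-03-05", "2020-03-06",
                          "2020-03-07", "2020-03-07", "2020-03-08")),
  onset_date = as.Date(c("2020-03-02", NA, "2020-03-03",
                         NA, "2020-03-03", "2020-03-05"))
)

onset_known <- !is.na(line_list$onset_date)

# Common daily time axis, so that both series start on the same date.
all_dates <- seq(min(line_list$onset_date, na.rm = TRUE),
                 max(line_list$report_date),
                 by = "day")

# Hypothetical helper: number of events on each day of `all_dates`.
count_per_day <- function(event_dates, all_dates) {
  as.vector(table(factor(as.character(event_dates),
                         levels = as.character(all_dates))))
}

# Cases with a known onset date are counted once, by onset date.
onset_incidence <- count_per_day(line_list$onset_date[onset_known], all_dates)

# Only cases WITHOUT a known onset date are counted by report date.
report_incidence <- count_per_day(line_list$report_date[!onset_known], all_dates)

# Each case appears in exactly one of the two series.
stopifnot(sum(onset_incidence) + sum(report_incidence) == nrow(line_list))
```

Series built this way, on real data of sufficient length, would be passed as `partially_delayed_incidence` and `fully_delayed_incidence` respectively, with `partial_observation_requires_full_observation = TRUE`.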

Examples

# Gamma-distributed delay from infection to symptom onset (incubation period)
shape_incubation <- 3.2
scale_incubation <- 1.3
delay_incubation <- list(name = "gamma",
                         shape = shape_incubation,
                         scale = scale_incubation)

# Gamma-distributed delay from symptom onset to case report
shape_onset_to_report <- 2.7
scale_onset_to_report <- 1.6
delay_onset_to_report <- list(name = "gamma",
                              shape = shape_onset_to_report,
                              scale = scale_onset_to_report)


## Basic usage of estimate_from_combined_observations
Re_estimate_1 <- estimate_from_combined_observations(
  partially_delayed_incidence = HK_incidence_data$onset_incidence,
  fully_delayed_incidence = HK_incidence_data$report_incidence,
  partial_observation_requires_full_observation = TRUE,
  delay_until_partial = delay_incubation,
  delay_until_final_report = delay_onset_to_report
)


## Advanced usage of estimate_from_combined_observations

# Getting a more verbose result. Adding a date column and returning intermediate
# results as well as the Re estimate.
Re_estimate_2 <- estimate_from_combined_observations(
  partially_delayed_incidence = HK_incidence_data$onset_incidence,
  fully_delayed_incidence = HK_incidence_data$report_incidence,
  partial_observation_requires_full_observation = TRUE,
  delay_until_partial = delay_incubation,
  delay_until_final_report = delay_onset_to_report,
  ref_date = HK_incidence_data$date[1],
  output_Re_only = FALSE
)

# Incorporating prior knowledge over Re. Here, Re is assumed constant over a time
# frame of one week, with a prior mean of 1.25.
Re_estimate_3 <- estimate_from_combined_observations(
  partially_delayed_incidence = HK_incidence_data$onset_incidence,
  fully_delayed_incidence = HK_incidence_data$report_incidence,
  partial_observation_requires_full_observation = TRUE,
  delay_until_partial = delay_incubation,
  delay_until_final_report = delay_onset_to_report,
  estimation_method = 'EpiEstim piecewise constant',
  interval_length = 7,
  mean_Re_prior = 1.25
)

# Incorporating prior knowledge over the disease. Here, the mean of the serial
# interval is assumed to be 5 days, and the standard deviation is assumed to be
# 2.5 days.
Re_estimate_4 <- estimate_from_combined_observations(
  partially_delayed_incidence = HK_incidence_data$onset_incidence,
  fully_delayed_incidence = HK_incidence_data$report_incidence,
  partial_observation_requires_full_observation = TRUE,
  delay_until_partial = delay_incubation,
  delay_until_final_report = delay_onset_to_report,
  mean_serial_interval = 5,
  std_serial_interval = 2.5
)