This function takes as input incidence data of delayed observations of infection events, as well as the probability distribution(s) of the delay(s). It returns the inferred incidence of infection events.

get_infections_from_incidence(
  incidence_data,
  smoothing_method = "LOESS",
  deconvolution_method = "Richardson-Lucy delay distribution",
  delay,
  is_partially_reported_data = FALSE,
  delay_until_final_report = NULL,
  output_infection_incidence_only = TRUE,
  ref_date = NULL,
  time_step = "day",
  ...
)

Arguments

incidence_data

An object containing incidence data through time. It can either be:

  • A list with two elements:

    1. A numeric vector named values: the incidence recorded on consecutive time steps.

    2. An integer named index_offset: the offset, counted in number of time steps, by which the first value in values is shifted compared to a reference time step. This parameter makes it possible to keep track of the date of the first value in values without carrying a date column around. A positive offset means values are shifted into the future relative to the reference time step; a negative offset means they are shifted into the past.

  • A numeric vector. The vector corresponds to the values element described above, and index_offset is implicitly zero. This means that the first value in incidence_data is associated with the reference time step (no shift towards the future or past).
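The two accepted input forms can be illustrated with a minimal sketch in base R. The helper as_incidence_list() and the reference_date value are hypothetical, introduced only for illustration; the sketch also assumes daily time steps, so that value i falls index_offset + (i - 1) days after the reference time step.

```r
# Hypothetical helper (not part of the documented API): normalize both input
# forms to a list with `values` and `index_offset`.
as_incidence_list <- function(incidence_data) {
  if (is.list(incidence_data)) {
    return(incidence_data)
  }
  # A bare numeric vector carries an implicit offset of zero
  list(values = incidence_data, index_offset = 0L)
}

from_list   <- as_incidence_list(list(values = c(5, 8, 13, 21), index_offset = 3L))
from_vector <- as_incidence_list(c(5, 8, 13, 21))

# Assuming daily time steps and a hypothetical date for the reference time step,
# value i falls on reference_date + index_offset + (i - 1).
reference_date <- as.Date("2020-03-01")
dates_from_list <- reference_date + from_list$index_offset + seq_along(from_list$values) - 1
dates_from_vector <- reference_date + from_vector$index_offset + seq_along(from_vector$values) - 1
```

With index_offset = 3, the first value lands three days after the reference date; with the bare vector it lands on the reference date itself.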

smoothing_method

string. Method used to smooth the original incidence data. Available options include "LOESS" (the default).

deconvolution_method

string. Method used to infer the timings of infection events from the original incidence data (the deconvolution step). Available options include "Richardson-Lucy delay distribution" (the default).

delay

Single delay or list of delays. Each delay can be one of:

  • a list representing a distribution object

  • a discretized delay distribution vector

  • a discretized delay distribution matrix

  • a dataframe containing empirical delay data
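As a rough illustration of how the delay forms relate, the sketch below discretizes a gamma distribution object (the list(name = "gamma", shape, scale) form used in the Examples) into a delay vector, combines two consecutive delays, and builds an empirical vector from observed delays. The helpers discretize_gamma() and convolve_delays() are hypothetical. The binning rule P(k ≤ delay < k + 1) and the renormalization are assumptions, not necessarily how the package discretizes. Combining a list of delays by convolution assumes the delays are consecutive and independent, which is how the Examples chain infection → onset → report.

```r
# Hypothetical helper: probabilities of a delay of 0, 1, ..., max_delay steps
discretize_gamma <- function(shape, scale, max_delay = 30) {
  k <- 0:max_delay
  p <- pgamma(k + 1, shape = shape, scale = scale) - pgamma(k, shape = shape, scale = scale)
  p / sum(p)  # renormalize: the mass beyond max_delay is dropped
}

# Hypothetical helper: distribution of the sum of two independent delays
# (discrete convolution); element 1 always means a zero-step delay
convolve_delays <- function(p, q) {
  out <- numeric(length(p) + length(q) - 1)
  for (i in seq_along(p)) {
    idx <- i:(i + length(q) - 1)
    out[idx] <- out[idx] + p[i] * q
  }
  out
}

incubation <- discretize_gamma(shape = 3.2, scale = 1.3)
onset_to_report <- discretize_gamma(shape = 2.7, scale = 1.6)
infection_to_report <- convolve_delays(incubation, onset_to_report)

# Empirical alternative: tabulate observed delays (in whole time steps)
observed_delays <- c(2, 3, 3, 4, 5, 5, 5, 7)
empirical <- tabulate(observed_delays + 1, nbins = max(observed_delays) + 1) /
  length(observed_delays)
```

The mean of the combined delay equals the sum of the two individual means, as expected for a sum of independent delays.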

is_partially_reported_data

boolean. Set to TRUE if incidence_data represents delayed observations of infection events that themselves rely on further-delayed observations.

delay_until_final_report

Single delay or list of delays. Each delay can be one of:

  • a list representing a distribution object

  • a discretized delay distribution vector

  • a discretized delay distribution matrix

  • a dataframe containing empirical delay data

output_infection_incidence_only

boolean. Should the output contain only the estimated infection incidence (as opposed to also containing results for intermediate steps)?

ref_date

Date. Optional. Date of the first data entry in incidence_data.

time_step

string. Time between two consecutive incidence data points: "day", "2 days", "week", "year", etc. (see seq.Date for details).
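Because time_step follows the seq.Date conventions, a minimal sketch of how ref_date and time_step together could place data points on a calendar is shown below. The data frame layout and its column names are illustrative assumptions, not the documented output format.

```r
ref_date <- as.Date("2020-03-02")
values <- c(12, 15, 19, 24)

# time_step = "week": one data point every 7 days
weekly_dates <- seq(ref_date, by = "week", length.out = length(values))
# time_step = "2 days": one data point every other day
two_day_dates <- seq(ref_date, by = "2 days", length.out = length(values))

# A hypothetical output layout with a date column (column names are illustrative)
data.frame(date = weekly_dates, value = values)
```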

...

Arguments passed on to .smooth_LOESS, .deconvolve_incidence_Richardson_Lucy, merge_outputs, nowcast

data_points_incl

integer. Size of the window used in the LOESS algorithm. The span parameter passed to loess is computed as the ratio of data_points_incl to the number of time steps in the input data.

degree

integer. LOESS degree. Must be 0, 1 or 2.
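The relation between data_points_incl, degree, and the span passed to stats::loess can be sketched as follows. The simulated incidence series is made up for illustration, and the sketch fits the raw series directly. The package may first pad the data back in time (see initial_Re_estimate_window), so this is not its exact smoothing procedure.

```r
set.seed(42)
time_index <- 1:60
# Simulated, made-up incidence counts with exponential growth
incidence <- rpois(length(time_index), lambda = 20 * exp(0.05 * time_index))

data_points_incl <- 21
degree <- 1
# span = window size / number of time steps
span <- data_points_incl / length(time_index)

fit <- loess(incidence ~ time_index, span = span, degree = degree)
smoothed <- predict(fit)  # fitted values at the original time steps
```

Here a 21-point window over 60 time steps gives a span of 0.35, so each local fit uses about a third of the series.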

initial_Re_estimate_window

integer. To help with the smoothing, the function extends the data back in time, padding it with values obtained by assuming a constant Re. This parameter is the number of time steps at the beginning of incidence_input taken into account when computing the average initial Re.
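A constant Re with a fixed generation interval implies roughly constant exponential growth, so backward padding can be illustrated by extrapolating a constant growth factor. The sketch below is a simplified stand-in: the helper pad_back_constant_growth() is hypothetical, and it estimates a growth factor per time step directly instead of computing Re as the package does. It also assumes strictly positive counts in the initial window.

```r
# Hypothetical helper: pad `n_pad` values before the start of the series,
# assuming a constant growth factor per time step (a stand-in for "constant Re").
pad_back_constant_growth <- function(values, initial_window, n_pad) {
  head_values <- values[seq_len(initial_window)]
  # Geometric mean of step-to-step growth factors over the initial window
  # (requires strictly positive counts)
  growth_factor <- exp(mean(diff(log(head_values))))
  padding <- values[1] / growth_factor^(n_pad:1)
  c(padding, values)
}

values <- 10 * 2^(0:9)
padded <- pad_back_constant_growth(values, initial_window = 5, n_pad = 3)
```

For a series that doubles each time step, the three padded values are 1.25, 2.5 and 5, continuing the doubling backwards from 10.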

delay_distribution

numeric square matrix or vector. Discretized delay distribution used in the deconvolution step.

threshold_chi_squared

numeric scalar. Chi-squared threshold: the Richardson-Lucy (R-L) algorithm stops once the chi-squared value falls below this threshold.

max_iterations

integer. Maximum number of iterations of the R-L algorithm.
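The roles of delay_distribution, threshold_chi_squared and max_iterations can be illustrated with a minimal Richardson-Lucy deconvolution in base R. The helpers build_delay_matrix() and richardson_lucy_sketch() are hypothetical. Initializing with the observations themselves, and the chi-squared form mean((expected - observed)^2 / expected), are assumptions; the package may initialize, shift or pad differently. The sketch assumes strictly positive observations and a nonzero probability for a zero-step delay.

```r
# Hypothetical helper: square matrix M with M[i, j] = P(observed at step i | infected at step j)
build_delay_matrix <- function(delay_pmf, n) {
  M <- matrix(0, nrow = n, ncol = n)
  for (j in seq_len(n)) {
    for (d in seq_along(delay_pmf)) {
      i <- j + d - 1  # delay_pmf[1] is the probability of a zero-step delay
      if (i <= n) M[i, j] <- delay_pmf[d]
    }
  }
  M
}

# Minimal Richardson-Lucy deconvolution with a chi-squared stopping rule
richardson_lucy_sketch <- function(observed, M, threshold_chi_squared = 1, max_iterations = 100) {
  q <- colSums(M)  # probability that infections at step j are observed within the data window
  x <- observed    # initial guess (an assumption; other initializations are possible)
  expected <- as.vector(M %*% x)
  for (iter in seq_len(max_iterations)) {
    # Multiplicative R-L update: x_j <- x_j / q_j * sum_i M[i, j] * observed_i / expected_i
    x <- x / q * as.vector(crossprod(M, observed / expected))
    expected <- as.vector(M %*% x)
    chi_squared <- mean((expected - observed)^2 / expected)
    if (chi_squared < threshold_chi_squared) break
  }
  list(infections = x, iterations = iter, chi_squared = chi_squared)
}

delay_pmf <- c(0.1, 0.3, 0.3, 0.2, 0.1)
n <- 40
true_infections <- 50 + 40 * sin(seq_len(n) / 6)
M <- build_delay_matrix(delay_pmf, n)
observed <- as.vector(M %*% true_infections)
result <- richardson_lucy_sketch(observed, M, threshold_chi_squared = 0.01, max_iterations = 200)
```

One property of this update is that, after every iteration, the re-convolved estimate sums to the total of the observations. Only the distribution over time changes, which makes it a convenient sanity check.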

verbose

Boolean. Print verbose output?

cutoff_observation_probability

value between 0 and 1. Only data points for time steps whose probability of an event being observed is higher than cutoff_observation_probability are kept. The few data points with a lower probability of being observed are trimmed off the tail of the time series.

gap_to_present

Integer. Default value: 0. Number of time steps truncated off the right tail of the raw incidence data. See Details for more information.
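The effect of cutoff_observation_probability and gap_to_present can be illustrated with a common nowcasting correction. Each recent count is divided by the probability that its events have already been reported, and the tail where that probability falls to or below the cutoff is dropped. The helper nowcast_sketch() is hypothetical, and treating gap_to_present as extra time steps elapsed before the present is an interpretation. This is not necessarily what the package's nowcast function does.

```r
# Hypothetical helper: correct recent counts for events not yet reported and
# trim the tail where the observation probability is too low.
nowcast_sketch <- function(observed, report_delay_pmf,
                           cutoff_observation_probability, gap_to_present = 0) {
  n <- length(observed)
  # Time steps between each data point and the present
  elapsed <- (n - seq_len(n)) + gap_to_present
  # P(reported within `elapsed` steps); report_delay_pmf[1] is a zero-step delay
  cdf <- cumsum(report_delay_pmf)
  prob_observed <- cdf[pmin(elapsed + 1, length(cdf))]
  keep <- prob_observed > cutoff_observation_probability
  # Scale each count up by the inverse of its observation probability, then drop the tail
  (observed / prob_observed)[keep]
}

report_delay_pmf <- c(0.2, 0.3, 0.5)
observed <- c(10, 10, 10, 5, 2)
nowcast_sketch(observed, report_delay_pmf, cutoff_observation_probability = 0.3)
# returns 10 10 10 10
nowcast_sketch(observed, report_delay_pmf, cutoff_observation_probability = 0.3, gap_to_present = 1)
# returns 10 10 10 10 4
```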

Value

Time series of inferred infection incidence. If ref_date is provided, a date column is included in the output.

Details

This function can account for observations that depend on further-delayed observations, through the is_partially_reported_data flag. For instance, if the incidence data represent symptom onset events, these events are usually only recorded once a secondary delayed observation occurs: typically a case confirmation, but possibly a hospital admission or another type of event. When setting is_partially_reported_data to TRUE, use the delay_until_final_report argument to specify the delay between the observed events and this secondary observation (in the Examples, the delay from symptom onset to case reporting).

Examples

## Basic usage of get_infections_from_incidence
# Recovering infection events from case incidence data assuming distinct gamma
# distributions for the delay between infection and symptom onset, and the delay
# between symptom onset and case reporting.
shape_incubation <- 3.2
scale_incubation <- 1.3
delay_incubation <- list(name = "gamma", shape = shape_incubation, scale = scale_incubation)

shape_onset_to_report <- 2.7
scale_onset_to_report <- 1.6
delay_onset_to_report <- list(name = "gamma", shape = shape_onset_to_report, scale = scale_onset_to_report)

infections_1 <- get_infections_from_incidence(
  HK_incidence_data$case_incidence,
  delay = list(delay_incubation, delay_onset_to_report)
)

## Advanced usage of get_infections_from_incidence
# Recovering infection events from symptom onset data, assuming the same delay
# distributions as above
infections_2 <- get_infections_from_incidence(
  HK_incidence_data$onset_incidence,
  delay = delay_incubation,
  is_partially_reported_data = TRUE,
  delay_until_final_report = delay_onset_to_report,
  ref_date = HK_incidence_data$date[1]
)