Build a matrix of delay distributions through time from empirical delay data. — get_matrix_from_empirical_delay_distr

This function takes a record of delays between events and their observations and builds a discretized delay distribution matrix from it. This matrix is required to apply the Richardson-Lucy algorithm. The main benefit of providing empirical delay data to an estimateR analysis, as opposed to specifying the delay as a single distribution (whether fitted or empirical), is that the variability of delays through time informs the analysis and yields more accurate estimates. For instance, if the average delay has shifted from 5 days to 3 days between the beginning and the end of the epidemic of interest, this shift is reflected in the recorded empirical delays and is accounted for by estimateR when estimating the reproductive number.
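
To illustrate what a discretized delay distribution is and how a shift in delays shows up in it, here is a minimal base R sketch. The delay values and the helper `discretize_delays()` are hypothetical and not part of estimateR; the package's internal procedure may differ.

```r
# Hypothetical delay records (in days) for an early and a late period.
early_delays <- c(3, 4, 5, 5, 5, 6, 7)
late_delays <- c(1, 2, 3, 3, 3, 4, 5)

# Discretize: probability mass on each whole-day delay from 0 to max_delay.
discretize_delays <- function(delays, max_delay) {
  counts <- tabulate(delays + 1, nbins = max_delay + 1)
  counts / sum(counts)
}

max_delay <- max(early_delays, late_delays)
early_distr <- discretize_delays(early_delays, max_delay)
late_distr <- discretize_delays(late_delays, max_delay)

# Mean delay implied by each discretized distribution.
mean_early <- sum((0:max_delay) * early_distr)
mean_late <- sum((0:max_delay) * late_distr)
```

A single distribution pooled over both periods would sit between the two means and describe neither period well, which is the motivation for tracking the distribution through time.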

get_matrix_from_empirical_delay_distr(
  empirical_delays,
  n_report_time_steps,
  ref_date = NULL,
  time_step = "day",
  min_number_cases = NULL,
  upper_quantile_threshold = 0.99,
  min_number_cases_fraction = 0.2,
  min_min_number_cases = 500,
  fit = "none",
  return_fitted_distribution = FALSE,
  num_steps_in_a_unit = NULL
)

Arguments

empirical_delays	dataframe containing the empirical data. See Details.
n_report_time_steps	integer. Length of the incidence time series in the accompanying analysis. This argument is needed to determine the dimensions of the output matrix.
ref_date	Date. Optional. Date of the first data entry in `incidence_data`. See Details.
time_step	string. Time between two consecutive incidence datapoints. "day", "2 days", "week", "year"... (see `seq.Date` for details)
min_number_cases	integer. Minimal number of cases to build the empirical distribution from. If `num_steps_in_a_unit` is `NULL`, for any time step T, the `min_number_cases` records prior to T are used. If fewer than `min_number_cases` delays were recorded before T, then T is ignored and the `min_number_cases` earliest-recorded delays are used. If `num_steps_in_a_unit` is given a value, a similar procedure is applied, except that the (at least) `min_number_cases` records must span a round number of time units. For example, if `num_steps_in_a_unit = 7` and time steps represent consecutive days, then to build the distribution for time step T, we find the smallest number of weeks, starting from T and going into the past, during which at least `min_number_cases` delays were recorded. We then use all the delays recorded during these weeks. Weeks here are not necessarily Monday to Sunday, but simply 7 days in a row, e.g. Thursday to Wednesday. Again, if fewer than `min_number_cases` delays were recorded before T, then T is ignored: we instead find the minimum number of weeks, starting from the first recorded delay, that contains at least `min_number_cases` delays.
upper_quantile_threshold	numeric. Between 0 and 1. Argument for internal use.
min_number_cases_fraction	numeric. Between 0 and 1. If `min_number_cases` is not provided (left as `NULL`), the number of most-recent cases used to build the instant delay distribution is `min_number_cases_fraction` times the total number of reported delays.
min_min_number_cases	numeric. Lower bound on the number of cases used to build an instant delay distribution.
fit	string. One of "gamma" or "none". Specifies the type of fit that is applied to the columns of the delay matrix.
return_fitted_distribution	boolean. If TRUE, the function also returns the gamma distribution that was fitted to the respective column.
num_steps_in_a_unit	integer. Optional. Number of time steps in a full time unit (e.g. 7 if looking at weeks). If set, the delays used to build a particular delay distribution will span a round number of such time units. This option is included for comparison with legacy code.
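
The selection rule described for `min_number_cases` (with `num_steps_in_a_unit = NULL`) and the default derived from `min_number_cases_fraction` and `min_min_number_cases` can be sketched as follows. Both helpers are hypothetical, written in base R for illustration only. The sketch assumes that "prior to T" includes T itself, and that the default count is rounded up and bounded below by `min_min_number_cases`; the actual implementation may handle these details differently.

```r
# Assumed default when `min_number_cases` is NULL: a fraction of all
# records, but never fewer than the lower bound.
default_min_number_cases <- function(n_records,
                                     min_number_cases_fraction = 0.2,
                                     min_min_number_cases = 500) {
  max(ceiling(min_number_cases_fraction * n_records), min_min_number_cases)
}

# Select the delays used to build the distribution for time step `t`.
# `event_steps` holds the time step of the event behind each delay record.
select_delays <- function(event_steps, delays, t, min_number_cases) {
  ord <- order(event_steps)
  event_steps <- event_steps[ord]
  delays <- delays[ord]
  up_to_t <- which(event_steps <= t)
  if (length(up_to_t) < min_number_cases) {
    # Too few records up to t: ignore t and use the earliest records.
    return(head(delays, min_number_cases))
  }
  # Otherwise use the most recent records up to t.
  delays[tail(up_to_t, min_number_cases)]
}

# Hypothetical records: event time step and observed delay.
event_steps <- c(1, 1, 2, 3, 5, 6, 8, 9)
delays <- c(5, 6, 5, 4, 4, 3, 3, 2)

recent_window <- select_delays(event_steps, delays, t = 6, min_number_cases = 3)
fallback_window <- select_delays(event_steps, delays, t = 1, min_number_cases = 3)
```

With these toy records, the window for `t = 6` contains the three most recent delays up to that step, while `t = 1` has too few records and falls back to the three earliest ones.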

Value

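A discretized delay distribution matrix. If `return_fitted_distribution` is `TRUE`, the gamma distributions fitted to the columns are returned as well.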
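
As a rough picture of what such a matrix can look like, the sketch below assumes a lower-triangular layout in which column j holds the delay distribution of events occurring at time step j, shifted so that a delay of d steps lands in row j + d. This layout, the helper `build_delay_matrix()` and the toy distributions are assumptions made for illustration; they are not taken from the estimateR source, and the dimensions of the actual output may differ.

```r
# Hypothetical helper: one discretized delay distribution per column,
# truncated where it would run past the last time step.
build_delay_matrix <- function(distr_per_step, n_steps) {
  delay_matrix <- matrix(0, nrow = n_steps, ncol = n_steps)
  for (j in seq_len(n_steps)) {
    distr <- distr_per_step[[j]]
    n_kept <- min(length(distr), n_steps - j + 1)
    delay_matrix[j:(j + n_kept - 1), j] <- distr[seq_len(n_kept)]
  }
  delay_matrix
}

# Delays shorten over time: a mean of 2 steps early on, 1 step later.
slow <- c(0, 0.25, 0.5, 0.25)
fast <- c(0.25, 0.5, 0.25)
delay_matrix <- build_delay_matrix(
  list(slow, slow, fast, fast, fast),
  n_steps = 5
)
```

Because each column can carry a different distribution, a matrix of this kind can represent delays that change through time, which a single delay distribution cannot.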

Details

The `ref_date` argument is understood as the date of the first record in the incidence data for which the empirical delay data will be used. If `ref_date` is not provided, the reference date is taken to be the earliest date in the `event_date` column of `empirical_delays`. In other words, the date of the first record in the incidence data is assumed to be the same as the date of the first record in the empirical delay data. If this is not the case in your analysis, make sure to specify a `ref_date` argument.
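
The default described above can be checked before calling the function. The following base R sketch uses a hypothetical delay data frame (the `report_delay` column name and all values are assumptions for illustration) and compares the default reference date with the first date of a hypothetical incidence series.

```r
# Hypothetical delay records; only `event_date` is named in this page,
# the `report_delay` column name is an assumption.
empirical_delays <- data.frame(
  event_date = as.Date(c("2020-03-04", "2020-03-01", "2020-03-02")),
  report_delay = c(5, 7, 6)
)

# Hypothetical first date of the incidence time series.
first_incidence_date <- as.Date("2020-02-25")

# Default reference date: earliest event date in the delay records.
default_ref_date <- min(empirical_delays$event_date)

# Number of days by which the default would be misaligned.
offset_days <- as.numeric(
  difftime(default_ref_date, first_incidence_date, units = "days")
)

# A non-zero offset suggests passing `ref_date` explicitly.
needs_explicit_ref_date <- offset_days != 0
```

In this toy case the incidence series starts before the first recorded delay, so relying on the default would misalign the delay matrix with the incidence data.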

Examples

## Basic usage of get_matrix_from_empirical_delay_distr
# Obtaining the deconvolved incidence for the full HK incidence data provided in
# the package: obtaining the delay matrix and then using it to recover the 
# deconvolved incidence.

library(estimateR)

smoothed_incidence <- smooth_incidence(HK_incidence_data$case_incidence)

# Gamma-distributed incubation period
shape_incubation <- 3.2
scale_incubation <- 1.3
delay_incubation <- list(name = "gamma", shape = shape_incubation, scale = scale_incubation)

delay_matrix_1 <- get_matrix_from_empirical_delay_distr(
  HK_delay_data, 
  n_report_time_steps = length(smoothed_incidence)
)

deconvolved_incidence_1 <- deconvolve_incidence(
  incidence_data = smoothed_incidence,
  delay = list(delay_incubation, delay_matrix_1)
)


## Advanced usage of get_matrix_from_empirical_delay_distr
# Obtaining the deconvolved incidence for a section of the HK incidence data 
# provided in the package: computing the delay matrix, fitting gamma distributions 
# to the columns and then using it to recover the deconvolved incidence for the 
# time-frame of interest

smoothed_partial_incidence <- smooth_incidence(HK_incidence_data[30:90,]$case_incidence)

delay_matrix_2 <- get_matrix_from_empirical_delay_distr(
  HK_delay_data, 
  n_report_time_steps = length(smoothed_partial_incidence),
  fit = "gamma",
  ref_date = HK_incidence_data[30,]$date
)

deconvolved_incidence_2 <- deconvolve_incidence(
  incidence_data = smoothed_partial_incidence,
  delay = list(delay_incubation, delay_matrix_2)
)