This function takes a record of delays between events and their observations and builds a discretized delay distribution matrix from it. This matrix is required to apply the Richardson-Lucy algorithm. The main benefit of providing empirical delay data to an estimateR analysis, as opposed to specifying a delay as a single distribution (whether fitted or empirical), is that the variability of the delays through time informs the analysis and yields more accurate estimates. For example, if the average delay has shifted from 5 days to 3 days between the beginning and the end of the epidemic of interest, this shift will be reflected in the recorded empirical delays and accounted for by estimateR when estimating the reproductive number.

get_matrix_from_empirical_delay_distr(
  empirical_delays,
  n_report_time_steps,
  ref_date = NULL,
  time_step = "day",
  min_number_cases = NULL,
  upper_quantile_threshold = 0.99,
  min_number_cases_fraction = 0.2,
  min_min_number_cases = 500,
  fit = "none",
  return_fitted_distribution = FALSE,
  num_steps_in_a_unit = NULL
)

Arguments

empirical_delays

dataframe containing the empirical data. See Details.

n_report_time_steps

integer. Length of the incidence time series in the accompanying analysis. This argument is needed to determine the dimensions of the output matrix.

ref_date

Date. Optional. Date of the first data entry in incidence_data. See Details.

time_step

string. Time between two consecutive incidence datapoints. "day", "2 days", "week", "year"... (see seq.Date for details)
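The time_step strings follow the `by` argument of seq.Date. As a minimal sketch of how such a string maps to the dates of consecutive incidence datapoints (the reference date and series length below are hypothetical values chosen for illustration):

```r
# Hypothetical reference date and series length, for illustration only.
ref_date <- as.Date("2020-03-01")
n_report_time_steps <- 4

# Each time_step string is passed as the `by` argument of seq().
daily   <- seq(ref_date, by = "day", length.out = n_report_time_steps)
two_day <- seq(ref_date, by = "2 days", length.out = n_report_time_steps)
weekly  <- seq(ref_date, by = "week", length.out = n_report_time_steps)

print(two_day)
```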

min_number_cases

integer. Minimal number of cases to build the empirical distribution from. If num_steps_in_a_unit is NULL, for any time step T, the min_number_cases most recent records prior to T are used. If fewer than min_number_cases delays were recorded before T, then T is ignored and the min_number_cases earliest-recorded delays are used. If num_steps_in_a_unit is given a value, a similar procedure is applied, except that the delays used (at least min_number_cases of them) must span a round number of time units. For example, if num_steps_in_a_unit = 7 and time steps represent consecutive days, to build the distribution for time step T, we find the smallest number of weeks, starting from T and going back in time, during which at least min_number_cases delays were recorded. We then use all the delays recorded during these weeks. Weeks here are not necessarily Monday to Sunday, but simply 7 days in a row, e.g. Thursday to Wednesday. Again, if fewer than min_number_cases delays were recorded before T, then T is ignored: we instead find the minimum number of weeks, starting from the first recorded delay, that contain at least min_number_cases delays.
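The selection rule for the case where num_steps_in_a_unit is NULL can be sketched as follows. This is an illustration under stated assumptions, not the package's internal code: the helper select_delays_for_time_step is hypothetical, the column names event_date and report_delay are assumed, and "prior to T" is taken to include T itself.

```r
# Toy delay records; the column names are assumptions made for this sketch.
toy_delays <- data.frame(
  event_date = as.Date("2020-03-01") + 0:9,
  report_delay = c(5, 6, 4, 5, 3, 4, 3, 2, 3, 3)
)

# Hypothetical helper, not part of estimateR.
select_delays_for_time_step <- function(delays, date_T, min_number_cases) {
  delays <- delays[order(delays$event_date), ]
  # Assumption: records dated on T itself count as "prior to T".
  prior <- delays[delays$event_date <= date_T, ]
  if (nrow(prior) < min_number_cases) {
    # Too few records before T: ignore T and use the earliest-recorded delays.
    return(head(delays, min_number_cases))
  }
  # Otherwise keep the min_number_cases most recent records prior to T.
  tail(prior, min_number_cases)
}

late  <- select_delays_for_time_step(toy_delays, as.Date("2020-03-08"), 4)
early <- select_delays_for_time_step(toy_delays, as.Date("2020-03-02"), 4)
```

Here `late` holds the four records from 2020-03-05 to 2020-03-08, while `early` falls back to the first four records because only two delays were recorded by 2020-03-02.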

upper_quantile_threshold

numeric. Between 0 and 1. Argument for internal use.

min_number_cases_fraction

numeric. Between 0 and 1. If min_number_cases is not provided (left as NULL), the number of most recent cases used to build the instant delay distribution is min_number_cases_fraction times the total number of reported delays.

min_min_number_cases

numeric. Lower bound for number of cases used to build an instant delay distribution.
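One plausible reading of how min_number_cases, min_number_cases_fraction and min_min_number_cases interact is sketched below. The helper effective_min_number_cases is hypothetical, and both the rounding and the choice to apply the lower bound only when min_number_cases is NULL are assumptions, not confirmed package behaviour.

```r
# Hypothetical helper, not part of estimateR.
effective_min_number_cases <- function(n_delays,
                                       min_number_cases = NULL,
                                       min_number_cases_fraction = 0.2,
                                       min_min_number_cases = 500) {
  if (!is.null(min_number_cases)) {
    # Assumption: an explicit value is used as given.
    return(min_number_cases)
  }
  # Assumed combination: fraction of all reported delays, floored by the lower bound.
  max(round(min_number_cases_fraction * n_delays), min_min_number_cases)
}

many_records <- effective_min_number_cases(10000) # fraction-based count applies
few_records  <- effective_min_number_cases(1000)  # lower bound applies
explicit     <- effective_min_number_cases(1000, min_number_cases = 50)
```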

fit

string. One of "gamma" or "none". Specifies the type of fit that is applied to the columns of the delay matrix.

return_fitted_distribution

boolean. If TRUE, the function also returns the gamma distribution fitted to each column.

num_steps_in_a_unit

Optional argument. Number of time steps in a full time unit (e.g. 7 if looking at weeks). If set, the delays used to build a particular delay distribution will span a round number of such time units. This option is included for comparison with legacy code.

Value

A discretized delay distribution matrix.
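To give an intuition for what a time-varying delay distribution matrix can look like, here is a small hand-built sketch. The layout (one column per event time step, one row per observation time step, hence lower triangular) is an assumption made for illustration, and the probabilities are invented toy values, not output of this function.

```r
n_steps <- 4

# Hypothetical discretized delay distributions: P(delay = 0, 1, 2, 3 time steps).
early_pmf <- c(0.1, 0.2, 0.3, 0.4) # longer delays early in the epidemic
late_pmf  <- c(0.4, 0.3, 0.2, 0.1) # shorter delays later on

delay_matrix <- matrix(0, nrow = n_steps, ncol = n_steps)
for (j in seq_len(n_steps)) {
  pmf <- if (j <= 2) early_pmf else late_pmf
  for (d in 0:(n_steps - j)) {
    # Assumed layout: column j is the event time step,
    # row j + d is the time step at which the event is observed.
    delay_matrix[j + d, j] <- pmf[d + 1]
  }
}

print(delay_matrix)
```

Each column carries the delay distribution in force at that event time step, which is how a shift in delays over the course of the epidemic can be represented in a single matrix.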

Details

The ref_date argument here will be understood as the date of the first record in the incidence data for which the empirical delay data will be used. If ref_date is not provided, the reference date will be taken as being the earliest date in the event_date column of empirical_delays. In other words, the date of the first record in the incidence data will be assumed to be the same as the date of the first record in the empirical delay data. If this is not the case in your analysis, make sure to specify a ref_date argument.
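The default described above amounts to taking the earliest event date in the delay data. A minimal sketch, assuming empirical_delays has an event_date column of class Date (the report_delay column name and the helper resolve_ref_date are hypothetical):

```r
# Toy delay records; report_delay is an assumed column name.
toy_delays <- data.frame(
  event_date = as.Date(c("2020-03-05", "2020-03-02", "2020-03-09")),
  report_delay = c(4, 6, 3)
)

# Hypothetical helper mirroring the documented default, not part of estimateR.
resolve_ref_date <- function(empirical_delays, ref_date = NULL) {
  if (is.null(ref_date)) {
    # No ref_date given: fall back to the earliest recorded event date.
    return(min(empirical_delays$event_date))
  }
  as.Date(ref_date)
}

default_ref  <- resolve_ref_date(toy_delays)
explicit_ref <- resolve_ref_date(toy_delays, ref_date = "2020-02-20")
```

If the incidence series starts on a different date than the delay records, as with `explicit_ref` here, passing ref_date explicitly keeps the two aligned.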

Examples

## Basic usage of get_matrix_from_empirical_delay_distr

# Obtaining the deconvolved incidence for the full HK incidence data provided in
# the package: obtaining the delay matrix and then using it to recover the
# deconvolved incidence.

smoothed_incidence <- smooth_incidence(HK_incidence_data$case_incidence)

shape_incubation = 3.2
scale_incubation = 1.3
delay_incubation <- list(name="gamma", shape = shape_incubation, scale = scale_incubation)

delay_matrix_1 <- get_matrix_from_empirical_delay_distr(
  HK_delay_data,
  n_report_time_steps = length(smoothed_incidence)
)

deconvolved_incidence_1 <- deconvolve_incidence(
  incidence_data = smoothed_incidence,
  delay = list(delay_incubation, delay_matrix_1)
)

## Advanced usage of get_matrix_from_empirical_delay_distr

# Obtaining the deconvolved incidence for a section of the HK incidence data
# provided in the package: computing the delay matrix, fitting gamma distributions
# to the columns and then using it to recover the deconvolved incidence for the
# time-frame of interest

smoothed_partial_incidence <- smooth_incidence(HK_incidence_data[30:90,]$case_incidence)

delay_matrix_2 <- get_matrix_from_empirical_delay_distr(
  HK_delay_data,
  n_report_time_steps = length(smoothed_partial_incidence),
  fit = "gamma",
  ref_date = HK_incidence_data[30,]$date
)

deconvolved_incidence_2 <- deconvolve_incidence(
  incidence_data = smoothed_partial_incidence,
  delay = list(delay_incubation, delay_matrix_2)
)