Convolve delay distributions — convolve

Take a list of delay distributions and return their convolution. The convolution of a delay A -> B and a delay B -> C gives the delay distribution of A -> C. Delays are assumed to occur in the same chronological order as the order in which they appear in the delays list.
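For delays that are already discretized into probability vectors, the convolution described above can be sketched in base R. This sketch assumes that element i of a vector holds the probability of a delay of i - 1 time steps. `convolve_two_delays` and `convolve_delay_vectors` are hypothetical helpers, not the package's implementation. Chaining with `Reduce` applies the delays in list order. For time-invariant vectors the order does not change the result, because convolution is commutative.

```r
# Hypothetical helper, not the package's implementation.
# p[i] is the probability of a delay of (i - 1) steps for A -> B,
# q[j] is the probability of a delay of (j - 1) steps for B -> C.
convolve_two_delays <- function(p, q) {
  out <- numeric(length(p) + length(q) - 1)
  for (i in seq_along(p)) {
    for (j in seq_along(q)) {
      # Total delay A -> C is (i - 1) + (j - 1) steps, stored at index i + j - 1
      out[i + j - 1] <- out[i + j - 1] + p[i] * q[j]
    }
  }
  out
}

# Convolve any number of delay vectors, applied in list order
convolve_delay_vectors <- function(delays) Reduce(convolve_two_delays, delays)

# A -> B takes 0 or 1 step, B -> C takes 0 or 1 step, C -> D is immediate
total <- convolve_delay_vectors(list(c(0.5, 0.5), c(0.2, 0.8), c(1)))
total  # 0.1 0.5 0.4
```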

convolve_delays(delays, n_report_time_steps = NULL, ...)

Arguments

delays

List of delays, with flexible structure. Each delay in the delays list can be one of:

a list representing a distribution object
a discretized delay distribution vector
a discretized delay distribution matrix
a dataframe containing empirical delay data

n_report_time_steps

integer. Length of incidence time series. Use only when providing empirical delay data.

...

Arguments passed on to get_matrix_from_empirical_delay_distr and build_delay_distribution

min_number_cases: integer. Minimal number of cases to build the empirical distribution from. If num_steps_in_a_unit is NULL, for any time step T, the min_number_cases most recent records prior to T are used. If fewer than min_number_cases delays were recorded before T, then T is ignored and the min_number_cases earliest-recorded delays are used. If num_steps_in_a_unit is given a value, a similar procedure is applied, except that the (at least) min_number_cases delays must now span a whole number of time units. For example, if num_steps_in_a_unit = 7 and time steps represent consecutive days, then to build the distribution for time step T, we find the smallest number of weeks, starting from T and going into the past, during which at least min_number_cases delays were recorded, and we use all the delays recorded during these weeks. Weeks need not run from Monday to Sunday; they are simply 7 consecutive days, e.g. Thursday to Wednesday. Again, if fewer than min_number_cases delays were recorded before T, then T is ignored; instead, we find the smallest number of weeks, starting from the first recorded delay, that contains at least min_number_cases delays.
upper_quantile_threshold: numeric. Between 0 and 1. Argument for internal use.
min_number_cases_fraction: numeric. Between 0 and 1. If min_number_cases is not provided (left as NULL), the number of most recent cases used to build the instant delay distribution is min_number_cases_fraction times the total number of reported delays.
min_min_number_cases: numeric. Lower bound for the number of cases used to build an instant delay distribution.
fit: string. One of "gamma" or "none". Specifies the type of fit that is applied to the columns of the delay matrix.
num_steps_in_a_unit: Optional argument. Number of time steps in a full time unit (e.g. 7 if looking at weeks). If set, the delays used to build a particular delay distribution will span a whole number of such time units. This option is included for comparison with legacy code.
ref_date: Date. Optional. Date of the first data entry in incidence_data.
time_step: string. Time between two consecutive incidence data points: "day", "2 days", "week", "year"... (see seq.Date for details).
max_quantile: numeric value between 0 and 1. Upper quantile reached by the last element in the discretized distribution vector.
offset_by_one: boolean. Set to TRUE if distribution represents the fit of data that was offset by one (fitted_data = original_data + 1) to accommodate zeroes in original_data.
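The record-selection rule described under min_number_cases, together with the fallback given by min_number_cases_fraction and min_min_number_cases, can be sketched as follows. This is a reading of the argument descriptions above, not the package's code. The function names are hypothetical. Each delay record is reduced to the sorted integer time step at which it was recorded, "prior to T" is assumed to include T itself, and rounding the fraction-based count up with ceiling() is an assumption.

```r
# Hypothetical: number of cases to use when min_number_cases is NULL.
# Rounding up with ceiling() is an assumption of this sketch.
resolve_min_number_cases <- function(min_number_cases, n_delays,
                                     min_number_cases_fraction,
                                     min_min_number_cases) {
  if (!is.null(min_number_cases)) return(min_number_cases)
  max(ceiling(min_number_cases_fraction * n_delays), min_min_number_cases)
}

# Hypothetical: indices of the delay records used for time step t.
# record_steps: sorted time steps at which each delay was recorded.
select_delay_records <- function(record_steps, t, min_number_cases,
                                 num_steps_in_a_unit = NULL) {
  if (sum(record_steps <= t) < min_number_cases) {
    # Too few delays recorded up to t: t is ignored and the
    # earliest-recorded delays are used instead.
    if (is.null(num_steps_in_a_unit)) {
      return(seq_len(min(min_number_cases, length(record_steps))))
    }
    start <- record_steps[1]
    k <- 1
    # Grow the window [start, start + k * unit) one unit at a time until it
    # holds enough delays or already covers every record
    while (sum(record_steps < start + k * num_steps_in_a_unit) < min_number_cases &&
           start + k * num_steps_in_a_unit <= max(record_steps)) {
      k <- k + 1
    }
    return(which(record_steps < start + k * num_steps_in_a_unit))
  }
  if (is.null(num_steps_in_a_unit)) {
    # The min_number_cases most recent records up to t
    return(utils::tail(which(record_steps <= t), min_number_cases))
  }
  # Smallest whole number of units, window (t - k * unit, t], with enough delays
  in_window <- function(k) {
    record_steps <= t & record_steps > t - k * num_steps_in_a_unit
  }
  k <- 1
  while (sum(in_window(k)) < min_number_cases) k <- k + 1
  which(in_window(k))
}

record_steps <- c(1, 1, 2, 3, 5, 6, 8, 9, 9, 10)
select_delay_records(record_steps, t = 10, min_number_cases = 3)  # 8 9 10
select_delay_records(record_steps, t = 10, min_number_cases = 3,
                     num_steps_in_a_unit = 7)                     # 5 6 7 8 9 10
```

With num_steps_in_a_unit = 7, the whole week ending at step 10 is kept even though its first three records already suffice. That is the "whole number of time units" behavior described above.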

Value

A discretized delay distribution vector or matrix. A vector is returned when all input delay distributions are constant through time: either they are already vectors or they are list-specified distributions. A matrix is returned when at least one of the delays has a distribution that can change through time. This is the case with empirical delay data, or if any of the inputs is already a delay distribution matrix.
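A list-specified distribution is constant through time and therefore leads to a vector. One plausible way to turn such a list into a discretized vector, using the max_quantile argument described above, is sketched below for the gamma case. The binning convention (element i holds P(i - 1 <= X < i)), the renormalization, and the default max_quantile value are assumptions of this sketch, not the package's documented behavior. discretize_gamma is a hypothetical name.

```r
# Hypothetical sketch: discretize list(name = "gamma", shape, scale) into a vector.
# Assumed convention: element i holds P(i - 1 <= X < i) for the continuous delay X.
discretize_gamma <- function(distribution, max_quantile = 0.999) {
  stopifnot(distribution$name == "gamma")
  # Number of elements needed for the CDF at the last bin edge to reach max_quantile
  n <- ceiling(qgamma(max_quantile, shape = distribution$shape,
                      scale = distribution$scale))
  cdf <- pgamma(0:n, shape = distribution$shape, scale = distribution$scale)
  probs <- diff(cdf)
  # Renormalize the truncated vector so that it sums to 1 (assumption)
  probs / sum(probs)
}

delay_incubation <- list(name = "gamma", shape = 3.2, scale = 1.3)
v <- discretize_gamma(delay_incubation)
round(v, 3)
```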

Details

This function is flexible in the type of delay inputs it can handle. Each delay in the delays list can be one of:

a list representing a distribution object
a discretized delay distribution vector
a discretized delay distribution matrix
a dataframe containing empirical delay data

See get_matrix_from_empirical_delay_distr for details on the format expected for the empirical delay data.
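Handling these four input types requires telling them apart. The sketch below shows one way to do this in base R; delay_input_type is a hypothetical helper, not part of the package. In R a data frame is also a list, and a matrix is also numeric, so the more specific checks must come first. The data frame used here has placeholder columns and does not follow the empirical format expected by get_matrix_from_empirical_delay_distr.

```r
# Hypothetical sketch: classify one element of the `delays` list.
# is.data.frame() must precede is.list(), and is.matrix() must precede
# is.numeric(), because data frames are lists and matrices are numeric.
delay_input_type <- function(delay) {
  if (is.data.frame(delay)) return("empirical")
  if (is.matrix(delay)) return("matrix")
  if (is.list(delay)) return("distribution")
  if (is.numeric(delay)) return("vector")
  stop("Unsupported delay input")
}

example_delays <- list(
  list(name = "gamma", shape = 3.2, scale = 1.3),  # distribution object
  c(0.2, 0.5, 0.3),                                # discretized vector
  diag(3),                                         # delay matrix
  data.frame(delay = c(2, 3))                      # placeholder data frame
)
types <- vapply(example_delays, delay_input_type, character(1))
types  # "distribution" "vector" "matrix" "empirical"
```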

Examples

## Convolving the delay between infection and onset of symptoms with the delay
# between onset of symptoms and case report to obtain the total delay between
# infection and case report. The resulting delay distribution is then used to
# recover the original infection events.

smoothed_incidence <- smooth_incidence(HK_incidence_data$case_incidence)

shape_incubation <- 3.2
scale_incubation <- 1.3
delay_incubation <- list(name = "gamma", shape = shape_incubation, scale = scale_incubation)

shape_onset_to_report <- 2.7
scale_onset_to_report <- 1.6
delay_onset_to_report <- list(name = "gamma",
                              shape = shape_onset_to_report,
                              scale = scale_onset_to_report)

total_delay_1 <- convolve_delays(list(delay_incubation, delay_onset_to_report))

deconvolved_incidence <- deconvolve_incidence(
  incidence_data = smoothed_incidence,
  delay = total_delay_1
)

## Convolving multiple delays
# In this example it is assumed that the delay between infection and case report
# is composed of three delays: the delay between infection and symptom onset,
# the delay between symptom onset and case testing, and the delay between
# testing and case report.

shape_incubation <- 3.2
scale_incubation <- 1.3
delay_incubation <- list(name = "gamma", shape = shape_incubation, scale = scale_incubation)

delay_onset_to_test_taken <- list(name = "norm", mean = 5, sd = 2)

delay_test_to_report <- list(name = "norm", mean = 2, sd = 0.5)

total_delay_2 <- convolve_delays(list(delay_incubation,
                                      delay_onset_to_test_taken,
                                      delay_test_to_report))

## Convolving delays of multiple types
# Defining the incubation period as a probability vector, and the delay between
# symptom onset and case observation as a delay matrix built from empirical data.

delay_incubation <- c(0.01, 0.1, 0.15, 0.18, 0.17, 0.14, 0.11, 0.07, 0.035, 0.020, 0.015)

delay_matrix <- get_matrix_from_empirical_delay_distr(
  HK_delay_data,
  n_report_time_steps = 50
)

total_delay_3 <- convolve_delays(list(delay_incubation, delay_matrix))