Use this function to correct the tail of an incidence time series when incidence was only recorded after a subsequent observation event. For instance, if the incidence represents people starting to show symptoms of a disease (dates of symptom onset), the data would typically have been collected among individuals whose case was confirmed by a test. In that case, among all symptom-onset events, only those that had time to be confirmed by a test were reported. Thus, close to the present, symptom-onset events are under-reported. To account for this effect, this function divides each incidence value by the probability that an event occurring at that time step has already been observed. Typically, this correction only affects the few most recent data points.
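The principle of the correction can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: the counts are made up, the variable names are hypothetical, and the delay is assumed to be gamma-distributed and evaluated with pgamma at whole time steps rather than through the package's own discretization.

```r
# Hypothetical observed onset counts, oldest first; the last value is the present.
observed <- c(120, 118, 125, 110, 90, 60, 25)
n <- length(observed)

# Assumed delay from onset to report: gamma with shape 2.7 and scale 1.6.
# An onset that happened k time steps before the present has been reported
# only if its delay is at most k, so its observation probability is the CDF at k.
steps_to_present <- (n - 1):0
prob_observed <- pgamma(steps_to_present, shape = 2.7, scale = 1.6)

# Divide each count by its observation probability. The most recent step has
# probability 0 in this sketch, so it cannot be corrected and is set to NA.
corrected <- ifelse(prob_observed > 0, observed / prob_observed, NA_real_)

round(data.frame(steps_to_present, observed, prob_observed, corrected), 2)
```

Old time steps have an observation probability close to 1 and are left almost unchanged, while recent ones are scaled up.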

nowcast(
  incidence_data,
  delay_until_final_report,
  cutoff_observation_probability = 0.33,
  gap_to_present = 0,
  ref_date = NULL,
  time_step = "day",
  ...
)

Arguments

incidence_data

An object containing incidence data through time. It can either be:

  • A list with two elements:

    1. A numeric vector named values: the incidence recorded on consecutive time steps.

    2. An integer named index_offset: the offset, counted in number of time steps, by which the first value in values is shifted compared to a reference time step. This parameter allows one to keep track of the date of the first value in values without needing to carry a date column around. A positive offset means values are shifted towards the future compared to the reference time step. A negative offset means the opposite.

  • A numeric vector. The vector corresponds to the values element described above, and index_offset is implicitly zero. This means that the first value in incidence_data is associated with the reference time step (no shift towards the future or past).
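The two accepted forms can be illustrated as follows. The values, the reference date and the daily time step are hypothetical, chosen only to show how an offset maps back to dates.

```r
# Two equivalent descriptions of the same incidence when the first value
# falls on the reference time step.
as_vector <- c(4, 7, 12, 9)
as_list   <- list(values = c(4, 7, 12, 9), index_offset = 0L)

# Same values, but starting 3 time steps after the reference time step.
shifted <- list(values = c(4, 7, 12, 9), index_offset = 3L)

# Recovering dates from an offset, assuming a hypothetical reference date
# and daily time steps.
ref_date <- as.Date("2020-03-01")
dates <- seq.Date(from = ref_date + shifted$index_offset,
                  by = "day",
                  length.out = length(shifted$values))
dates
```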

delay_until_final_report

Single delay or list of delays. Each delay can be one of:

  • a list representing a distribution object

  • a discretized delay distribution vector

  • a discretized delay distribution matrix

  • a dataframe containing empirical delay data
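As an illustration, the first two forms might look like the sketch below. The list form is the one used in the Examples section. The vector is built by differencing the gamma CDF at whole time steps, which is an assumed, simple discretization and may differ from the one used by the package. The matrix and dataframe forms are left out because their exact layout is not described on this page.

```r
# 1. Distribution object: a list naming the distribution and its parameters.
delay_as_list <- list(name = "gamma", shape = 2.7, scale = 1.6)

# 2. Discretized vector: element k + 1 holds the probability of a delay
#    between k and k + 1 time steps, counted here as a delay of k steps
#    (assumed convention). The distribution is truncated at max_delay.
max_delay <- 30
cdf <- pgamma(0:(max_delay + 1),
              shape = delay_as_list$shape,
              scale = delay_as_list$scale)
delay_as_vector <- diff(cdf)
delay_as_vector <- delay_as_vector / sum(delay_as_vector)  # renormalize after truncation

round(head(delay_as_vector), 3)
```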

cutoff_observation_probability

Numeric. Value between 0 and 1. Default value: 0.33. Only data points for time steps at which the probability of observing an event is higher than cutoff_observation_probability are kept. The few data points with a lower observation probability are trimmed off the tail of the time series.

gap_to_present

Integer. Default value: 0. Number of time steps truncated off the right tail of the raw incidence data. See Details.

ref_date

Date. Optional. Date of the first data entry in incidence_data. See Details for when it is required.

time_step

string. Time between two consecutive incidence data points, e.g. "day", "2 days", "week", "year" (see seq.Date for details).

...

Arguments passed on to get_matrix_from_empirical_delay_distr

min_number_cases

integer. Minimal number of cases to build the empirical distribution from. If num_steps_in_a_unit is NULL, for any time step T, the min_number_cases most recent records prior to T are used. If fewer than min_number_cases delays were recorded before T, then T is ignored and the min_number_cases earliest-recorded delays are used. If num_steps_in_a_unit is given a value, a similar procedure is applied, except that the delays (at least min_number_cases of them) must now be taken over a whole number of time units. For example, if num_steps_in_a_unit = 7 and time steps represent consecutive days, then to build the distribution for time step T, we find the smallest number of weeks, starting from T and going into the past, during which at least min_number_cases delays were recorded. We then use all the delays recorded during these weeks. Weeks here are not necessarily Monday to Sunday, but simply 7 consecutive days, e.g. Thursday to Wednesday. Again, if fewer than min_number_cases delays were recorded before T, then T is ignored: we instead find the minimum number of weeks, starting from the first recorded delay, that contain at least min_number_cases delays.
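The selection rule for the num_steps_in_a_unit = NULL case can be sketched as below. This is an illustration only: the records dataframe, its column names and the helper select_delays are hypothetical, records are assumed to be sorted by time step, and "prior to T" is read here as strictly before T.

```r
# Hypothetical delay records: the time step at which each event occurred and
# its reporting delay, sorted by time step.
records <- data.frame(
  time_step = c(1, 1, 2, 3, 3, 4, 5, 5, 6, 7),
  delay     = c(3, 5, 2, 4, 6, 3, 2, 7, 4, 5)
)

# Select the delays used to build the distribution for time step step_T.
select_delays <- function(records, step_T, min_number_cases) {
  prior <- records[records$time_step < step_T, ]
  if (nrow(prior) >= min_number_cases) {
    # Enough history: keep the min_number_cases most recent records before step_T.
    tail(prior$delay, min_number_cases)
  } else {
    # Not enough history: ignore step_T and use the earliest-recorded delays.
    head(records$delay, min_number_cases)
  }
}

select_delays(records, step_T = 6, min_number_cases = 4)  # 6 3 2 7
select_delays(records, step_T = 2, min_number_cases = 4)  # 3 5 2 4
```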

upper_quantile_threshold

numeric. Between 0 and 1. Argument for internal use.

min_number_cases_fraction

numeric. Between 0 and 1. If min_number_cases is not provided (kept to NULL), the number of most-recent cases used to build the instant delay distribution is min_number_cases_fraction times the total number of reported delays.

min_min_number_cases

numeric. Lower bound for number of cases used to build an instant delay distribution.

fit

string. One of "gamma" or "none". Specifies the type of fit that is applied to the columns of the delay matrix.

num_steps_in_a_unit

Optional argument. Number of time steps in a full time unit (e.g. 7 if looking at weeks). If set, the delays used to build a particular delay distribution will span over a round number of such time units. This option is included for comparison with legacy code.

Value

A list with two elements:

  1. A numeric vector named values: the result of the computations on the input data.

  2. An integer named index_offset: the offset, counted in number of time steps, by which the result is shifted compared to an index_offset of 0. This parameter allows one to keep track of the date of the first value in values without needing to carry a date column around. A positive offset means values are delayed in the future compared to the reference values. A negative offset means the opposite. Note that the index_offset of the output of the function call accounts for the (optional) index_offset of the input.

If index_offset is 0 and simplify_output = TRUE, the index_offset is dropped and the values element is returned as a numeric vector.

Details

The tail of the time series is trimmed to avoid correcting time steps for which the observation probability is too low, as the corrected values would be too uncertain. This trimming is tuned via the cutoff_observation_probability argument.
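A minimal sketch of the trimming step, in base R, is given below. It is not the package's code: the counts are made up, and the observation probabilities are assumed to come from a gamma delay evaluated with pgamma at whole time steps.

```r
# Hypothetical observed counts, oldest first; the last value is the present.
observed <- c(120, 118, 125, 110, 90, 60, 25)
n <- length(observed)
cutoff_observation_probability <- 0.33

# Assumed observation probability of each time step (gamma delay,
# shape 2.7, scale 1.6), decreasing towards the present.
prob_observed <- pgamma((n - 1):0, shape = 2.7, scale = 1.6)

# Keep only time steps whose observation probability exceeds the cutoff,
# then correct the remaining counts.
keep <- prob_observed > cutoff_observation_probability
corrected <- observed[keep] / prob_observed[keep]

data.frame(observed = observed[keep],
           prob_observed = round(prob_observed[keep], 2),
           corrected = round(corrected, 1))
```

Because the observation probability decreases towards the present, the trimmed time steps always form a contiguous block at the end of the series.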

The gap_to_present argument represents the number of time steps truncated off the right end of the raw data. If no truncation was done, gap_to_present should be kept at its default value of 0. Truncation can be useful when the latest reported numbers are too unreliable, e.g. in a monitoring situation the latest X days of data can be deemed not worth keeping if they are not well consolidated. An alternative to this truncation is to nowcast the observed incidence using this function and a delay distribution representing the consolidation delay. Contrary to best-practice nowcasting methods, this function only provides a maximum-likelihood estimator of the actual incidence; it does not include uncertainty around this estimator.
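One way to picture the effect of gap_to_present is sketched below, under the assumption that the gap simply adds to the time elapsed between each time step and the present. The gamma delay and the variable names are illustrative and not taken from the package.

```r
n <- 7               # number of time steps kept after truncation
gap_to_present <- 3  # time steps removed from the right end of the raw data

# Time elapsed between each remaining time step and the present,
# without and with the truncated steps taken into account.
steps_no_gap   <- (n - 1):0
steps_with_gap <- steps_no_gap + gap_to_present

# Assumed gamma delay (shape 2.7, scale 1.6): more elapsed time means a
# higher probability that an event has already been observed.
prob_no_gap   <- pgamma(steps_no_gap, shape = 2.7, scale = 1.6)
prob_with_gap <- pgamma(steps_with_gap, shape = 2.7, scale = 1.6)

round(data.frame(steps_no_gap, prob_no_gap, steps_with_gap, prob_with_gap), 2)
```

Ignoring a truncation (leaving gap_to_present at 0 when data were in fact truncated) would understate the observation probabilities and therefore over-correct the tail.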

The ref_date argument is only needed if delay_until_final_report is passed as a dataframe of individual delay observations (a.k.a. empirical delay data). In that case, ref_date must correspond to the date of the first time step in incidence_data.

Examples

## Basic usage of nowcast

shape_onset_to_report = 2.7
scale_onset_to_report = 1.6
delay_onset_to_report <- list(name="gamma",
                              shape = shape_onset_to_report,
                              scale = scale_onset_to_report)

corrected_incidence_data_1 <- nowcast(
  incidence_data = HK_incidence_data$onset_incidence,
  delay_until_final_report = delay_onset_to_report
)

## Advanced usage of nowcast

# Only taking into account cases that have a chance of being observed greater
# than 25%. Here, the delay between symptom onset and report is given as
# empirical delay data, hence it is needed to specify the date of the first
# entry in incidence_data
corrected_incidence_data_2 <- nowcast(
  incidence_data = HK_incidence_data$onset_incidence,
  delay_until_final_report = HK_delay_data,
  ref_date = HK_incidence_data$date[1],
  cutoff_observation_probability = 0.25
)