`final_size()` should accept a vector of R0 #196

pratikunterwegs · 2024-02-21T15:28:50Z

This issue requests that `final_size()` accept a vector of $R_0$ values in the argument `r0`. This stems from this Discussion and parallels similar changes coming to {epidemics}.

Two return type options I can think of:

1. A single data.frame giving the mean final size estimate per demography group, and the upper and lower 95% CI for each group (similar to {cfr});
2. A single data.frame giving the final size estimate for each demography group, for each $R_0$ - it is left to the user to summarise the estimates (the option in the code snippet in the Discussion).

I think option 1 is neat and compact, but happy to implement (2) or something else. Thoughts @adamkucharski, @TimTaylor?

TimTaylor · 2024-02-21T15:51:08Z

I'm not a fan of (1) as it's lossy. For simplicity, I would likely go with (2) but there could also be ...

3. Like (2), but returned as a list (split by demography group).
4. A nested data.frame that combines (1) and (2).

I'd still lean toward (2) for the simplicity but could be nudged towards (3) and (4) if you thought this was something users would want.

If you went with confidence intervals, I'd consider adding upper/lower CI arguments to the function signature (with defaults).

Tagging @Bisaloo / @chartgerink for whole system overview as would be good to land on a consistent approach (where possible) across whole ecosystem of packages.

pratikunterwegs · 2024-02-21T16:00:41Z

Thanks @TimTaylor - I think (2) is then the best option. (3) would make filtering and summarising (a bit) more tedious, while (4) would add a hard dependency.

I think (2) works well for {finalsize} as the data.frame size is restricted to $N \times M$ rows, for $N$ samples of $R_0$ and $M$ demography-susceptibility groups (in contrast, the {epidemics} output has multiple timepoints as well, making it much longer).
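To make the row count for option (2) concrete, here is a minimal sketch in Python rather than R, so that it runs on its own. `toy_final_size()` and `final_size_long()` are hypothetical names, the solver is a simple proportionate-mixing fixed-point iteration and not the {finalsize} algorithm, and the group labels and parameter values are made up for illustration.

```python
import math


def toy_final_size(r0, proportions, susceptibility, iterations=500):
    """Toy stand-in for a final size solver (NOT the {finalsize} algorithm).

    Iterates z_i = 1 - exp(-s_i * r0 * sum_j p_j * z_j), i.e. proportionate
    mixing between groups, until the values settle.
    """
    z = [0.5] * len(proportions)
    for _ in range(iterations):
        force = r0 * sum(p * zj for p, zj in zip(proportions, z))
        z = [1.0 - math.exp(-s * force) for s in susceptibility]
    return z


def final_size_long(r0_values, groups, proportions, susceptibility):
    """Return one row per (R0 value, group): the long format of option (2)."""
    rows = []
    for r0 in r0_values:
        z = toy_final_size(r0, proportions, susceptibility)
        for group, z_i in zip(groups, z):
            rows.append({"r0": r0, "demo_grp": group, "p_infected": z_i})
    return rows


# Illustrative inputs only: N = 4 values of R0, M = 3 groups.
r0_samples = [1.5, 2.0, 2.5, 3.0]
groups = ["0-19", "20-59", "60+"]
rows = final_size_long(r0_samples, groups, [0.25, 0.5, 0.25], [1.0, 0.8, 0.6])

print(len(rows))  # N * M rows
```

Each row carries its own `r0` value, so grouping and summarising afterwards is left to the user.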

pratikunterwegs · 2024-02-21T16:02:25Z

IMO if we move towards passing a list of susceptibility, p_susceptibility, or contact matrices similar to 'scenarios' in {epidemics}, a nested <data.table> would be the way to go.

chartgerink · 2024-02-22T08:31:12Z

Thanks @TimTaylor for the tag.

My main question is: What need does this solve for whom?

I know there is an implicit need you know of. If we are doing agile development, a clearly articulated underlying user story makes it easier to meaningfully contribute. I have to fill in a lot of gaps now.

In case the user story is more along the lines of "As a researcher, I want to provide multiple values of $R_0$, so that I can generate data in one function run that I can process further" I would opt for option 2.

If the user story is more along the lines of "As a researcher, I want to model average estimates of final size given a set of $R_0$, so that I can use these estimates in policy papers directly" I would opt for option 1.

If the need and benefit is completely different, I dunno what I'd prefer.

PS: "Lossy" here means loss of information, like in compression algorithms @TimTaylor?

TimTaylor · 2024-02-22T08:57:34Z

> PS: "Lossy" here means loss of information, like in compression algorithms @TimTaylor?

In effect yep. Going from N outputs to 3 (lower, mean, upper) and then not being able to go the other way.
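A small sketch of what that looks like in practice, written in Python so it is self-contained. It uses the single-group final size equation $z = 1 - e^{-R_0 z}$ as a stand-in solver; `homogeneous_final_size()` is a hypothetical helper and the $R_0$ grid is illustrative, not real data.

```python
import math
import statistics


def homogeneous_final_size(r0, iterations=1000):
    """Solve z = 1 - exp(-r0 * z) by fixed-point iteration (single group)."""
    z = 0.5
    for _ in range(iterations):
        z = 1.0 - math.exp(-r0 * z)
    return z


# Illustrative R0 values on an evenly spaced grid (not real estimates).
r0_values = [1.2 + 0.1 * i for i in range(14)]  # 1.2 up to 2.5
estimates = [homogeneous_final_size(r0) for r0 in r0_values]

# Option (1): collapse the N estimates into three numbers.
# n=40 gives cut points every 2.5%, so the first and last are the
# 2.5th and 97.5th percentiles.
cuts = statistics.quantiles(estimates, n=40, method="inclusive")
summary = {
    "lower": cuts[0],
    "mean": statistics.mean(estimates),
    "upper": cuts[-1],
}

print(len(estimates), "estimates ->", len(summary), "summary values")
```

The summary can always be computed from the per-$R_0$ estimates, but the estimates cannot be recovered from the summary.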

pratikunterwegs · 2024-02-22T09:25:34Z

> My main question is: What need does this solve for whom?

@chartgerink the user requirement is laid out in this Discussion. This is updated in the issue text.

Since the included code snippet also tends towards option (2), we'll provisionally go with that one.

More generally, since our packages are relatively new and have few users (that I know of), we try to anticipate user requirements within dedicated discussion groups, and raise relevant issues.

Bisaloo · 2024-02-22T09:48:23Z

> Since the included code snippet also tends towards option (2), we'll provisionally go with that one.

There is no urgency to implement it in a very short time frame. Please let's use a couple of days to let this important design decision simmer. This will allow us to calmly think about all the implications of all solutions and avoid potential implement/revert cycles in the future.

pratikunterwegs · 2024-02-22T11:02:57Z

That's fine by me - am I correct in understanding that this is mostly to do with the return type, or does this also relate to the inputs? If only the return, I can get working on the internal changes for now. No rush either way.

adamkucharski · 2024-02-23T13:49:36Z

Thanks for these suggestions. I agree that we should avoid (1). In general, I don't think we should provide summary statistics as an output of a modular simulation model: if the user puts a vector of 10 $R_0$ values into a simulation function, I think they should get 10 sets of results out by default. (2) seems OK as an option, although I think it would still be useful to have some cross-package functions that minimise the user effort (i.e., lines of code, format wrangling) required to achieve what they want.

In case useful, some common use cases I'd anticipate for this vector functionality in finalsize (and other packages):

- Inputting values from an uncertain distribution, then using the simulation outputs to summarise outcomes with uncertainty (whether CI in table, density plot, or sample of trajectories).
- Inputting values representing different scenarios (e.g. 4 values of $R_0$), then giving these values in a table or plot.
- Inputting a vector of values as part of a grid search if doing some quick fitting (e.g. to serological data).

Some of these are no doubt relevant to other packages too, so would be nice to have consistency for users across packages (e.g. if they've got some pipelines set up for {epidemics}, can just drop the $R_0$ vector into {finalsize} and use same summarisation functions on the output)

pratikunterwegs · 2024-02-23T14:08:42Z

Thanks, just to clarify (for myself mostly), in {finalsize}:

- Uncertain distribution parameters: does this apply recursively to the $R_0$ distribution (i.e., uncertainty around the distribution parameters), or is the uncertainty around $R_0$ mostly what users will be dealing with (covered in this issue)?
- Scenarios: this issue would allow passing $R_0$ values in a vector, say sampled from 4 distributions, but users would lose track of which values come from which distribution. Passing a list of vectors (100 samples each from 4 $R_0$ distributions) could help group by mean values and make a decent table - this issue can be updated to include that.

adamkucharski · 2024-02-23T16:15:42Z

> Scenarios: this issue would allow passing

Personally, for the above bullet point, as a user of a vectorised finalsize I'd define some kind of object to store my scenario parameters (maybe a data.frame, with scenario as a column), then pass the $R_0$ column to finalsize, then attach the output to the storage data.frame in some way. That's obviously trickier when dealing with more complex scenarios/outputs, in which case a list of vectors may be more sensible, especially if we're standardising this step elsewhere...
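Roughly, that workflow could look like the sketch below. It is in Python so that it is self-contained, with a list of dicts standing in for the data.frame. `final_size_vectorised()` is a hypothetical single-group stand-in, not the {finalsize} function, and the scenario names and $R_0$ values are made up.

```python
import math


def homogeneous_final_size(r0, iterations=1000):
    """Solve z = 1 - exp(-r0 * z) by fixed-point iteration (single group)."""
    z = 0.5
    for _ in range(iterations):
        z = 1.0 - math.exp(-r0 * z)
    return z


def final_size_vectorised(r0_values):
    """Hypothetical vectorised wrapper: one estimate per R0, in input order."""
    return [homogeneous_final_size(r0) for r0 in r0_values]


# Storage object for scenario parameters (illustrative values only).
scenarios = [
    {"scenario": "low", "r0": 1.3},
    {"scenario": "central", "r0": 1.8},
    {"scenario": "high", "r0": 2.5},
]

# Pass only the R0 column to the solver...
r0_column = [row["r0"] for row in scenarios]
estimates = final_size_vectorised(r0_column)

# ...then attach the output back onto the storage object by position.
results = [dict(row, p_infected=z) for row, z in zip(scenarios, estimates)]

for row in results:
    print(row["scenario"], row["r0"], round(row["p_infected"], 3))
```

Attaching by position only works because the output preserves input order with one value per $R_0$; with several groups per $R_0$, the output would need a key to join on.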

pratikunterwegs · 2024-02-23T16:25:00Z

Thanks - there will be some differences with {epidemics} in terms of how vectors of parameters are passed then, as {finalsize} really focuses on $R_0$, but we could equally pass lists of susceptibility and p_susceptibility as well, while keeping the existing functionality to pass a single matrix, $R_0$, etc. Will make a small Gist soon.

pratikunterwegs self-assigned this Feb 21, 2024

pratikunterwegs added the "New feature" (New feature or request) and "Discussion" (Related to a discussion about the package: new and existing features and concepts) labels Feb 21, 2024

pratikunterwegs mentioned this issue Feb 22, 2024

Add a continuous benchmarking workflow #197

Closed
