
Save samples as we go #2015

Closed
JaimeRZP opened this issue Jun 19, 2023 · 7 comments

Comments

@JaimeRZP
Member

Dear Turing team,

I feel like Turing would benefit from adding the possibility of saving the samples on the go. For example, in astronomy, where likelihood evaluations can take on the order of seconds and chains converge in a couple of weeks, people really like saving the samples as they go and periodically checking convergence.

At the moment I think the user can only save the samples once Turing is done sampling. However, we could offer saving samples as we go by adding a simple callback function that activates if the user provides a "chain_name" keyword to sample.

I have drafted how this could be done for HMC samplers in this branch.

@devmotion
Member

The AbstractMCMC interface is already designed to support exactly this use case with the iterator and transducer support it provides (it's explained in the AbstractMCMC docs: https://turinglang.org/AbstractMCMC.jl/dev/api/#Iterator). I think Turing could just make it a bit more convenient by not deferring all transformations to the end of sampling (see the discussion in #2011).
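For reference, the iterator interface mentioned above can be used along these lines (a rough, untested sketch; `model` and `sampler` stand in for a concrete Turing model and sampler, and the checkpoint filename is illustrative):

```julia
using AbstractMCMC, Serialization

# Iterate over raw transitions instead of calling `sample`,
# checkpointing to disk every so often.
transitions = []
for (i, t) in enumerate(AbstractMCMC.steps(model, sampler))
    push!(transitions, t)
    i % 1_000 == 0 && serialize("checkpoint.jls", transitions)  # periodic save
    i >= 10_000 && break  # stop after enough draws
end
```

The transitions saved this way are the raw sampler states, not an `MCMCChains.Chains` object, which is the convenience gap discussed below.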

@torfjelde
Member

torfjelde commented Jun 19, 2023

At the moment I think the user can only save the samples once Turing is done sampling. However, we could offer saving samples as we go by adding a simple callback function that activates if the user provides a "chain_name" keyword to sample.

Let's start out with just making it a callback, and then we can potentially add a kwarg for it at a later stage.

This might also just be more suited for TuringCallbacks.jl to begin with. Then if people start using it, we can merge it into Turing.jl proper.

The AbstractMCMC interface is already designed to support exactly this use case with the iterator and transducer support it provides

Though this is true (and I've given the same response to this before 😅 ), some users of PPLs really just want to call sample and no more. Hence it would be useful to allow them to easily specify this using a callback or something.

I think Turing could just make it a bit more convenient by not deferring all transformations to the end of sampling

When you say "transformations", what do you mean exactly? We currently do invlink after every step (what I was referring to in that discussion was in AdvancedHMC, not in Turing). But we have this awful NamedTuple representation with both values and varnames that we convert into something useful in bundle_samples. I think it's also somewhat non-trivial to replace these with something a user can easily deal with (because it involves varnames).

@devmotion
Member

When you say "transformations", what do you mean exactly?

That you will end up with something different from calling sample that is less convenient to work with (since you're missing out on bundle_samples).

@torfjelde
Member

That you will end up with something different from calling sample that is less convenient to work with (since you're missing out on bundle_samples).

Gotcha 👍

Regarding the callback @JaimeRZP , I'd say just start out simple: implement a callback that has a buffer where it can hold some n transitions, which it then subsequently saves to files periodically. Then you add some functionality for restoring the chain from such files. This way you can a) just save the raw transitions, and b) defer all of the conversion stuff to a simple bundle_samples call once you want to "re-join" the samples. That will be efficient + won't need anything too fancy.
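The buffered callback described above could look roughly like this (an illustrative sketch, not the eventual implementation; the struct, file naming, and the exact arguments passed to `callback` are assumptions — check AbstractMCMC's docs for the precise callback signature):

```julia
using Serialization

# Callback that buffers raw transitions and flushes them to numbered
# files, deferring all conversion to a later `bundle_samples` call.
mutable struct SaveCallback
    buffer::Vector{Any}
    bufsize::Int
    path::String
    nflushed::Int
end
SaveCallback(path; bufsize=100) = SaveCallback(Any[], bufsize, path, 0)

function (cb::SaveCallback)(rng, model, sampler, transition, state, iteration; kwargs...)
    push!(cb.buffer, transition)
    if length(cb.buffer) >= cb.bufsize
        cb.nflushed += 1
        serialize(string(cb.path, "_", cb.nflushed, ".jls"), cb.buffer)
        empty!(cb.buffer)
    end
    return nothing
end

# Usage sketch: chain = sample(model, sampler, N; callback=SaveCallback("chain"))
```

Restoring would then deserialize the chunks, concatenate the transitions, and run them through bundle_samples once.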

@JaimeRZP
Member Author

I think the priority of the callback should be to save samples one by one in a format that can be easily exported to other programming languages. The audience I have in mind probably has plotting pipelines that they would like to reuse.
I believe that for this purpose simply saving the samples as vectors to a CSV file is ideal.
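A minimal sketch of that CSV idea, appending one row of parameter values per draw so external tools can read the file mid-run (the callback signature is assumed from AbstractMCMC, and `extract_params` is a hypothetical helper — how values are pulled out of a transition is sampler-specific):

```julia
# Returns a callback that writes a header once, then appends one CSV row
# per transition.
function make_csv_callback(path::String, param_names)
    open(path, "w") do io
        println(io, join(param_names, ","))  # header row
    end
    return function (rng, model, sampler, transition, state, iteration; kwargs...)
        θ = extract_params(transition)  # hypothetical: a Vector of parameter values
        open(path, "a") do io
            println(io, join(θ, ","))
        end
        return nothing
    end
end
```

Opening and closing the file on every draw is deliberate here: it keeps the file readable by other processes between draws, at the cost of some I/O overhead.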

I understand that reusing the whole transitions-to-MCMCChains machinery is tempting, but I think it is suboptimal for what the target user of this function wants.

If the user wants all the nice functions of bundle_samples then I feel like we already offer that with the current code and the users will get that at the end of the sampling.

I am happy to be persuaded though.

@JaimeRZP
Member Author

After some discussion with @torfjelde, I have removed the keyword from sample and opened a PR.
We might want to eventually move this to TuringCallbacks.jl alongside other callbacks that allow the user to restart from a file.

@JaimeRZP
Member Author

Moved to TuringLang/TuringCallbacks.jl#44
