
Save samples as we go #2015

Closed
JaimeRZP opened this issue Jun 19, 2023 · 7 comments

Comments

@JaimeRZP
Member

Dear Turing team,

I feel like Turing would benefit from adding the possibility of saving the samples on the go. For example, in astronomy, where likelihood evaluations can take on the order of seconds and chains converge in a couple of weeks, people really like saving the samples as they go and periodically checking convergence.

At the moment I think the user can only save the samples once Turing is done sampling. However, we could offer saving samples as we go by adding a simple callback function that activates if the user provides a "chain_name" keyword to sample.

I have drafted how this could be done for HMC samplers in this branch.

@devmotion
Member

The AbstractMCMC interface is already designed to support exactly this use case with the iterator and transducer support it provides (it's explained in the AbstractMCMC docs: https://turinglang.org/AbstractMCMC.jl/dev/api/#Iterator). I think Turing could just make it a bit more convenient by not deferring all transformations to the end of sampling (see the discussion in #2011).
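For reference, the iterator interface mentioned above can be used along these lines (a rough, untested sketch; `model` and `sampler` stand in for a concrete Turing model and sampler, and the checkpoint filename is illustrative):

```julia
using AbstractMCMC, Serialization

# Iterate over raw transitions instead of calling `sample`,
# checkpointing to disk every so often.
transitions = []
for (i, t) in enumerate(AbstractMCMC.steps(model, sampler))
    push!(transitions, t)
    i % 1_000 == 0 && serialize("checkpoint.jls", transitions)  # periodic save
    i >= 10_000 && break  # stop after enough draws
end
```

The transitions saved this way are the raw sampler states, not an `MCMCChains.Chains` object, which is the convenience gap discussed below.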

@torfjelde
Member

torfjelde commented Jun 19, 2023

At the moment I think the user can only save the samples once Turing is done sampling. However, we could offer saving samples as we go by adding a simple callback function that activates if the user provides a "chain_name" keyword to sample.

Let's start out with just making it a callback, and then we can potentially add a kwarg for it at a later stage.

This might also just be more suited for TuringCallbacks.jl to begin with. Then if people start using it, we can merge it into Turing.jl proper.

The AbstractMCMC interface is already designed to support exactly this use case with the iterator and transducer support it provides

Though this is true (and I've given the same response to this before 😅 ), some users of PPLs really just want to call sample and no more. Hence it would be useful to allow them to easily specify this using a callback or something.

I think Turing could just make it a bit more convenient by not deferring all transformations to the end of sampling

When you say "transformations", what do you mean exactly? We currently do invlink after every step (what I was referring to in that discussion was in AdvancedHMC, not in Turing). But we have this awful NamedTuple representation with both values and varnames that we convert into something useful in bundle_samples. I think it's also somewhat non-trivial to replace these with something a user can easily deal with (because it involves varnames).

@devmotion
Member

When you say "transformations", what do you mean exactly?

That you will end up with something different from calling sample that is less convenient to work with (since you're missing out on bundle_samples).

@torfjelde
Member

That you will end up with something different from calling sample that is less convenient to work with (since you're missing out on bundle_samples).

Gotcha 👍

Regarding the callback @JaimeRZP , I'd say just start out simple: implement a callback that has a buffer where it can hold some n transitions, which it then subsequently saves to files periodically. Then you add some functionality for restoring the chain from such files. This way you can a) just save the raw transitions, and b) defer all of the conversion stuff to a simple bundle_samples call once you want to "re-join" the samples. That will be efficient + won't need anything too fancy.
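The buffered callback described above could look roughly like this (an illustrative sketch, not the eventual implementation; the struct, file naming, and the exact arguments passed to `callback` are assumptions — check AbstractMCMC's docs for the precise callback signature):

```julia
using Serialization

# Callback that buffers raw transitions and flushes them to numbered
# files, deferring all conversion to a later `bundle_samples` call.
mutable struct SaveCallback
    buffer::Vector{Any}
    bufsize::Int
    path::String
    nflushed::Int
end
SaveCallback(path; bufsize=100) = SaveCallback(Any[], bufsize, path, 0)

function (cb::SaveCallback)(rng, model, sampler, transition, state, iteration; kwargs...)
    push!(cb.buffer, transition)
    if length(cb.buffer) >= cb.bufsize
        cb.nflushed += 1
        serialize(string(cb.path, "_", cb.nflushed, ".jls"), cb.buffer)
        empty!(cb.buffer)
    end
    return nothing
end

# Usage sketch: chain = sample(model, sampler, N; callback=SaveCallback("chain"))
```

Restoring would then deserialize the chunks, concatenate the transitions, and run them through bundle_samples once.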

@JaimeRZP
Member Author

I think the priority of the callback should be to save samples one by one in a format that can be easily exported to other programming languages. The audience I have in mind probably has plotting pipelines that they would like to reuse.
I believe that for this purpose simply saving the samples as vectors to a CSV file is ideal.
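A minimal sketch of that CSV idea, appending one row of parameter values per draw so external tools can read the file mid-run (the callback signature is assumed from AbstractMCMC, and `extract_params` is a hypothetical helper — how values are pulled out of a transition is sampler-specific):

```julia
# Returns a callback that writes a header once, then appends one CSV row
# per transition.
function make_csv_callback(path::String, param_names)
    open(path, "w") do io
        println(io, join(param_names, ","))  # header row
    end
    return function (rng, model, sampler, transition, state, iteration; kwargs...)
        θ = extract_params(transition)  # hypothetical: a Vector of parameter values
        open(path, "a") do io
            println(io, join(θ, ","))
        end
        return nothing
    end
end
```

Opening and closing the file on every draw is deliberate here: it keeps the file readable by other processes between draws, at the cost of some I/O overhead.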

I understand that reusing the whole transitions-to-MCMCChains machinery is tempting, but I think it is suboptimal for what the target user of this function wants.

If the user wants all the nice functions of bundle_samples then I feel like we already offer that with the current code and the users will get that at the end of the sampling.

I am happy to be persuaded though.

@JaimeRZP
Member Author

After some discussion with @torfjelde, I have removed the keyword from sample and opened a PR.
We might want to eventually move this to TuringCallbacks.jl alongside other callbacks that allow the user to restart from a file.

@JaimeRZP
Member Author

Moved to TuringLang/TuringCallbacks.jl#44
