Update index.qmd #548

Merged
merged 9 commits into from
Nov 26, 2024

Conversation

willtebbutt
Member

Addresses part of #547 .

@gdalle does this read more correctly?

@willtebbutt willtebbutt requested a review from torfjelde November 4, 2024 11:42
Contributor

github-actions bot commented Nov 4, 2024

Preview the changes: https://turinglang.org/docs/pr-previews/548
Please avoid using the search feature and navigation bar in PR previews!

@gdalle
Contributor

gdalle commented Nov 4, 2024

I think there might still be a misunderstanding here. The compile keyword in AutoReverseDiff(; compile) only controls whether the tape is optimized after being recorded. It doesn't control whether a recorded tape is used only once or reused across function calls (which is what can cause control-flow problems).
Tape reuse is instead determined by how the ADGradient is created, and whether or not an example vector x is passed to it. See these key lines in LogDensityProblemsAD.jl:

https://github.com/tpapp/LogDensityProblemsAD.jl/blob/e3401f21b5a065df0d5de38b37fad0e6650618f3/ext/LogDensityProblemsADADTypesExt.jl#L52-L54

https://github.com/tpapp/LogDensityProblemsAD.jl/blob/e3401f21b5a065df0d5de38b37fad0e6650618f3/ext/LogDensityProblemsADReverseDiffExt.jl#L45-L56

So the real question is:

  • in which places does Turing "prepare" the ADGradient with an example x
  • can the user of the library do anything about it, if they know that their loglikelihood has value-dependent control flow?
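To make the distinction concrete, here is a minimal sketch using ReverseDiff.jl directly (the function f and input x are hypothetical, not from the PR): one-shot differentiation re-traces the function on every call, while a prepared tape is recorded once at an example input and replayed afterwards.

```julia
using ReverseDiff

f(x) = sum(abs2, x)
x = randn(3)

# One-shot: a fresh trace is taken on every call, so value-dependent
# control flow is always handled correctly (but more slowly).
g1 = ReverseDiff.gradient(f, x)

# Prepared: record a tape once at an example input, then reuse it.
# This is fast, but the tape freezes the control-flow path taken at `x`.
tape = ReverseDiff.GradientTape(f, x)
ctape = ReverseDiff.compile(tape)  # optional: optimize the recorded tape
g2 = ReverseDiff.gradient!(similar(x), ctape, x)
```

The point of the discussion below is which of these two modes Turing actually ends up in, and which knob selects it.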

@willtebbutt
Member Author

Ah, I see. Maybe @torfjelde is the right person to ask about this? I've not yet dug into exactly where the ReverseDiff gradient setup happens in Turing.jl.

@torfjelde
Member

in which places does Turing "prepare" the ADGradient with an example x

Typically this occurs in the initial step of the sampler, i.e. once (there are exceptions, but in those cases we can't really do much better anyway).

can the user of the library do anything about it, if they know that their loglikelihood has value-dependent control flow?

Unfortunately not for ReverseDiff.jl in compiled mode (they can however just not use compiled mode).

Member

@torfjelde torfjelde left a comment


Added some comments 👍

Review comment on tutorials/docs-10-using-turing-autodiff/index.qmd (outdated, resolved)
@gdalle
Contributor

gdalle commented Nov 6, 2024

Unfortunately not for ReverseDiff.jl in compiled mode (they can however just not use compiled mode).

@torfjelde I think that's the key confusion between us. My understanding is that, regardless of whether it is compiled or not, the tape only captures execution for one version of the control flow. Thus, the tape always becomes invalid if the control flow changes (e.g. different branch of an if statement). Compilation just speeds up the differentiation but does not affect anything else, especially not correctness. Do you have a different understanding?

Source: https://juliadiff.org/ReverseDiff.jl/dev/api/#The-AbstractTape-API
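This staleness is easy to demonstrate with a small sketch (the function below is hypothetical, chosen only to show the effect; the same behaviour occurs whether or not the tape is compiled):

```julia
using ReverseDiff

# Value-dependent control flow: a recorded tape captures only one branch.
f(x) = x[1] > 0 ? sum(abs2, x) : sum(x)

tape = ReverseDiff.GradientTape(f, [1.0, 2.0])  # records the x[1] > 0 branch

# Correct for inputs that take the same branch: gradient of sum(abs2, x) is 2x.
ReverseDiff.gradient!([0.0, 0.0], tape, [3.0, 4.0])   # ≈ [6.0, 8.0]

# Silently wrong once the branch changes: the tape still replays 2x,
# whereas the true gradient of sum(x) is [1.0, 1.0].
ReverseDiff.gradient!([0.0, 0.0], tape, [-3.0, 4.0])  # ≈ [-6.0, 8.0]
```

Note that no error is raised in the second call; the reused tape simply computes the gradient of the wrong branch.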

@devmotion
Member

The tape is purely internal and neither stored nor reused if compile = Val(false).

Tape reuse is instead determined by how the ADGradient is created, and whether or not an example vector x is passed to it

This is wrong. The tape is only reused if compile = Val(true): https://github.com/tpapp/LogDensityProblemsAD.jl/blob/e3401f21b5a065df0d5de38b37fad0e6650618f3/ext/LogDensityProblemsADReverseDiffExt.jl#L51-L56
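A rough sketch of the two code paths in the linked extension (hedged: ℓ and x_example are hypothetical, and the exact ADgradient signature should be checked against the version of LogDensityProblemsAD.jl in use):

```julia
using LogDensityProblemsAD  # assumes ReverseDiff is loaded so the extension is active

# compile = Val(false) (the default): no tape is stored; every gradient
# call re-traces ℓ, so changing control flow is handled correctly.
∇ℓ = ADgradient(Val(:ReverseDiff), ℓ; compile=Val(false))

# compile = Val(true): a tape is recorded (at x_example if given),
# compiled, stored, and replayed on every subsequent gradient call.
∇ℓc = ADgradient(Val(:ReverseDiff), ℓ; compile=Val(true), x=x_example)
```

So in this wrapper, "compile" bundles together tape compilation and tape reuse, which is the source of the terminological confusion discussed below.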

@gdalle
Contributor

gdalle commented Nov 6, 2024

Okay then IIUC it's a matter of inconsistency between

  • the "compile" kwarg here and in LogDensityProblemsAD (which determines whether a tape is reused)
  • the compilation as defined by ReverseDiff, which is about speeding up an existing tape (and disconnected from the reuse issue)

@gdalle
Contributor

gdalle commented Nov 6, 2024

Note that in ADTypes, the "compile" argument to AutoReverseDiff is defined ambiguously too. So we should perhaps add more details to the struct, something like AutoReverseDiff(tape=true, compile=false)?
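For users, the practical upshot can be sketched as follows (hedged: the sampler call is illustrative, model is hypothetical, and the exact Turing.jl API for passing an adtype varies across versions):

```julia
using Turing, ADTypes, ReverseDiff

# Safe default for models with value-dependent control flow:
# each gradient call re-traces the model.
adtype_safe = AutoReverseDiff()            # compile = false

# Fast path for models whose control flow does not depend on parameter
# values: record once, compile, and reuse the tape.
adtype_fast = AutoReverseDiff(; compile=true)

# e.g. chain = sample(model, NUTS(; adtype=adtype_fast), 1000)
```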

@gdalle
Contributor

gdalle commented Nov 6, 2024

I'd love for you to chime in here @devmotion @torfjelde: SciML/ADTypes.jl#91

@torfjelde
Member

Note that in ADTypes, the "compile" argument to AutoReverseDiff is defined ambiguously too. So we should perhaps add more details to the struct, something like AutoReverseDiff(tape=true, compile=false)?

I'm not too opinionated about this, as compilation without caching seems somewhat useless? Are there scenarios where you'd like to do that?

@gdalle
Contributor

gdalle commented Nov 6, 2024

I'm not too opinionated about this, as compilation without caching seems somewhat useless? Are there scenarios where you'd like to do that?

Indeed you can't compile a tape you never record in the first place. In any case, I think the ambiguous terminology was fixed by SciML/ADTypes.jl#91. It's just a shame that the word "compile" was chosen instead of "record", given how both are used in ReverseDiff's documentation. But it's a sunk cost now.

@willtebbutt
Member Author

Thank you all for raising this problem and helping to fix it -- it's good that we've gotten to the bottom of this! I'm basically happy with this. @gdalle and @torfjelde, could you take a final look and let me know whether you're happy?

@torfjelde
Member

It's just a shame that the word "compile" was chosen instead of "record", given how both are used in ReverseDiff's documentation. But it's a sunk cost now.

I agree with that 😕

Member

@torfjelde torfjelde left a comment


LGTM! Added one final comment that would be nice to shoehorn in since we're already making alterations, but this looks good :)

Review comment on tutorials/docs-10-using-turing-autodiff/index.qmd (outdated, resolved)
@willtebbutt
Member Author

@penelopeysm it looks like my PR has the potential to cause merge conflicts with #559 . Would you like me to wait until you've merged yours before merging this, or are you happy to resolve any conflicts that arise?

@penelopeysm
Member

penelopeysm commented Nov 26, 2024

I'm happy to sort it out :) The conflict is really just in the Manifest, so it's just a matter of letting Pkg do its job.

@willtebbutt willtebbutt merged commit ba5e408 into master Nov 26, 2024
4 checks passed
@willtebbutt willtebbutt deleted the wct/improve-ad-docs branch November 26, 2024 14:13
5 participants