
Add example showing how to use ParetoSmooth #85

Closed
itsdfish opened this issue Aug 1, 2023 · 2 comments · Fixed by #86

Comments

@itsdfish
Contributor

itsdfish commented Aug 1, 2023

Hi,

A few days ago on Discourse, someone described a problem with ParetoSmooth in which the wrong (or at least unexpected) sample size was used. It turns out this is related to a previously reported issue. For example, ParetoSmooth produces different results for two formally equivalent Turing models:

Model 1

using Turing
using ParetoSmooth
using Distributions

@model function model(data)
    μ ~ Normal()
    # the whole vector is observed in a single ~ statement, so it is
    # treated as one observation when pointwise log-likelihoods are computed
    data ~ Normal(μ, 1)
end

data = rand(Normal(0, 1), 100)

chain = sample(model(data), NUTS(), 1000)
rez1 = psis_loo(model(data), chain)

Model 2

using Turing
using ParetoSmooth
using Distributions

@model function model(data)
    μ ~ Normal()
    # each element is observed separately, so there are length(data)
    # pointwise log-likelihood terms
    for i in 1:length(data)
        data[i] ~ Normal(μ, 1)
    end
end

data = rand(Normal(0, 1), 100)

chain = sample(model(data), NUTS(), 1000)
rez1 = psis_loo(model(data), chain)

The only thing that differs is the use of a for loop. Unfortunately, this distinction was not clear, and we wasted time trying to figure out the problem. Until a fix is added, I think it would be good to add an example explaining how to make a model work properly with ParetoSmooth. I would be willing to make a PR. Is a for loop or .~ the recommended approach?
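
For concreteness, the .~ version of the same model would look something like this (an untested sketch; model_broadcast is just a placeholder name, and behavior may depend on the Turing and ParetoSmooth versions):

using Turing
using ParetoSmooth
using Distributions

@model function model_broadcast(data)
    μ ~ Normal()
    # broadcasted tilde: one observation per element, like the for loop
    data .~ Normal(μ, 1)
end

data = rand(Normal(0, 1), 100)

chain = sample(model_broadcast(data), NUTS(), 1000)
rez = psis_loo(model_broadcast(data), chain)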

@ParadaCarleton
Member

Yes, I think a PR would be great. As I mentioned in that issue, the ideal situation is to change the interface of ParetoSmooth.jl (or potentially ArviZ.jl, which frankly is probably the better package for LOO-CV at this point) to take as input both a model and a dataset, where the dataset is given as an iterator over IID observations. The algorithm then proceeds by removing each data point from the dataset one at a time and calculating the appropriate LOO distribution.
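
To make the idea concrete, here is a rough sketch of what that computation does, written as exact (brute-force) LOO-CV specialized to the Normal(μ, 1) example above. loo_exact and make_model are hypothetical names, not part of ParetoSmooth.jl or ArviZ.jl; PSIS-LOO approximates this quantity without refitting the model n times:

using Turing

# Hypothetical sketch of the proposed interface, not a real
# ParetoSmooth.jl or ArviZ.jl function. make_model(train) should
# return a Turing model conditioned on the reduced dataset train;
# the Normal(μ, 1) likelihood from the example above is assumed.
function loo_exact(make_model, data; nsamples = 1_000)
    elpd = 0.0
    for i in eachindex(data)
        train = data[setdiff(eachindex(data), i)]  # leave observation i out
        chain = sample(make_model(train), NUTS(), nsamples; progress = false)
        μs = vec(chain[:μ])  # posterior draws of μ from the reduced fit
        # log predictive density of the held-out point, averaged over draws
        elpd += log(mean(pdf.(Normal.(μs, 1), data[i])))
    end
    return elpd
end

# e.g. with Model 2 from above:
elpd_exact = loo_exact(model, data)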

@itsdfish
Contributor Author

itsdfish commented Aug 1, 2023

Sounds good. I will submit a PR to update the documentation later this week or during the weekend.

Do you have plans for changing the interface?
