
Adding Bayesian numerical tests #69

Merged

Conversation

ShouvikGhosh2048
Collaborator

No description provided.

@sourish-cmi
Collaborator

@ShouvikGhosh2048 Tests are failing. What exactly are you testing? It is not clear to me.

@ShouvikGhosh2048
Collaborator Author

The tests lie in these sections.

We check whether the model summaries, quantiles and predictions match.

@sourish-cmi
Collaborator

@ShouvikGhosh2048
Yes, that makes sense. In model summaries, what columns are you checking?

@ShouvikGhosh2048
Collaborator Author

The tests check all summary columns except ess_per_sec.

@sourish-cmi
Collaborator

The tests check all summary columns except ess_per_sec.

@ShouvikGhosh2048 Yes, that is the correct way, I think, since predictions are all functions of the model summaries and quantiles. Just checking the average of all predictions would be enough in this case.

@sourish-cmi
Collaborator

You can test the following:

julia> using CRRao, StableRNGs, StatsModels, Statistics
julia> CRRao.set_rng(StableRNG(123))
julia> container = fit(@formula(MPG ~ HP + WT + Gear), df, LinearRegression(), Prior_Ridge())
julia> mean(predict(container, df))
20.057250989674284
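
The comparison above can be turned into a test roughly like this (a sketch: loading `df` as the `mtcars` dataset from RDatasets is an assumption, since the thread does not say where `df` comes from, and the tolerance is a placeholder; the expected value is the one quoted above):

```julia
using Test, Statistics, CRRao, RDatasets, StableRNGs, StatsModels

# Assumed: df is the classic mtcars dataset.
df = dataset("datasets", "mtcars")

CRRao.set_rng(StableRNG(123))
container = fit(@formula(MPG ~ HP + WT + Gear), df, LinearRegression(), Prior_Ridge())

@testset "Linear Regression Prior_Ridge" begin
    # With a fixed RNG the mean prediction should reproduce;
    # isapprox with a tolerance guards against floating-point noise.
    @test mean(predict(container, df)) ≈ 20.057250989674284 atol = 1e-6
end
```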

@sourish-cmi sourish-cmi added this to the Package Completeness milestone Dec 22, 2022
@sourish-cmi sourish-cmi linked an issue Dec 22, 2022 that may be closed by this pull request
@sourish-cmi
Collaborator

To test logistic regression:

julia> using CRRao, RDatasets, StableRNGs, StatsModels
julia> turnout = dataset("Zelig", "turnout")
julia> CRRao.set_rng(StableRNG(123))
julia> container_logit = fit(@formula(Vote ~ Age + Race + Income + Educate), turnout, LogisticRegression(), Logit(), Prior_Ridge())
julia> using Statistics
julia> mean(predict(container_logit,turnout))
0.7469549400854435

julia> predict(container_logit,turnout)

@sourish-cmi
Collaborator

Looks like one test in logistic regression and one test in NegativeBinomial are failing. I can't identify which ones. Can you please help me identify them?

@ShouvikGhosh2048
Collaborator Author

In the latest run, the failed tests are:

  • Logistic Prior_TDist Cauchit
  • NegativeBinomial Prior_Laplace

However the number of tests passing changes on running the tests again:
https://github.com/xKDR/CRRao.jl/actions/runs/3756875223/jobs/6383449582
https://github.com/xKDR/CRRao.jl/actions/runs/3756875223/jobs/6383648303

@sourish-cmi
Collaborator

In the latest run, the failed tests are:

  • Logistic Prior_TDist Cauchit
  • NegativeBinomial Prior_Laplace

However the number of tests passing changes on running the tests again: https://github.com/xKDR/CRRao.jl/actions/runs/3756875223/jobs/6383449582 https://github.com/xKDR/CRRao.jl/actions/runs/3756875223/jobs/6383648303

Okay let's wait then. If the repeat run passes the test - then we are good.

@sourish-cmi
Collaborator

Tests failed at:

Logistic Regression: Test Failed at /home/runner/work/CRRao.jl/CRRao.jl/test/numerical/bayesian/LogisticRegression.jl:56
  Expression: mean(predict(model, turnout)) ≈ test_mean
   Evaluated: 0.77122135014641 ≈ 0.7715325526207547

and

Negative Binomial Regression: Test Failed at /home/runner/work/CRRao.jl/CRRao.jl/test/numerical/bayesian/NegBinomialRegression.jl:15
  Expression: mean(predict(model, sanction)) ≈ test_mean
   Evaluated: 6.847395006812232 ≈ 6.868506051646364

What do you make of it?

@sourish-cmi
Collaborator

@ShouvikGhosh2048 Looks like it is failing randomly. I have no clue what is failing and why. Let's meet and discuss this. One possibility we should rule out is that we are not seeding correctly.

@ShouvikGhosh2048
Collaborator Author

using Turing, StableRNGs

@model function example(x)
    m ~ Normal(0, 1)
    x ~ Normal(m, sqrt(1))
end

res = sample(StableRNG(123), example(1), NUTS(), 10)

This program gives different summaries on different runs. (The initial epsilon chosen switches between 1.6 and 3.2.)

@sourish-cmi
Collaborator

I have asked for help on Julia Discourse here. I hope somebody will share the correct way to solve the issue.

@sourish-cmi
Collaborator

sourish-cmi commented Dec 23, 2022

@ShouvikGhosh2048 @ajaynshah @ayushpatnaikgit @mousum-github

I have submitted an issue in Turing.jl here. We see that the results are not reproducible; hence our tests are not passing.

@ajaynshah
Member

ajaynshah commented Dec 23, 2022 via email

@sourish-cmi
Collaborator

How should we solve this? There are innumerable simulation codes in the world; how are they being tested? As I see it, there are exactly two pathways: (a) set the seed, and then everything is deterministic. This requires a cross-platform, predictable RNG; I'm sure the Julia people have figured this out. (b) Go to a very high N, like 1M draws, and test only the first 3 digits. But there will always be 1 in 1e5 tests that goes wrong, so this is an unsatisfying solution.


Solution (a), setting the seed so that everything is deterministic, is the best solution. Going forward, we should figure out how the Julia people are solving it.

I do not like option (b): as you mentioned, there is always 1 in 1e5 tests that will fail, and it is also extremely time-consuming.

I am thinking of a third possibility: run the code twice, so that we have two sets of samples, then run a Kolmogorov-Smirnov test to check whether both sets come from the same population distribution. Theoretically, they should come from the same probability distribution. If the test fails, we have a much larger problem; if it passes, I am happy. Still, the results are not reproducible, which means we cannot submit a paper with Julia results.
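
The third possibility could look roughly like this (a sketch, assuming HypothesisTests.jl for the two-sample KS test and reusing the small Turing model from earlier in the thread; the sample sizes and p-value cutoff are placeholders):

```julia
using Turing, StableRNGs, HypothesisTests

@model function example(x)
    m ~ Normal(0, 1)
    x ~ Normal(m, 1)
end

# Two independent runs with different seeds give two sets of
# posterior samples for m.
res1 = sample(StableRNG(123), example(1), NUTS(), 1000)
res2 = sample(StableRNG(456), example(1), NUTS(), 1000)

m1 = vec(res1[:m])
m2 = vec(res2[:m])

# Two-sample Kolmogorov-Smirnov test: do both sample sets look like
# draws from the same underlying distribution?
ks = ApproximateTwoSampleKSTest(m1, m2)
@assert pvalue(ks) > 0.01  # fail only on gross disagreement
```

This sidesteps bitwise reproducibility: the test passes as long as the two chains agree in distribution, at the cost of a small false-failure rate set by the p-value cutoff.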

@sourish-cmi sourish-cmi added enhancement New feature or request help wanted Extra attention is needed labels Dec 23, 2022
@ajaynshah
Member

ajaynshah commented Dec 23, 2022 via email

@sourish-cmi
Collaborator

sourish-cmi commented Dec 23, 2022

@ShouvikGhosh2048 @ajaynshah @mousum-github

Here is one solution from Julia Discourse.

The following produces exactly the same results:


res1 = sample(StableRNG(123), example(1), NUTS(1000, 0.65; init_ϵ = 1.6), 1000)
res2 = sample(StableRNG(123), example(1), NUTS(1000, 0.65; init_ϵ = 1.6), 1000)
res3 = sample(StableRNG(123), example(1), NUTS(1000, 0.65; init_ϵ = 1.6), 1000)

Apparently there is a bug. For now, the patch above should work.

@sourish-cmi
Collaborator

Looks like this issue and PR solved the bug last night.

@ShouvikGhosh2048
Collaborator Author

@sourish-cmi
The Turing bug has been fixed. Using the new version, the tests are now passing.

@sourish-cmi
Collaborator

@sourish-cmi The Turing bug has been fixed. Using the new version, the tests are now passing.

@ShouvikGhosh2048 @ajaynshah @mousum-github

This is great news.

@sourish-cmi sourish-cmi left a comment

All tests have passed.

@sourish-cmi sourish-cmi merged commit 39ff832 into xKDR:main Dec 28, 2022
Successfully merging this pull request may close these issues.

Testing strategy, some thoughts
3 participants