stan code suggestions #136
@bob-carpenter thanks so much again for these suggestions. Just wanted to let you know that some of them have been implemented in devel (see #140). They were incredibly helpful and I cut the runtime by up to 50% by implementing them. I will use `lupdf` eventually, but for now my understanding is that rstan (currently 2.21.5 on CRAN) does not support this.
I would strongly suggest moving to cmdstanr if you don't need direct access to log densities or gradients. It keeps up to date with Stan. CRAN has basically blocked further Stan releases with a combination of their small max package size and lack of dependency management.
Thanks! I have been using it for my ad hoc work but haven't looked into how it interacts with R packages that contain compiled code. I will do so soon, so will keep this issue open until then.
Someone asked me to take a look at this repo and I have some suggestions for improving Stan code efficiency.
baggr/inst/stan/functions/prior_increment.stan
I'd suggest rewriting it as follows for efficiency and clarity:
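(The snippet itself didn't survive extraction; below is a minimal sketch of the kind of rewrite described, assuming a family coding of 1 = normal, 2 = Cauchy, 3 = uniform and a two-element parameter vector — baggr's actual encoding may differ.)

```stan
// Hedged sketch: direct returns, else-if chaining, and lupdf calls so the
// normalizing constants drop out. Family codes and par layout are assumed.
real prior_lpdf(real theta, int family, vector par) {
  if (family == 1) return normal_lupdf(theta | par[1], par[2]);
  else if (family == 2) return cauchy_lupdf(theta | par[1], par[2]);
  else if (family == 3) return uniform_lupdf(theta | par[1], par[2]);
  reject("prior_lpdf: unknown family = ", family);
  return not_a_number();  // unreachable; satisfies the return check
}
```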
and then call the prior in code as `theta ~ prior(family, par);`.
The use of `lupdf` instead of `lpdf` will drop the normalizing constants. And the direct return cuts out the very expensive evaluation of conditionals (much more costly than arithmetic on modern hardware). At the very least, your code should use `else if` even if you stick to the assign-and-single-return style for some reason. It's also more self-documenting in making it clear that it's (a) defining a probability density function, and (b) returning based on family.

baggr/inst/stan/logit.stan
Transformed data can be compressed to a one-liner:
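(The one-liner was lost in extraction; it was presumably a ternary declare-define along these lines, with the condition and names assumed:)

```stan
// Hedged sketch: declare and define in one statement; names assumed.
int K_pooled = pooling_type == 2 ? 0 : K;
```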
and the declaration in parameters can be simplified to:
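(again a guess at the shape, with the name `eta` taken from the later discussion:)

```stan
vector[K_pooled] eta;  // size 0 under full pooling, so no wasted parameters
```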
It turns out that `conditional == 1 ? 1 : 0` reduces to `conditional` when `conditional` can only take on values 0 or 1 (as with Stan's conditionals). You can also turn the transformed parameters into one-liners that make all three cases clear. I also moved the arithmetic inside so that it's done once before replicating rather than `K_pooled` times afterward.

Then, in the model block:
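(That block was lost in extraction; a hedged sketch of the control flow only — the statements inside each branch are placeholders, not baggr's actual code.)

```stan
// Hedged sketch: mutually exclusive branches chained with else if
// rather than two sequential if statements.
if (pooling_type == 1) {
  for (k in 1:K_pooled)
    eta[k] ~ prior(prior_hypersd_fam, prior_hypersd_par);
} else if (pooling_type == 2) {
  // full-pooling branch: prior goes on the common parameter instead
  mu ~ prior(prior_hypermean_fam, prior_hypermean_par);
}
```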
You want to replace two sequential exclusive `if`s with an `else if` on the second one to avoid extra conditional evaluation and to make it clear it's a choice.

You should vectorize `prior_increment_real`. This is costly to do in a loop versus rolling the vectorization into the function (i.e., taking `eta` as a param); a sketch follows below.

I'd be strongly inclined to try to refactor so that the choice of pooling type and Nc is rolled into the prior. It's hard to follow as written with all the conditionals. Otherwise, I'd urge you to have three top-level conditionals, even if it means some code duplication.
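(A sketch of what rolling the vectorization in might look like — written as an overload of the `prior` function above, which needs Stan 2.29+; for older compilers it would be a separately named `_lpdf` function. The family coding is the same assumption as before.)

```stan
// Hedged sketch: a vector overload, so one call accumulates the log
// density over all of eta instead of K_pooled scalar calls in a loop.
real prior_lpdf(vector eta, int family, vector par) {
  if (family == 1) return normal_lupdf(eta | par[1], par[2]);
  else if (family == 2) return cauchy_lupdf(eta | par[1], par[2]);
  else if (family == 3) return uniform_lupdf(eta | par[1], par[2]);
  reject("prior_lpdf: unknown family = ", family);
  return not_a_number();
}
```

The loop then collapses to `eta ~ prior(family, par);`.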
With generated quantities, I'd urge you to break out the definitions of the three variables and rearrange the loops to make it clear what order things are defined in and under what conditions they're defined. For example, `logpd` isn't defined if `pooling_type == 0`. Is that the same condition as `K_test > 0`?
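(The rewritten block was lost in extraction; here is a minimal sketch of the "one declaration, one guard per variable" style described. Everything except `theta_k_test`, `logpd`, `K_test`, and `pooling_type` is a placeholder.)

```stan
// Hedged sketch: each variable's size encodes the condition under which
// it is defined, so nothing can be left undefined.
generated quantities {
  vector[K_test] theta_k_test;  // size 0 or filled in to size K_test
  vector[pooling_type != 0 && K_test > 0] logpd;  // explicit guard in size

  for (k in 1:K_test)
    theta_k_test[k] = normal_rng(mu_hat, sd_hat);  // placeholder draw
  if (pooling_type != 0 && K_test > 0)
    logpd[1] = bernoulli_logit_lpmf(y_test | theta_k_test[group_test]);
}
```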
This makes it clear locally, for example, that the definition of `theta_k_test` is sound in that it is size 0 or filled in to size `K_test`. Also, what happens if `pooling_type == 0` and `K_test > 0`? It looks like `logpd` will be undefined. I turned everything into a vector so that we could use vector arithmetic in the definitions. I also vectorized the `bernoulli_logit` calls.

The other programs could use similar changes, so I'm only going to repeat novel recommendations.
baggr/inst/stan/mutau.stan
I'd reduce the definition of `theta_k` to a one-liner, moving the arithmetic inside the `rep_matrix`. But I'm unclear on why, if `cumsum == 1`, you don't use cumulative sums (lines 58--62 of the original code).

This is the line that could hose performance:
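(The quoted line was lost in extraction; from the description it is presumably a `multi_normal` statement inside a loop over groups, something like the following, with names assumed.)

```stan
// Hedged reconstruction: each multi_normal call has to factor
// prior_hypermean_scale, so this is K O(P^3) solves per evaluation.
for (k in 1:K)
  eta[k] ~ multi_normal(prior_hypermean_mean, prior_hypermean_scale);
```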
This is causing a solve of `prior_hypermean_scale` on each iteration. What you want to do is (a) use a Cholesky-based representation, and (b) accumulate all those values from `1:K` into a single array to feed once into `multi_normal_cholesky`. That way, it's one `O(P^3)` factorization up front plus cheap `O(P^2)` solves thereafter, vs. `K` factorizations of order `O(P^3)` on every evaluation. The way to do this is to Cholesky factor in the transformed data, reshape the variable `eta`, and then use a single vectorized call. This'll be a huge savings for even modest size `K`.

In the model, you want to turn all those sampling statements into vectorized ones. So it'd look like this:
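(A sketch of the shape described, with `eta` reshaped to an array of vectors; the dimension 2 is assumed for the mu/tau pair, and the names are taken from the surrounding text.)

```stan
// Hedged sketch: factor the covariance once in transformed data, then use
// single vectorized statements in the model block instead of loops over 1:K.
transformed data {
  matrix[2, 2] L_scale = cholesky_decompose(prior_hypermean_scale);
}
model {
  // eta reshaped to array[K] vector[2]: one call, one shared Cholesky factor
  eta ~ multi_normal_cholesky(prior_hypermean_mean, L_scale);
  // one vectorized statement for the measurement model
  to_vector(theta_hat_k) ~ normal(to_vector(theta_k), to_vector(se_theta_k));
}
```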
It seems like this'd be more costly to create the vectors, but it allows the autodiff tree to be compressed. And you really want to move that `to_vector(theta_hat_k)` into the transformed data block.

All the same comments as in the last model apply elsewhere. There's lots that can be tightened up here if you care about speed. And the other models are just more of the same.