Adding distributions and log scores for K-Normal-Mixture #265
I have coded the log score and the derivatives based on the attached derivations.
Implementation_of_Mixture_Normal_Density_in_NGBoost.pdf
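For reviewers, here is a minimal NumPy sketch of the log score (negative log-likelihood) and its gradient. It assumes a hypothetical per-observation parameterization: K means, K log standard deviations, and K-1 multivariate logits, with the K-th component as the reference. The function names and the column order returned by `d_score` are illustrative only and may not match the PR's distribution class. The gradient is written in terms of the mixture responsibilities r_k. With respect to the logits, it reduces to p_j - r_j.

```python
import numpy as np

def mixture_weights(eta):
    """Inverse multivariate logit: (n, K-1) logits -> (n, K) proportions.

    The K-th component is the reference category, so its logit is fixed at 0.
    """
    z = np.concatenate([eta, np.zeros(eta.shape[:-1] + (1,))], axis=-1)
    w = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return w / w.sum(axis=-1, keepdims=True)

def _log_components(y, mu, log_sigma, eta):
    """Per-component log(p_k * N(y | mu_k, sigma_k^2)) and the total log-likelihood."""
    p = mixture_weights(eta)
    z = (y[:, None] - mu) / np.exp(log_sigma)
    log_comp = np.log(p) - log_sigma - 0.5 * np.log(2 * np.pi) - 0.5 * z**2
    m = log_comp.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))  # log-sum-exp
    return p, z, log_comp, log_lik

def log_score(y, mu, log_sigma, eta):
    """Negative log-likelihood of each y[i]; y: (n,), mu/log_sigma: (n, K), eta: (n, K-1)."""
    return -_log_components(y, mu, log_sigma, eta)[3]

def d_score(y, mu, log_sigma, eta):
    """Gradient of log_score; columns are [mu_1..K, log_sigma_1..K, eta_1..K-1] (hypothetical order)."""
    p, z, log_comp, log_lik = _log_components(y, mu, log_sigma, eta)
    r = np.exp(log_comp - log_lik[:, None])  # responsibilities, each row sums to 1
    d_mu = -r * z / np.exp(log_sigma)
    d_log_sigma = -r * (z**2 - 1)
    d_eta = (p - r)[:, :-1]  # softmax chain rule; the reference component is dropped
    return np.concatenate([d_mu, d_log_sigma, d_eta], axis=1)
```

With K = 1 (an empty `eta`), the score reduces to the ordinary normal negative log-likelihood, which makes a convenient sanity check. Another useful check is to compare `d_score` against central finite differences of `log_score`.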
To map the mixture proportions to unconstrained parameters, I have used the multivariate logit transformation. The inverse of the Jacobian of this transformation is required to compute 'd_score', and it can be calculated in closed form as follows:
Inv_jaccobian.pdf
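Below is a sketch of one way to obtain such a closed form. It is derived here with the Sherman-Morrison formula and may differ in presentation from the attached PDF. Let q = (p_1, ..., p_{K-1}) and eta_j = log(q_j / p_K). Then dq/deta = diag(q) - q q^T. Because diag(q)^{-1} q is the all-ones vector and 1^T q = 1 - p_K, the inverse simplifies to diag(1/q) + 1 1^T / p_K. That inverse is exactly d eta / d q. The function names are hypothetical.

```python
import numpy as np

def proportions_to_logit(q):
    """Multivariate logit of the first K-1 proportions q: eta_j = log(q_j / p_K)."""
    return np.log(q) - np.log(1.0 - q.sum())

def logit_to_proportions(eta):
    """Inverse map: K-1 logits -> all K proportions (reference logit fixed at 0)."""
    z = np.append(eta, 0.0)
    w = np.exp(z - z.max())
    return w / w.sum()

def jacobian(p):
    """d q / d eta = diag(q) - q q^T, where q = p[:-1]."""
    q = p[:-1]
    return np.diag(q) - np.outer(q, q)

def inv_jacobian(p):
    """Closed-form inverse via Sherman-Morrison: diag(1/q) + 1 1^T / p_K.

    This equals d eta / d q, i.e. the Jacobian of proportions_to_logit.
    """
    q = p[:-1]
    return np.diag(1.0 / q) + 1.0 / p[-1]  # the scalar adds 1/p_K to every entry
```

The closed form avoids calling `np.linalg.inv` on diag(q) - q q^T. That matrix becomes ill-conditioned when some proportion is close to zero.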
The exact Fisher information matrix can be calculated, but the expressions for the second derivatives become unwieldy. I will try this later.
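Until an exact expression is available, one common workaround is a Monte Carlo estimate of the Fisher information, E[g g^T]. Here g is the gradient of the log score, evaluated at draws from the predicted distribution. This is an approximation and is not the exact matrix discussed above. The helper below is hypothetical and generic. It is checked on a single normal with parameters (mu, log sigma), whose exact Fisher information is diag(1/sigma^2, 2). For the mixture, `grad_fn` would be the per-sample gradient of the mixture log score.

```python
import numpy as np

def mc_fisher_information(grad_fn, sample_fn, n_samples=10_000, rng=None):
    """Monte Carlo estimate of E[g g^T] (hypothetical helper).

    grad_fn(y) -> (m, P) log-score gradients for the m draws in y.
    sample_fn(m, rng) -> m draws from the current predicted distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = grad_fn(sample_fn(n_samples, rng))
    return g.T @ g / n_samples  # average of outer products g_i g_i^T

# Sanity check on a normal with parameters (mu, log sigma);
# the exact Fisher information here is diag(1 / sigma^2, 2).
mu, sigma = 1.0, 2.0
grad = lambda y: np.column_stack([-(y - mu) / sigma**2, 1 - ((y - mu) / sigma) ** 2])
sample = lambda m, rng: rng.normal(mu, sigma, size=m)
F = mc_fisher_information(grad, sample, 200_000, np.random.default_rng(0))
```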
For initial values, K-means clustering is used: the sample proportion, mean, and variance of each cluster serve as the initial mixture proportion, mean, and variance of the corresponding normal component.
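The initialization can be sketched as follows. The sketch uses a plain 1-D Lloyd's iteration as a stand-in for whatever K-means implementation the PR actually calls. The function names and the small variance floor are hypothetical additions, and the sketch assumes that no cluster ends up empty.

```python
import numpy as np

def kmeans_1d(y, K, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on a 1-D sample; returns a cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(y, size=K, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(y[:, None] - centers[None, :]), axis=1)
        # Keep the old center if a cluster becomes empty.
        new = np.array([y[labels == k].mean() if np.any(labels == k) else centers[k]
                        for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def init_mixture_params(y, K, seed=0, var_floor=1e-6):
    """Per-cluster sample proportions, means, and variances (assumes no empty cluster)."""
    labels = kmeans_1d(y, K, seed=seed)
    props = np.array([np.mean(labels == k) for k in range(K)])
    means = np.array([y[labels == k].mean() for k in range(K)])
    # Hypothetical floor so that singleton clusters do not give zero variance.
    variances = np.maximum([y[labels == k].var() for k in range(K)], var_floor)
    return props, means, variances
```

Under the logit parameterization, the starting logits would be `np.log(props[:-1] / props[-1])`. If scales are stored as log standard deviations, the starting values would be `0.5 * np.log(variances)`.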