-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correspondence analysis #203
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Now we need to shape it into an acceptable API style.
src/MultivariateStats.jl
Outdated
using LinearAlgebra | ||
using SparseArrays | ||
using Statistics: middle | ||
using StatsAPI: RegressionModel | ||
using StatsBase: | ||
SimpleCovariance, | ||
CovarianceEstimator, | ||
AbstractDataTransform, | ||
ConvergenceException, | ||
pairwise, | ||
pairwise!, | ||
CoefTable | ||
|
||
import Statistics: mean, var, cov, covm, cor | ||
import Base: length, size, show | ||
import StatsAPI: fit, predict, coef, weights, dof, r2 | ||
import LinearAlgebra: eigvals, eigvecs | ||
|
||
export |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it automatic styling? Why the format change?
src/MultivariateStats.jl
Outdated
@@ -36,27 +42,23 @@ module MultivariateStats | |||
|
|||
# whiten | |||
Whitening, # Type: Whitening transformation | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new lines helped to separate (visually) various algorithms & their methods.
src/mca.jl
Outdated
""" | ||
ca | ||
|
||
Fit a correspondence analysis using the data array `X` whose rows are | ||
the objects and columns are the variables. The first `d` components | ||
are retained. | ||
""" | ||
function ca(X, d) | ||
c = fit(CA, X, d) | ||
return c | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for this high level function, standard API presume fit
call for construction a model.
src/mca.jl
Outdated
ca.G = Wc \ Q * Dq | ||
|
||
# Get the eigenvalues | ||
ca.I = D .^ 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fit
call should always return model
src/mca.jl
Outdated
function CA(X) | ||
|
||
# Convert to proportions | ||
X = X ./ sum(X) | ||
|
||
# Calculate row and column means | ||
r = sum(X, dims = 2)[:] | ||
c = sum(X, dims = 1)[:] | ||
|
||
# Center the data matrix to create residuals | ||
R = X - r * c' | ||
|
||
# Standardize the data matrix to create standardized residuals | ||
Wr = Diagonal(sqrt.(r)) | ||
Wc = Diagonal(sqrt.(c)) | ||
SR = Wr \ R / Wc | ||
|
||
T = eltype(X) | ||
return CA(X, R, r, c, SR, zeros(T, 0, 0), zeros(T, 0, 0), zeros(T, 0)) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The algorithm code should be in the fit
function. There is no need for special constructor of the model.
src/mca.jl
Outdated
|
||
# Recoding dictionary, maps each distinct value in z to | ||
# an offset | ||
rd = Dict{eltype(z),Int}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use function parametrized types for arguments.
src/mca.jl
Outdated
end | ||
|
||
# Reverse the recoding dictionary | ||
rdi = Dict{Int,eltype(z)}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
src/mca.jl
Outdated
# to levels for each variable. | ||
function make_indicators(Z) | ||
|
||
rd, rdr = Dict[], Dict[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to specify types for these dicts.
test/mca.jl
Outdated
:V3 => ["D", "D", "D", "C", "D", "C", "D", "C"], | ||
) | ||
|
||
m = mca(da, 3; vnames = names(da)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use standard API for construction of the model - fit
call.
function inertia(mca::MCA) | ||
inr = ( | ||
Raw = mca.C.I, | ||
Unadjusted = mca.unadjusted_eigs, | ||
Benzecri = mca.benzecri_eigs, | ||
Greenacre = mca.greenacre_eigs, | ||
) | ||
return inr | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to define multiple getter function for various inertia types.
Thanks, I have resolved most of these. The CI fails are due to importation of DataFrames. If you don't want a DataFrames dependency we can probably work around it, otherwise I can update the project file to include this dependency. |
pick f57ef58 check for repeated non-zero signular values
This is an implementation of Correspondence Analysis and Multiple Correspondence Analysis. I think it would be useful for this to be included in the MultivariateStats package.