Correspondence analysis #203

Open, wants to merge 11 commits into base: master

@kshedden (Contributor) commented Sep 6, 2022

This is an implementation of Correspondence Analysis and Multiple Correspondence Analysis. I think it would be useful for this to be included in the MultivariateStats package.

@wildart (Collaborator) left a comment

Great. Now we need to shape it into an acceptable API style.

Comment on lines 3 to 21
using LinearAlgebra
using SparseArrays
using Statistics: middle
using StatsAPI: RegressionModel
using StatsBase:
    SimpleCovariance,
    CovarianceEstimator,
    AbstractDataTransform,
    ConvergenceException,
    pairwise,
    pairwise!,
    CoefTable

import Statistics: mean, var, cov, covm, cor
import Base: length, size, show
import StatsAPI: fit, predict, coef, weights, dof, r2
import LinearAlgebra: eigvals, eigvecs

export

Was it automatic styling? Why the format change?

@@ -36,27 +42,23 @@ module MultivariateStats

# whiten
Whitening, # Type: Whitening transformation

These blank lines helped to visually separate the various algorithms and their methods.

src/mca.jl Outdated
Comment on lines 94 to 311
"""
ca

Fit a correspondence analysis using the data array `X` whose rows are
the objects and columns are the variables. The first `d` components
are retained.
"""
function ca(X, d)
    c = fit(CA, X, d)
    return c
end

There is no need for this high-level function; the standard API presumes a `fit` call for constructing a model.

src/mca.jl Outdated
ca.G = Wc \ Q * Dq

# Get the eigenvalues
ca.I = D .^ 2

A `fit` call should always return the model.

src/mca.jl Outdated
Comment on lines 46 to 65
function CA(X)

    # Convert to proportions
    X = X ./ sum(X)

    # Calculate row and column means
    r = sum(X, dims = 2)[:]
    c = sum(X, dims = 1)[:]

    # Center the data matrix to create residuals
    R = X - r * c'

    # Standardize the data matrix to create standardized residuals
    Wr = Diagonal(sqrt.(r))
    Wc = Diagonal(sqrt.(c))
    SR = Wr \ R / Wc

    T = eltype(X)
    return CA(X, R, r, c, SR, zeros(T, 0, 0), zeros(T, 0, 0), zeros(T, 0))
end

The algorithm code should be in the `fit` function. There is no need for a special constructor for the model.
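
To illustrate the review comments above (construct the model through `fit`, keep the algorithm inside `fit`, and always return the model), here is a minimal sketch. It is not the PR's code: the struct layout, the field names, and the `maxoutdim` keyword are assumptions made for illustration, and a local `fit` function stands in for `StatsAPI.fit` so the snippet runs without extra packages. The computation follows the steps shown in the excerpts (proportions, margins, standardized residuals, SVD).

```julia
using LinearAlgebra

# Stand-in for `StatsAPI.fit`, so this sketch runs without extra packages.
function fit end

# Hypothetical immutable model type; field names are illustrative only.
struct CA{T<:Real}
    rowmass::Vector{T}   # row margins of the proportion table
    colmass::Vector{T}   # column margins of the proportion table
    F::Matrix{T}         # row coordinates
    G::Matrix{T}         # column coordinates
    inertia::Vector{T}   # eigenvalues (squared singular values)
end

function fit(::Type{CA}, X::AbstractMatrix{<:Real};
             maxoutdim::Int = min(size(X)...) - 1)
    # Convert counts to proportions and compute the margins
    P = X ./ sum(X)
    r = vec(sum(P, dims = 2))
    c = vec(sum(P, dims = 1))

    # Standardized residuals
    Wr = Diagonal(sqrt.(r))
    Wc = Diagonal(sqrt.(c))
    SR = Wr \ (P - r * c') / Wc

    # Decompose and keep the first d components
    U, D, V = svd(SR)
    d = min(maxoutdim, length(D))
    Dq = Diagonal(D[1:d])
    F = Wr \ U[:, 1:d] * Dq
    G = Wc \ V[:, 1:d] * Dq

    # Everything is computed here, and the fitted model is returned
    return CA(r, c, F, G, D[1:d] .^ 2)
end
```

Under these assumptions, a caller would write `m = fit(CA, X; maxoutdim = 2)` instead of `ca(X, 2)`, and no separate `CA(X)` constructor holding half-initialized fields is needed.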

src/mca.jl Outdated

# Recoding dictionary, maps each distinct value in z to
# an offset
rd = Dict{eltype(z),Int}()

Use parametrized types in the function signature for the arguments (rather than calling `eltype`).
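
One way to read this suggestion is sketched below: the element type `T` is bound in the method signature and reused for both dictionaries. The function name `recode` and the choice to number levels in order of first appearance are assumptions for illustration; the PR's helper may order levels differently.

```julia
# Hypothetical helper: `T` comes from the signature, not from `eltype(z)`.
function recode(z::AbstractVector{T}) where {T}
    rd = Dict{T,Int}()    # value => offset
    rdi = Dict{Int,T}()   # offset => value (the reversed dictionary)
    for v in z
        if !haskey(rd, v)
            k = length(rd) + 1   # next unused offset
            rd[v] = k
            rdi[k] = v
        end
    end
    return rd, rdi
end
```

With this signature the dictionary types are fully determined by the argument type, and the same pattern covers the reversed dictionary in the next excerpt.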

src/mca.jl Outdated
end

# Reverse the recoding dictionary
rdi = Dict{Int,eltype(z)}()

Same as above.

src/mca.jl Outdated
# to levels for each variable.
function make_indicators(Z)

rd, rdr = Dict[], Dict[]

You need to specify types for these dicts.
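
The point here is that `Dict[]` creates a vector whose element type is the abstract `Dict`, so every lookup through it is dynamically typed. A minimal sketch of a typed alternative follows; it assumes the input is a matrix with a single element type `T`, which may not match the PR (the test in the PR appears to pass a data frame), and the name `make_recoders` is hypothetical.

```julia
# Hypothetical variant: one pair of typed dictionaries per column of Z.
function make_recoders(Z::AbstractMatrix{T}) where {T}
    rd = Dict{T,Int}[]    # concrete element type, unlike `Dict[]`
    rdr = Dict{Int,T}[]
    for col in eachcol(Z)
        levels = unique(col)   # distinct values in order of first appearance
        push!(rd, Dict{T,Int}(v => k for (k, v) in enumerate(levels)))
        push!(rdr, Dict{Int,T}(k => v for (k, v) in enumerate(levels)))
    end
    return rd, rdr
end
```

If the columns can have different element types, a tuple or a vector of a small parametric wrapper type would be needed instead; that case is not covered by this sketch.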

test/mca.jl Outdated
:V3 => ["D", "D", "D", "C", "D", "C", "D", "C"],
)

m = mca(da, 3; vnames = names(da))

Use the standard API for constructing the model: a `fit` call.

Comment on lines +302 to +584
function inertia(mca::MCA)
    inr = (
        Raw = mca.C.I,
        Unadjusted = mca.unadjusted_eigs,
        Benzecri = mca.benzecri_eigs,
        Greenacre = mca.greenacre_eigs,
    )
    return inr
end

You may want to define multiple getter functions for the various inertia types.
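
A possible shape for such getters is sketched below. The container type and all function names are hypothetical; they only mirror the four keys of the named tuple in the excerpt and do not show how the adjusted values are computed.

```julia
# Hypothetical container holding the four kinds of inertia from the excerpt.
struct InertiaTable{T<:Real}
    raw::Vector{T}
    unadjusted::Vector{T}
    benzecri::Vector{T}
    greenacre::Vector{T}
end

# One small getter per inertia type, instead of one function returning a tuple.
inertia(m::InertiaTable) = m.raw
unadjusted_inertia(m::InertiaTable) = m.unadjusted
benzecri_inertia(m::InertiaTable) = m.benzecri
greenacre_inertia(m::InertiaTable) = m.greenacre
```

Separate getters let callers ask for exactly one quantity and keep each return type simple, at the cost of a few more exported names.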

@kshedden (Contributor, Author) commented

Thanks, I have resolved most of these. The CI failures are due to importing DataFrames. If you don't want a DataFrames dependency, we can probably work around it; otherwise I can update the project file to include this dependency.
