
Integrate K-Means Clustering with CRRao #109

Open
sourish-cmi opened this issue Mar 7, 2023 · 3 comments
Labels
enhancement, good first issue

Comments

@sourish-cmi
Collaborator

Integrate K-means clustering from the Clustering.jl package with CRRao.

The Clustering.jl package is unusual in that it wants data to be supplied as a d x n matrix, where d is the dimension of the data (the number of variables) and n is the number of samples. This is the opposite of the convention in the statistics community, where data is supplied as n x d. So CRRao needs to handle the conversion.
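To make the two layouts concrete, here is a minimal sketch in base Julia. The matrix names `X_nd` and `X_dn` are illustrative only and do not come from CRRao or Clustering.jl.

```julia
# Three observations of two variables, in the n x d layout
# common in statistics (one row per observation).
X_nd = [1.0 2.0;
        3.0 4.0;
        5.0 6.0]

# The same data in the d x n layout (one column per observation).
X_dn = permutedims(X_nd)

size(X_nd)  # (3, 2): n = 3, d = 2
size(X_dn)  # (2, 3): d = 2, n = 3

# Observation 2 is a row in one layout and a column in the other.
X_nd[2, :] == X_dn[:, 2]  # true
```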

A possible solution would look like:

container = fit(DataFrame, KMeansClustering(), K::Int64, ...)

If somebody does not want to use all variables in the DataFrame, the solution would look like:

container = fit(VarName, DataFrame, KMeansClustering(), K::Int64, ...)

Warning: Clustering.jl expects its data input as d x n, not n x d.
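One way the proposed interface could be sketched is shown below. This is only an illustration under stated assumptions, not CRRao's actual implementation: the function name `fit_kmeans` (used instead of `fit` to avoid clashing with existing methods), the locally defined stand-in type `KMeansClustering`, and the keyword pass-through are all hypothetical. Only `kmeans`, `assignments`, and the `centers` field come from Clustering.jl.

```julia
using DataFrames
using Clustering

# Hypothetical marker type named after the proposal; defined here as a stand-in.
struct KMeansClustering end

# Hypothetical helper: cluster using every column of `data`.
# The DataFrame is n x d, so it is transposed to the d x n layout
# that Clustering.kmeans expects.
function fit_kmeans(data::DataFrame, ::KMeansClustering, K::Int; kwargs...)
    X = permutedims(Matrix{Float64}(data))  # n x d -> d x n
    return kmeans(X, K; kwargs...)
end

# Hypothetical helper: cluster using only the selected columns.
function fit_kmeans(vars::Vector{Symbol}, data::DataFrame,
                    model::KMeansClustering, K::Int; kwargs...)
    return fit_kmeans(data[:, vars], model, K; kwargs...)
end

# Usage with made-up data: six rows, three variables, clustering on two of them.
df = DataFrame(x = [1.0, 1.1, 0.9, 8.0, 8.2, 7.9],
               y = [1.0, 0.9, 1.2, 8.1, 7.9, 8.0],
               z = [0.0, 5.0, 1.0, 4.0, 2.0, 3.0])
result = fit_kmeans([:x, :y], df, KMeansClustering(), 2)

length(assignments(result))  # one cluster label per row of df
size(result.centers)         # d x K, with d = 2 selected variables
```

The transpose happens once at the boundary, so callers keep the n x d convention and never see the d x n input layout.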

sourish-cmi added the enhancement and good first issue labels on Mar 7, 2023
@a-keshav

a-keshav commented Jan 18, 2024

Hey, is this issue still open? and if yes, could you please assign it to me?

@sourish-cmi
Collaborator Author

> Hey, is this issue still open? and if yes, could you please assign it to me?

Sure, why not - you can try it.

@a-keshav

a-keshav commented Feb 9, 2024

I have submitted a PR for this issue. In this implementation, the function returns a 'KmeansResult' object. One hurdle I see with the current implementation is that the attributes of the returned object are also laid out in the (d x n) orientation. Do you think the clustering results would be better returned as tuples instead of as the object itself?
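As a rough illustration of the tuple option raised here, the sketch below repackages a `KmeansResult` into a NamedTuple with the centers transposed to one row per cluster. The helper name `to_rowwise` and the choice of fields are assumptions made for this example, not part of the PR or of Clustering.jl. The `centers`, `totalcost`, and `converged` fields and the `assignments` and `counts` accessors are from Clustering.jl.

```julia
using Clustering

# Hypothetical helper: return the clustering results as a NamedTuple
# in the row-oriented convention (one row per cluster center).
function to_rowwise(result::KmeansResult)
    return (
        centers     = permutedims(result.centers),  # d x K -> K x d
        assignments = assignments(result),          # length n, one label per sample
        counts      = counts(result),               # length K, cluster sizes
        totalcost   = result.totalcost,
        converged   = result.converged,
    )
end

# Made-up data in the d x n layout: d = 3 variables, n = 6 samples.
X = [1.0 1.1 0.9 8.0 8.2 7.9;
     1.0 0.9 1.2 8.1 7.9 8.0;
     0.0 0.2 0.1 5.0 5.1 4.9]
out = to_rowwise(kmeans(X, 2))

size(out.centers)        # (2, 3): K x d
length(out.assignments)  # 6
sum(out.counts)          # 6
```

A NamedTuple keeps the fields self-describing while hiding the d x K layout. Returning the original `KmeansResult` alongside it would be another option if callers need the remaining fields.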
