improvements / alternatives to clustering #44

Open
elliottash opened this issue Feb 1, 2022 · 1 comment

Comments

@elliottash
Collaborator

elliottash commented Feb 1, 2022

-- allow custom-initialized cluster centroids
-- allow clustering based on cosine-similarity thresholds, measured either to the cluster centroid or to the closest member of the cluster
-- replace the Arora et al. embeddings with S-BERT embeddings
-- allow stretching the embedding space along an antonym dimension
-- drop all names as stopwords
-- drop patients that contain a verb
-- make clustering on the list of entity phrases, rather than the set, an option; that is, pass sample_weight=n_mentions to the k-means .fit() function. Could also weight by the log of n_mentions.
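
A rough sketch of how three of the items above (custom initial centroids, mention-count weights, and a cosine-similarity threshold to the centroid) might fit together. It assumes scikit-learn's `KMeans` as the k-means implementation; the function names `cluster_phrases` and `assign_with_threshold`, the `min_cosine` parameter, and the toy data are hypothetical and not taken from the project's code.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_phrases(embeddings, n_mentions, init_centroids, log_weights=False):
    """Fit k-means from user-supplied centroids, weighting phrases by mention count."""
    weights = np.asarray(n_mentions, dtype=float)
    if log_weights:
        # log1p keeps phrases with a single mention at a positive weight
        weights = np.log1p(weights)
    init = np.asarray(init_centroids, dtype=float)
    # n_init=1 because the starting centroids are fixed by the caller
    km = KMeans(n_clusters=init.shape[0], init=init, n_init=1)
    km.fit(embeddings, sample_weight=weights)
    return km


def assign_with_threshold(embeddings, centroids, min_cosine):
    """Label each phrase with its most cosine-similar centroid, or -1 if below min_cosine."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = x @ c.T
    labels = sims.argmax(axis=1)
    labels[sims.max(axis=1) < min_cosine] = -1
    return labels


# Toy 2-d "embeddings" for five unique phrases and their mention counts (made up).
embeddings = np.array(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -0.8]]
)
n_mentions = [10, 1, 5, 2, 1]
init_centroids = np.array([[1.0, 0.0], [0.0, 1.0]])

km = cluster_phrases(embeddings, n_mentions, init_centroids, log_weights=True)
labels = assign_with_threshold(embeddings, km.cluster_centers_, min_cosine=0.5)
```

Weighting the set of unique phrases by `n_mentions` should be equivalent to fitting on the full list with repeats, without duplicating rows. Phrases labeled `-1` are those not similar enough to any centroid; thresholding against the closest cluster member instead would need a different assignment step.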
