Addition of Metric Calculation Methods #131

Open
wants to merge 8 commits into
base: master


Gautam8387
Contributor

@Gautam8387 Gautam8387 commented Oct 29, 2024

Scarf Metrics

This PR introduces initial metric calculation functionality for assessing cell population structure and integration quality in DataStore. It adds three new methods to the DataStore class:

  • metric_lisi: Computes the Local Inverse Simpson Index (LISI) scores to evaluate the mixing of cell populations across neighborhoods.
  • metric_silhouette: Calculates modified silhouette scores to assess cluster separation.
  • metric_integration: Measures alignment quality between batches using the Adjusted Rand Index (ARI) or Normalized Mutual Information (NMI).

A new file, metrics.py, provides the implementations of these metric functions.

1. metric_lisi():

This function calculates LISI scores for cell populations.
Key Parameters: label_colnames, use_latest_knn, from_assay.
Return: Optionally returns LISI scores by label if return_lisi=True
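
For reviewers less familiar with the metric, here is a minimal sketch of the quantity behind LISI: the inverse Simpson index of the labels found in each cell's neighborhood. It is an unweighted simplification for illustration only; the names `inverse_simpson` and `lisi_unweighted` and the toy neighbor lists are hypothetical and are not the code in metrics.py, which may weight neighbors differently.

```python
from collections import Counter


def inverse_simpson(neighbor_labels):
    """Inverse Simpson index of the labels in one neighborhood."""
    counts = Counter(neighbor_labels)
    total = sum(counts.values())
    # Simpson index: sum of squared label proportions.
    simpson = sum((c / total) ** 2 for c in counts.values())
    return 1.0 / simpson


def lisi_unweighted(knn_indices, labels):
    """One score per cell, computed over that cell's neighbor list."""
    return [
        inverse_simpson([labels[j] for j in neighbors])
        for neighbors in knn_indices
    ]


# Hypothetical toy data: six cells, two labels, two neighbors per cell.
labels = ["A", "A", "A", "B", "B", "B"]
knn_indices = [[1, 2], [0, 2], [0, 1], [0, 4], [3, 5], [3, 4]]
scores = lisi_unweighted(knn_indices, labels)
```

A score of 1 means the neighborhood contains a single label, and a score equal to the number of labels means they are evenly mixed. In the toy data, cell 0 has only "A" neighbors, while cell 3 has one neighbor of each label.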

2. metric_silhouette():

Computes silhouette scores, adapted for use with KNN graphs.
Key Parameters: use_latest_knn, res_label.
Return: Returns an array of silhouette scores for each cluster. NaN values indicate clusters that could not be scored.
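
As background, the textbook silhouette can be sketched as below. This is the standard distance-based definition, not the KNN-graph adaptation in this PR, whose details are not described here. The function name `silhouette_per_cluster`, the toy points, and the choice to return NaN for singleton clusters are assumptions made for illustration.

```python
import math
from collections import defaultdict


def silhouette_per_cluster(points, labels):
    """Mean textbook silhouette per cluster, using Euclidean distances."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)

    def mean_dist(i, members):
        # Mean distance from point i to the members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    result = {}
    for lab, members in clusters.items():
        if len(members) < 2 or len(clusters) < 2:
            result[lab] = math.nan  # assumption: unscorable clusters become NaN
            continue
        scores = []
        for i in members:
            a = mean_dist(i, members)  # cohesion within the own cluster
            b = min(  # separation from the nearest other cluster
                mean_dist(i, other)
                for name, other in clusters.items()
                if name != lab
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
        result[lab] = sum(scores) / len(scores)
    return result


# Hypothetical toy data: two tight clusters and one singleton.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (50, 50)]
labels = [0, 0, 1, 1, 2]
per_cluster = silhouette_per_cluster(points, labels)
```

Values close to 1 indicate compact, well-separated clusters. The singleton cluster has no within-cluster distance, so this sketch reports NaN for it.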

3. metric_integration():

Calculates integration scores between batches using ARI or NMI.
Key Parameters: batch_labels, metric (default: ari).
Return: Returns a float between 0 and 1, indicating alignment quality.
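
For reference, a pure-Python sketch of the Adjusted Rand Index computed from pair counts. It is illustrative only: the function name `adjusted_rand_index` is hypothetical, and how metric_integration derives the two labelings it compares is not shown here. In practice scikit-learn's `adjusted_rand_score` is the usual implementation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treated as perfect agreement
    # Pairs of items that share a cluster in both labelings.
    sum_pairs = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of items that share a cluster within each labeling.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings form one cluster
    return (sum_pairs - expected) / (max_index - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])      # same partition, renamed
opposite = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # worse than chance
```

The textbook ARI is 1 for identical partitions and can fall below 0 when agreement is worse than chance, as in the second call. Whether metric_integration clips such values is not stated in this description.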

Examples:

# Assuming `datastore.make_graph()` was run.
# Example usage of metric_lisi with default KNN
lisi_scores = datastore.metric_lisi(
    label_colnames=["cell_type"],
    save_result=False,
    return_lisi=True
)

# Assuming `datastore.run_leiden_clustering()` was run
# Example usage of metric_silhouette for cluster evaluation
silhouette_scores = datastore.metric_silhouette(
    use_latest_knn=True,
    res_label="leiden_cluster"
)

# Example usage of metric_integration to assess batch alignment
integration_score = datastore.metric_integration(
    batch_labels=["batch1", "batch2"],
    metric="ari"
)
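
Because metric_silhouette can return NaN for clusters that could not be scored, a summary step needs to skip those entries. A minimal sketch, assuming the scores arrive as a flat sequence of floats; the helper `summarize_scores` and the example values are hypothetical:

```python
import math


def summarize_scores(scores):
    """Mean of the non-NaN scores, plus the positions of the NaN entries."""
    valid = [s for s in scores if not math.isnan(s)]
    unscored = [i for i, s in enumerate(scores) if math.isnan(s)]
    mean = sum(valid) / len(valid) if valid else math.nan
    return mean, unscored


# Hypothetical values standing in for the array returned by metric_silhouette.
example_scores = [0.5, float("nan"), 0.25]
mean_score, unscored_clusters = summarize_scores(example_scores)
```

With a NumPy array, `numpy.nanmean` gives the same mean directly.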

Working notebooks here

…tion (adjusted Rand score, normalized mutual information score). Uses the latest KNN location by default; provide all parameters otherwise.

metrics.py: function for computing all scores.

graph_datastore.py: rename functions
	- Added DocString and typing
datastore.py:
	- lisi: filtered metadata as per 'I'
	- doc strings and typing
metrics.py:
	- formatted & typing
tests:
	- Added test for metrics
@Gautam8387 Gautam8387 marked this pull request as draft November 6, 2024 21:56
@Gautam8387 Gautam8387 marked this pull request as ready for review November 6, 2024 23:10
@Gautam8387 Gautam8387 marked this pull request as draft December 23, 2024 10:39
…gration to use latest KNN location and option to provide KNN location as input; assay.py and metrics.py: ruff formatting
@Gautam8387 Gautam8387 marked this pull request as ready for review January 13, 2025 22:59