I wanted to consolidate all of the new ideas for feature extraction here, separate from the supplementary plots issues, so it's easier to keep track of what's for the paper and what is next.
Identifying which features to keep
The current pipeline uses some relatively simple heuristics to filter out features prior to analysis. For example, if a given compartment-specific feature is highly correlated with the image-wide feature, then that feature isn't included in the final dataset. However, we don't compare the compartment-specific features to each other, and we don't compare features of different types to each other, as a way of identifying additional potentially correlated features. Would it make sense to add in additional checks to identify potentially duplicative features?
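One way to picture the extra check is a greedy pairwise pass over all features, regardless of compartment or feature type. This is a minimal sketch, not the current pipeline's code: the feature names, the 0.9 threshold, and the choice to keep the earlier feature of a correlated pair are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def prune_correlated(features, threshold=0.9):
    """Keep a feature only if it is not highly correlated with any feature already kept.

    `features` maps a feature name to its per-image values. Dict order sets
    priority, so image-wide features should be listed first if they are the
    ones we prefer to retain.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature names and values, one value per image.
example = {
    "cd8t_density__all": [1, 2, 3, 4, 5],
    "cd8t_density__stroma": [2, 4, 6, 8, 11],      # tracks the image-wide feature
    "cd8t_density__cancer_core": [5, 1, 4, 2, 3],  # carries different information
}
kept_features = prune_correlated(example)
```

Because the pass is greedy, the result depends on feature order, which is one of the design decisions we would need to settle (for example, always preferring the image-wide version, or the version with the fewest missing values).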
Alternatively, should we think about different or complementary approaches to identify which features to prune? For example, what about a feature that isn't correlated with anything, but is just random noise? Are there ways we could look at the data and infer that a given feature does not contain any useful information? Could we use a metric other than correlation to determine if a feature should be kept? For example, statistical testing to determine if means are different across compartments?
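As one illustration of the statistical-testing idea, here is a sketch of an exact paired sign-flip test comparing a feature's values in two compartments of the same images. The choice of test and the function name are assumptions, not something the pipeline does today. It enumerates every sign assignment, so it is only practical for a small number of images; a larger cohort would need random sampling of sign flips or a standard paired test instead.

```python
from itertools import product


def paired_sign_flip_pvalue(a, b):
    """Exact two-sided sign-flip test on the paired differences a[i] - b[i].

    Under the null hypothesis that the compartment labels are exchangeable
    within an image, each difference is equally likely to be positive or
    negative. Enumerates all 2**n sign assignments.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for float round-off
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


# Hypothetical values of one feature in two compartments, same six images.
stroma = [5, 6, 7, 8, 9, 10]
cancer_core = [1, 2, 3, 4, 5, 6]
p_value = paired_sign_flip_pvalue(stroma, cancer_core)
```

A feature whose compartment-specific values never differ from each other by more than chance would be a candidate for collapsing back to the image-wide version.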
Should we revisit the thresholds for number of non-zero/non-missing values for a feature to be included? Should this threshold be adaptive, based on what tier of feature we're looking at?
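To make the adaptive-threshold question concrete, here is a small sketch in which the minimum fraction of non-zero, non-missing values depends on the feature's tier. The tier names and cutoffs are placeholders chosen for illustration, not values taken from the pipeline.

```python
import math

# Hypothetical tiers and cutoffs: noisier or more derived feature types
# are required to be present in a larger fraction of images.
MIN_VALID_FRACTION = {
    "image_wide": 0.05,
    "compartment": 0.10,
    "ratio": 0.20,
}


def is_valid(value):
    """A value counts if it is not missing (None or NaN) and not zero."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return value != 0


def passes_coverage(values, tier, thresholds=MIN_VALID_FRACTION):
    """Return True if enough images have a usable value for this feature."""
    if not values:
        return False
    valid_fraction = sum(is_valid(v) for v in values) / len(values)
    return valid_fraction >= thresholds[tier]


# 3 usable values out of 20 images (15%).
feature_values = [0.0] * 15 + [None, float("nan")] + [1.2, 0.4, 2.5]
```

With these placeholder cutoffs, the same 15% coverage passes as an image-wide or compartment feature but fails as a ratio feature.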
Additional features
Functional marker expression per compartment. Given the interesting differences for cell type abundance by compartment, can we do the same thing for functional markers? Once again, doing it across all functional markers would likely cause too much feature pollution, but maybe there's a more targeted way to introduce this? Just for a subset of functional marker/cell type combinations perhaps? Just the ones associated with survival? Maybe a more stringent criterion for determining which features are uncorrelated with the global level, or requiring more images to be positive?
The cell ratio features have been quite informative. Can we take a similar approach to functional marker features and look at ratios? For example, at the cluster lineage resolution, for a given functional marker, look at the ratios between all the cell types that are positive? This would likely be too many features. Maybe we do it for a subset? Or a manually curated list? Would it make more sense to look at ratios of proportions, i.e. 30% CD4T, 40% CD8T, 0.75 ratio? Or ratios of counts, i.e. 200 CD4T positive, 100 CD8T positive, ratio of 2? Can we do this per compartment?
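The two options can give different answers for the same image, which this sketch tries to show. The function names and the counts are made up for illustration; the counts are chosen so that the proportions match the 30% / 40% example.

```python
def count_ratio(pos_a, pos_b):
    """Ratio of marker-positive cell counts; None if the denominator is zero."""
    if pos_b == 0:
        return None
    return pos_a / pos_b


def proportion_ratio(pos_a, total_a, pos_b, total_b):
    """Ratio of the fraction of each cell type that is marker-positive.

    Returns None if either cell type is absent or no cells of the
    denominator type are positive.
    """
    if total_a == 0 or total_b == 0 or pos_b == 0:
        return None
    return (pos_a / total_a) / (pos_b / total_b)


# Hypothetical image: 300 of 1000 CD4T cells are positive (30%),
# 100 of 250 CD8T cells are positive (40%).
by_proportion = proportion_ratio(300, 1000, 100, 250)
by_count = count_ratio(300, 100)
```

Here the proportion-based ratio is about 0.75 while the count-based ratio is 3.0, because the count version also reflects how abundant each cell type is. That overlap with the existing cell ratio features may be an argument for the proportion version.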
Are there other ratio-based features that it would make sense to include? Which metrics would lend themselves to being calculated in this way without a lot of manual guidance?
Is there a more principled way we can decide which features to compute per compartment, and which features not?
Are we excluding the double positive functional marker combinations in the right way?