-
Essentially, I want to compare various .gmt files, which are gene annotations; each pathway is something like a cluster with a defined set of genes in it. The .gmt format is one step away from --abc. The problem is that many annotations contain highly redundant pathways and can at times enrich for the strangest things. I am curious to try the three metrics you developed in clm (efficiency, mass fraction and area fraction) to assess the quality of various .gmt files as well as to compare them to each other.
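To make the redundancy concrete, here is a rough sketch of how near-duplicate pathways could be flagged. It assumes the usual tab-separated .gmt layout (name, description, then one gene per column). The Jaccard index, the 0.8 threshold and the names `parse_gmt`, `jaccard` and `redundant_pairs` are illustrative choices of mine, not anything provided by clm.

```python
from itertools import combinations


def parse_gmt(lines):
    """Parse GMT lines: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    pathways = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pathways[fields[0]] = {g for g in fields[2:] if g}
    return pathways


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redundant_pairs(pathways, threshold=0.8):
    """Return (name1, name2, jaccard) for pairs at or above the threshold."""
    result = []
    for p, q in combinations(sorted(pathways), 2):
        score = jaccard(pathways[p], pathways[q])
        if score >= threshold:
            result.append((p, q, score))
    return result


# Made-up toy annotation; the pathway and gene names are hypothetical.
example = [
    "PATH_A\tna\tG1\tG2\tG3\tG4",
    "PATH_B\tna\tG1\tG2\tG3\tG4\tG5",
    "PATH_C\tna\tG6\tG7",
]
pathways = parse_gmt(example)
pairs = redundant_pairs(pathways)
```

On a real annotation with thousands of pathways the all-pairs loop gets slow, so this is only meant for a first look.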
Replies: 1 comment
-
These metrics are really only applicable to clusterings (a partitioning of a dataset into disjoint sets), not to a single cluster considered by itself. Those annotations are intrinsically overlapping and have different granularity depending on the scale of pathway considered. The metrics you mention are very basic, and I don't think the notion of, say, giving more significance to the highest value is a good idea. Efficiency tends to reward smaller, more granular clusterings; modularity, for example, does the opposite (see #20).
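A quick way to see whether a given annotation is anywhere near a partitioning is to count how many genes occur in more than one pathway. The sketch below assumes the usual tab-separated .gmt layout (name, description, then genes); `parse_gmt` and `overlap_summary` are hypothetical helper names, not part of clm.

```python
from collections import Counter


def parse_gmt(lines):
    """Parse GMT lines: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    pathways = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pathways[fields[0]] = {g for g in fields[2:] if g}
    return pathways


def overlap_summary(pathways):
    """Count genes overall and genes that appear in more than one pathway."""
    counts = Counter(g for genes in pathways.values() for g in genes)
    shared = sorted(g for g, n in counts.items() if n > 1)
    return {
        "n_genes": len(counts),
        "n_shared": len(shared),
        # A partitioning requires every gene to sit in exactly one set.
        "is_partition": not shared,
    }


# Made-up toy annotation in which G2 belongs to both pathways.
example = [
    "PATH_A\tna\tG1\tG2",
    "PATH_B\tna\tG2\tG3",
]
summary = overlap_summary(parse_gmt(example))
```

For a typical pathway annotation `is_partition` will be false, which is the reason the clustering metrics do not carry over directly.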