explain major and minor cn for pyclone input #40

endertial · 2024-07-02T09:46:51Z

Hi, I'm new to PyClone and would like to use it to infer AML clonal architecture from tumor-only targeted DNA sequencing samples.

I'm not sure how to obtain the major and minor CN values required to run PyClone. I tried the following workflow on one sample to set up the pipeline:
First, I called SNVs and used CNVkit for copy number analysis. There is no CN alteration in the regions containing the SNVs I identified.
So I set major_cn = 1 and minor_cn = 1, even though it is difficult to infer allele-specific CN without a matched normal sample. In fact, some papers use major_cn = 2 and minor_cn = 0. What's the difference?

I ran both settings and got different results:
-Major_cn=2; minor_cn=0
mutation_id sample_id cluster_id cellular_prevalence cellular_prevalence_std variant_allele_frequency
chr19:13054571 pyclone.M2m0.input 1 0.7557546548022801 0.23054355560178952 0.4852941176470588
chr1:43815008 pyclone.M2m0.input 1 0.5475871189192091 0.160503139600136 0.3264183561213264
chr20:31022288 pyclone.M2m0.input 0 0.06789026894828525 0.027398939127064446 0.03896961690885073

-Major_cn=1; minor_cn=1
mutation_id sample_id cluster_id cellular_prevalence cellular_prevalence_std variant_allele_frequency
chr19:13054571 pyclone.input 0 0.9646000732189531 0.023390848092490125 0.4852941176470588
chr1:43815008 pyclone.input 0 0.6535034427688614 0.028133163070667364 0.3264183561213264
chr20:31022288 pyclone.input 0 0.0770851350192494 0.011144088857009335 0.03896961690885073

Can someone explain the difference, and how I can calculate major and minor CN with this type of data?

Thanks,

Best regards

ajw2329 · 2024-10-24T18:06:03Z

Thanks to the authors for the great tool! I would also love some clarification on the correct inputs in regions with no detected somatic CNV. Should these perhaps be filtered out, with the user manually supplying CCF = VAF*normal_cn/tumor_purity instead?

I am not an author, but in case it's useful, my 2 cents on the above issue: I think major_cn = 1, minor_cn = 1 is the correct approach for autosomes when there is no detected CNV.

I think when you give minor_cn = 0 for autosomes you are implying that loss of heterozygosity has happened in this region. This means that with 100% tumor purity you would expect a VAF of 1. You didn't specify your tumor purity here, but I suspect it's very high. I'm guessing PyClone is struggling to reconcile VAF ~ 0.5 (for the first mutation) with the high tumor purity in this scenario, hence the high CCF uncertainty (cellular_prevalence_std).
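To make the two copy-number assumptions concrete, here is a minimal sketch of the standard expected-VAF relationship for a mutation carried on a given number of copies. It is not PyClone's actual model (PyClone weighs several possible genotypes probabilistically); the function name `expected_vaf` and the `multiplicity` parameter are illustrative assumptions, not part of PyClone.

```python
def expected_vaf(ccf, purity, major_cn, minor_cn, multiplicity, normal_cn=2):
    """Expected VAF for a mutation present on `multiplicity` copies
    in a fraction `ccf` of the tumour cells (simplified, illustrative)."""
    total_cn = major_cn + minor_cn
    mutant_copies = purity * ccf * multiplicity
    all_copies = purity * total_cn + (1.0 - purity) * normal_cn
    return mutant_copies / all_copies


# Clonal mutation (ccf = 1) in a pure sample (purity = 1) under each assumption.
diploid_het = expected_vaf(1.0, 1.0, major_cn=1, minor_cn=1, multiplicity=1)
loh_both = expected_vaf(1.0, 1.0, major_cn=2, minor_cn=0, multiplicity=2)
loh_one = expected_vaf(1.0, 1.0, major_cn=2, minor_cn=0, multiplicity=1)

print(diploid_het, loh_both, loh_one)  # 0.5 1.0 0.5
```

Under major_cn = 2, minor_cn = 0, a clonal VAF near 0.5 is only expected if the mutation sits on one of the two copies; if it sits on both, the expected VAF is 1. That ambiguity is one plausible reason the 2/0 run reports a larger cellular_prevalence_std.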

In contrast, when you give minor_cn = 1, you get a much more confident (and, I think, correct) answer (much lower stdev), because pyclone is no longer confused by the low VAFs combined with the high tumor purity. For tumor purity ~ 100% you get cellular prevalence ~ VAF*2, which you are seeing in all of your results.
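As a quick check of the "cellular prevalence ~ VAF*2" observation, the sketch below compares 2*VAF with the cellular_prevalence values reported in the major_cn = 1, minor_cn = 1 run from the first post. A tumor purity of 1 is an assumption here; it was not stated in the thread.

```python
# (VAF, cellular_prevalence) copied from the major_cn=1, minor_cn=1 output above.
reported = {
    "chr19:13054571": (0.4852941176470588, 0.9646000732189531),
    "chr1:43815008": (0.3264183561213264, 0.6535034427688614),
    "chr20:31022288": (0.03896961690885073, 0.0770851350192494),
}

differences = {}
for mutation_id, (vaf, pyclone_cp) in reported.items():
    estimate = 2 * vaf  # diploid, heterozygous, purity assumed to be 1
    differences[mutation_id] = abs(estimate - pyclone_cp)
    print(f"{mutation_id}\t2*VAF={estimate:.3f}\tPyClone={pyclone_cp:.3f}")
```

For all three mutations the simple estimate and the reported value differ by less than 0.01.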

endertial · 2024-10-25T10:54:34Z

Hi ajw2329,
Thanks for your reply,

From a biological perspective, I fully agree with your point. If a somatic variant occurs in a fully diploid locus with tumor purity set to 1 (on a scale from 0 to 1), the cancer cell fraction (CCF) should indeed be twice the variant allele frequency (VAF). To address this, I’ve already attempted to adjust inputs in regions without copy number variation (CNV) by applying the formula VAF * normal copy number * tumor purity, assuming both VAF and tumor purity range from 0 to 1. For some samples, I lack specific information on tumor purity, but since these are derived from leukemic blasts, assuming a tumor purity near 1 may be a reasonable approximation.
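Since purity is unknown for some samples, it may help to see how sensitive the simple diploid estimate is to the assumed value. The sketch below uses the form given in the earlier reply, CCF = VAF*normal_cn/tumor_purity (division by purity); at purity = 1 this gives the same number as multiplying by purity, but the two diverge below 1. The function name `ccf_from_vaf` and the cap at 1.0 are illustrative choices, not something PyClone does.

```python
def ccf_from_vaf(vaf, purity, normal_cn=2):
    """Simple CCF estimate for a heterozygous mutation at a copy-neutral locus."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # A cell fraction cannot exceed 1, so cap the estimate.
    return min(1.0, vaf * normal_cn / purity)


vaf = 0.3264183561213264  # chr1:43815008 from the first post
for purity in (1.0, 0.9, 0.8, 0.7):
    print(f"purity={purity:.1f}  CCF~{ccf_from_vaf(vaf, purity):.3f}")
```

Lower assumed purity pushes the estimate up, so a subclonal-looking variant at purity 1 can look close to clonal if the true purity is substantially lower.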

My focus, however, is on reconstructing clonal architectures from bulk targeted DNA sequencing, which depends on accurate cluster assignment. Given this, I’m confused about cases where, in a fully diploid locus (with minor copy number = 1 and major copy number = 1), two high-VAF variants and one low-VAF variant are grouped into the same cluster. Setting minor copy number = 0 and major copy number = 2 seems to represent this scenario better, and these values are used as defaults in some papers I have read.

Another question is: what types of CNVs are suitable for reconstructing clonal architecture? My data come from a targeted panel of 45 genes, so these smaller CNVs might not be ideal for inferring genomic DNA gains or losses. I am also unclear on how PyClone uses copy number information to adjust VAF and infer clone assignment.

Thanks a lot,

Best Regards,

Alessio
