Insertions context issue #209

Open
diegogarcialopez opened this issue Jan 21, 2025 · 0 comments

diegogarcialopez commented Jan 21, 2025

Hi AlexandrovLab,
I was working on some samples when I got some unexpected results. One of these samples seems to have some T/A insertions in homopolymer regions; however, when I used SigProfilerMatrixGenerator to compute the mutational matrix, no mutation was assigned to the 1:Ins:T:5 class. I looked closely in IGV, and it definitely looks like these indels should be classified as 1:Ins:T:5.
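For readers unfamiliar with the label, here is a minimal sketch of how a class such as 1:Ins:T:5 can be derived from the reference context: a 1 bp insertion, reported on the pyrimidine strand, with the homopolymer length capped at 5. This is a simplified illustration and not SigProfilerMatrixGenerator's actual code; the function name, the toy sequences, and the choice to count matching bases on both sides of the insertion point are assumptions made for the example.

```python
# Simplified illustration only; NOT SigProfilerMatrixGenerator's implementation.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_1bp_insertion(reference: str, index: int, inserted: str) -> str:
    """Label a 1 bp insertion placed between reference[index-1] and reference[index]."""
    base = inserted.upper()
    if len(base) != 1 or base not in COMPLEMENT:
        raise ValueError("expected a single inserted base (A, C, G or T)")
    ref = reference.upper()

    # Count identical bases immediately to the left of the insertion point.
    run = 0
    i = index - 1
    while i >= 0 and ref[i] == base:
        run += 1
        i -= 1

    # Count identical bases immediately to the right of the insertion point.
    j = index
    while j < len(ref) and ref[j] == base:
        run += 1
        j += 1

    # Report on the pyrimidine strand: A becomes T, G becomes C.
    if base in "AG":
        base = COMPLEMENT[base]

    # Runs of five or more share the final bin.
    return f"1:Ins:{base}:{min(run, 5)}"


# Toy reference with six consecutive T bases (made up for illustration).
print(classify_1bp_insertion("GCATTTTTTGCA", 3, "T"))  # prints 1:Ins:T:5
```

Under this sketch, a T inserted into a run of five or more T bases (or an A inserted into a run of A bases) lands in 1:Ins:T:5, which is the class I expected for the insertions described above.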

Therefore, I went to TCGA to check this issue with other samples. However, the same issue appeared. I tried computing the mutational matrix for the sample TCGA-DM-A1D8. This sample contains 2 insertions that look like they should be classified as 1:Ins:T:5 according to the cBioPortal data from IGV.

[Two attached images: views of the two insertions]

But neither of them appears in that class when I compute the mutational matrix. I tried different versions of SigProfilerMatrixGenerator (v1.1, v1.2 and v1.2.31) as well as the SigProfilerAssignment web tool (https://cancer.sanger.ac.uk/signatures/assignment/app/), but all give the same result.

Mutational_Profile_ID.pdf

I find this odd, because around 2 years ago I computed the mutational matrices for this exact TCGA sample and I did have these 2 indels classified as 1:Ins:T:5 mutations. Moreover, according to the literature this should be one of the most prevalent indel types in cohorts, yet it is completely absent from some mutational matrices that I have recently computed.

As a side note, I manually added 1 bp to the start and end positions of these insertions, and then they were called as 1:Ins:T:5 mutations. I was worried this could affect other insertions, but let me know whether you think this would be a potential solution.
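The manual shift described above could be scripted roughly as follows. This is a sketch of the workaround, not a recommended fix: it assumes a tab-separated MAF with a header row and the standard column names `Start_Position`, `End_Position` and `Variant_Type` (with insertions marked `INS`), and it does not handle `#` comment lines. The function name `shift_insertions` and the demo rows are hypothetical.

```python
import csv
import io


def shift_insertions(maf_text: str, offset: int = 1) -> str:
    """Return MAF text with start/end of every insertion moved by `offset` bp."""
    reader = csv.DictReader(io.StringIO(maf_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Assumed column names; adjust if the MAF uses different headers.
        if row["Variant_Type"] == "INS":
            row["Start_Position"] = str(int(row["Start_Position"]) + offset)
            row["End_Position"] = str(int(row["End_Position"]) + offset)
        writer.writerow(row)
    return out.getvalue()


# Made-up rows for illustration; not taken from TCGA-DM-A1D8.
demo = (
    "Chromosome\tStart_Position\tEnd_Position\tVariant_Type\n"
    "1\t100\t101\tINS\n"
    "1\t200\t200\tSNP\n"
)
print(shift_insertions(demo))
```

Only rows marked as insertions are changed, so SNVs and deletions keep their coordinates. Whether shifting every insertion is safe is exactly the open question.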

Please let me know if I am doing something wrong. If not, I would appreciate it if you could tell me whether there is a quick fix, or a specific version of the package without this issue that I could use in the meantime.

Thank you very much in advance.

For reproducibility, this is the code I used to compute the mutational matrix (tested in Google Colab and on a Linux-based HPC):

pip install SigProfilerMatrixGenerator

from SigProfilerMatrixGenerator.scripts import SigProfilerMatrixGeneratorFunc as matGen
from SigProfilerMatrixGenerator import install as genInstall
genInstall.install('GRCh37', bash=True)

matrices = matGen.SigProfilerMatrixGeneratorFunc("test", "GRCh37", "/content/test/", plot=False, exome=False, bed_file=None, chrom_based=False, tsb_stat=False, seqInfo=False, cushion=100)
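To check the result, the count for a single channel can be read back from the indel matrix. The sketch below assumes the matrix is a tab-separated table whose first column is named `MutationType`, followed by one column per sample, and that it is written to a file such as `/content/test/output/ID/test.ID83.all`; both the layout and the path are assumptions to verify against the actual output. The helper name `channel_counts` and the demo table are hypothetical.

```python
import csv
import io


def channel_counts(matrix_text: str, channel: str) -> dict:
    """Return {sample: count} for one mutation channel of a tab-separated matrix."""
    reader = csv.DictReader(io.StringIO(matrix_text), delimiter="\t")
    for row in reader:
        if row["MutationType"] == channel:
            return {
                sample: int(count)
                for sample, count in row.items()
                if sample != "MutationType"
            }
    raise KeyError(f"channel {channel!r} not found")


# Made-up matrix for illustration; not real output for TCGA-DM-A1D8.
demo_matrix = (
    "MutationType\tSAMPLE_A\tSAMPLE_B\n"
    "1:Ins:T:4\t3\t0\n"
    "1:Ins:T:5\t0\t7\n"
)
print(channel_counts(demo_matrix, "1:Ins:T:5"))
```

With real output, the text would come from something like `open(path).read()` on the ID83 file instead of the demo string.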

Here is the input file (I just added a .txt extension because otherwise I could not upload it):

TCGA-DM-A1D8_SigProfiler_input.maf.txt
