A5_A3 get_peptide_sequence.py associates same TPMs to different samples #5

Open
FraPria opened this issue May 17, 2022 · 3 comments

@FraPria

FraPria commented May 17, 2022

Hello, thank you for developing this useful pipeline!
I have a technical question for you.

I noticed in the file A5_A3_NetMHC-4.0_junctions_ORF_neoantigens.tab that samples sharing the same event also share the same Transcript_TPM.
You can see this in the first rows of the file (showing only the columns of interest):

Sample_id       Alt_Junction_id Transcript_id   Transcript_TPM
pat1   chr6;41090308;41091546;+        ENST00000353205.5       3.09240654483773
pat2   chr6;41090308;41091546;+        ENST00000353205.5       3.09240654483773
pat3   chr6;41090308;41091546;+        ENST00000353205.5       3.09240654483773
pat4   chr6;41090308;41091546;+        ENST00000353205.5       3.09240654483773
pat5   chr6;41090308;41091546;+        ENST00000353205.5       3.09240654483773

However, if you look up the same transcript in the iso_tpms.txt matrix, the values differ between samples:

pat1	pat2	pat3	pat4	pat5
3.092407	3.750489	7.15175	13.89057	4.364625
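
The mismatch can also be checked programmatically. The following is only a minimal sketch using the values quoted above; the function name `mismatched_samples` and the tolerance are assumptions chosen for illustration (the matrix values are rounded to fewer digits than the reported ones, hence the absolute tolerance).

```python
import math

# TPMs for ENST00000353205.5 as they appear in the iso_tpms.txt matrix.
matrix_tpms = {
    "pat1": 3.092407,
    "pat2": 3.750489,
    "pat3": 7.15175,
    "pat4": 13.89057,
    "pat5": 4.364625,
}

# Transcript_TPM reported in the neoantigens table: identical for every sample.
reported_tpms = {sample: 3.09240654483773 for sample in matrix_tpms}


def mismatched_samples(reported, expected, abs_tol=1e-5):
    """Return the samples whose reported TPM differs from the matrix value."""
    return [
        sample
        for sample, value in reported.items()
        if not math.isclose(value, expected[sample], rel_tol=0.0, abs_tol=abs_tol)
    ]


bad = mismatched_samples(reported_tpms, matrix_tpms)
print(bad)
```

Only pat1, the first column of the matrix, agrees with the reported value; every other sample is flagged.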

This seems to arise from line 136 of lib/A5_A3/get_peptide_sequence.py, which reads only the first sample column of the iso_tpms.txt matrix:

tokens = line.rstrip().split("\t")
transcript = tokens[0]
tpm = tokens[1]
if (transcript not in transcript_expression):
    transcript_expression[transcript] = tpm
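
To illustrate why this makes the result depend on column order, here is a small self-contained sketch. It mimics the logic quoted above rather than reproducing the actual file: the function name `load_first_column_only` is hypothetical, and skipping a header line that lists only sample names is an assumption about the matrix layout.

```python
def load_first_column_only(lines):
    """Mimic the quoted logic: keep only tokens[1] for each transcript."""
    transcript_expression = {}
    for line in lines[1:]:  # assumption: the first line is a header of sample names
        tokens = line.rstrip().split("\t")
        transcript = tokens[0]
        tpm = tokens[1]  # only the first sample column is ever read
        if transcript not in transcript_expression:
            transcript_expression[transcript] = tpm
    return transcript_expression


original = [
    "pat1\tpat2\n",
    "ENST00000353205.5\t3.092407\t3.750489\n",
]
# Same data with the two sample columns swapped.
swapped = [
    "pat2\tpat1\n",
    "ENST00000353205.5\t3.750489\t3.092407\n",
]

first_original = load_first_column_only(original)
first_swapped = load_first_column_only(swapped)
print(first_original, first_swapped)
```

The same transcript gets a different TPM depending solely on which sample happens to come first, and that single value is then used for every sample.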

So I tested whether swapping the columns of iso_tpms.txt would change the results, and it did.
This does not happen for the other event types, where the code is slightly different. For example, the Exonizations code reads all the iso_tpms.txt columns:

tokens = line.rstrip().split("\t")
transcript = tokens[0]
tpm = tokens[1:]
for i in range(0,len(tpm)):
    if (transcript not in transcript_expression[header[i]]):
        transcript_expression[header[i]][transcript] = float(tpm[i])
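
For reference, the per-sample approach can be sketched as a standalone function. This is not the pipeline's actual code: `load_tpms_per_sample` is a hypothetical name, and the sketch assumes the header line lists only the sample names (no label for the transcript ID column), which is what indexing `header[i]` against `tokens[1:]` suggests.

```python
import io
from collections import defaultdict


def load_tpms_per_sample(handle):
    """Return {sample: {transcript: tpm}} from a tab-separated TPM matrix.

    Assumption: the header lists only sample names, so header[i] lines up
    with the i-th value after the transcript ID on each data row.
    """
    header = handle.readline().rstrip("\n").split("\t")
    transcript_expression = defaultdict(dict)
    for line in handle:
        tokens = line.rstrip("\n").split("\t")
        if len(tokens) < 2:
            continue  # skip blank or malformed lines
        transcript, tpms = tokens[0], tokens[1:]
        for sample, tpm in zip(header, tpms):
            # Keep the first value seen for a transcript, as in the snippet above.
            transcript_expression[sample].setdefault(transcript, float(tpm))
    return transcript_expression


matrix = io.StringIO(
    "pat1\tpat2\tpat3\tpat4\tpat5\n"
    "ENST00000353205.5\t3.092407\t3.750489\t7.15175\t13.89057\t4.364625\n"
)
tpms = load_tpms_per_sample(matrix)
print(tpms["pat4"]["ENST00000353205.5"])
```

With this layout each sample keeps its own value, so a lookup needs both the sample name and the transcript ID.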

Should I use this code for A5_A3 as well?
Thank you in advance

@JLTrincado
Contributor

Hi,

Yes, this does seem to be a bug. I have changed the code accordingly and tested it quickly, and it seems to run smoothly. Could you test it as well? I created a new branch for this.

Thanks for your help.

Best regards,

Juanlu.

@FraPria
Author

FraPria commented May 20, 2022

Hi, thanks for your feedback!

I just tested it, but it raises the following error:
2022-05-20 13:54:04,566 - lib.A5_A3.get_peptide_sequence - ERROR - ERROR: NameError("name 'sample_id' is not defined")

I added
sample_id = tokens[0].replace(" ","")
at lines 253 and 1005, and it worked.

Thank you,
have a nice day!

EduEyras added a commit that referenced this issue May 20, 2022
Related issue #5 - Thanks FraPria
@EduEyras
Member

EduEyras commented May 20, 2022 via email
