Skip to content

GFF3 parent / id remapping issue. #109

@hexylena

Description

@hexylena

Just filing it since even though this library doesn't seem to see a lot of activity lately.

Script:

#!/usr/bin/env python
import sys
from BCBio import GFF
from Bio import SeqIO

seq_dict = SeqIO.to_dict(SeqIO.parse(sys.argv[1], "fasta"))
# Parse GFF3 records
for record in GFF.parse(sys.argv[2], base_dict=seq_dict):
    print record.id
    for feature in record.features:
        print feature.id, [x.id for x in feature.sub_features]

Input files:

>ctgA
ACTG
>ctgB
ACTG
##gff-version 3
##sequence-region ctgA 1 17039
ctgA    glimmer gene    39  209 .   +   .   ID=orf00001
ctgA    Glimmer3    CDS 39  209 .   +   0   ID=cds_orf00001;Parent=orf00001
ctgA    glimmer gene    215 637 .   +   .   ID=orf00002
ctgA    Glimmer3    CDS 215 637 .   +   0   ID=cds_orf00002;Parent=orf00002
ctgA    glimmer gene    637 3588    .   +   .   ID=orf00003
ctgA    Glimmer3    CDS 637 3588    .   +   0   ID=cds_orf00003;Parent=orf00003
ctgA    glimmer gene    3873    4085    .   -   .   ID=orf00004
ctgA    Glimmer3    CDS 3873    4085    .   -   0   ID=cds_orf00004;Parent=orf00004
##gff-version 3
##sequence-region ctgB 1 17040
ctgB    glimmer gene    21  209 .   +   .   ID=orf00001
ctgB    Glimmer3    CDS 21  209 .   +   0   ID=cds_orf00001;Parent=orf00001
ctgB    glimmer gene    215 637 .   +   .   ID=orf00002
ctgB    Glimmer3    CDS 215 637 .   +   0   ID=cds_orf00002;Parent=orf00002
ctgB    glimmer gene    637 3588    .   +   .   ID=orf00003
ctgB    Glimmer3    CDS 637 3588    .   +   0   ID=cds_orf00003;Parent=orf00003
ctgB    glimmer gene    3873    4085    .   -   .   ID=orf00004
ctgB    Glimmer3    CDS 3873    4085    .   -   0   ID=cds_orf00004;Parent=orf00004

Actual Output:

ctgA
orf00001 ['cds_orf00001']
orf00002 ['cds_orf00002', 'cds_orf00002']
orf00003 ['cds_orf00003', 'cds_orf00003']
orf00004 ['cds_orf00004', 'cds_orf00004']
ctgB
orf00001_2 ['cds_orf00001']
orf00002 []
orf00003 []
orf00004 []

Expected Output:

ctgA
orf00001 ['cds_orf00001']
orf00002 ['cds_orf00002']
orf00003 ['cds_orf00003']
orf00004 ['cds_orf00004']
ctgB
orf00001 ['cds_orf00001']
orf00002 ['cds_orf00002']
orf00003 ['cds_orf00003']
orf00004 ['cds_orf00004']

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions