Dropping columns that contain only one sequence can fail at the N-terminus. #331

@danielhhaft

I aligned 38 sequences with an average length of 72, including one outlier sequence of length 505 whose homology to the others is at its C-terminus.

The same thing happens whether the alignment comes from muscle or belvu. The alignment is long, but I expect the model to be short, because I expect gappy N-terminal columns to disappear the way gappy internal columns disappear.

But hmmbuild (two versions, most recent 3.4 from Aug 2023) did this:

input alignment file:  foo.afa
output HMM file:       foo.HMM
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
idx   name                  nseq  alen  mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1     foo                     38   538   508     1.38  0.591

See how the model length is 508 for an alignment length of 538? Only 30 columns were dropped.

HMMER_BUILD_ERROR.msf.txt

I attached an example of a file (multifasta format) that results in the incorrect HMM length.

I've been building HMMs since June 1, 1998. I think I always trim the N-terminus and C-terminus until column density exceeds 50%, so I don't think any model of mine has been corrupted.

But I have concerns for anyone who builds an alignment without trimming. A single long N-terminal extension can end up contributing 80 percent of the total score.
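The terminal trimming described above (cut the N-terminus and C-terminus back until column density exceeds 50%) can be sketched roughly as follows. This is a minimal illustration, not HMMER code: the names `column_density` and `trim_termini` are hypothetical, gaps are assumed to be written as `-` or `.`, sequences are unweighted, and the toy alignment is made up for the example.

```python
GAP_CHARS = frozenset("-.")  # assumption: these two characters mark gaps


def column_density(seqs):
    """Return the fraction of non-gap characters in each alignment column."""
    if not seqs:
        return []
    alen = len(seqs[0])
    if any(len(s) != alen for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    nseq = len(seqs)
    return [sum(s[i] not in GAP_CHARS for s in seqs) / nseq for i in range(alen)]


def trim_termini(seqs, min_density=0.5):
    """Drop leading and trailing columns until density exceeds min_density.

    Internal gappy columns are left alone; only the two ends are trimmed.
    """
    dens = column_density(seqs)
    keep = [i for i, d in enumerate(dens) if d > min_density]
    if not keep:
        return ["" for _ in seqs]
    start, end = keep[0], keep[-1] + 1
    return [s[start:end] for s in seqs]


# Toy alignment: one sequence carries a 7-residue N-terminal extension.
aln = [
    "MKTLLVAGGACDE",
    "-------GGAC-E",
    "-------GGACDE",
    "-------GAACD-",
]
for row in trim_termini(aln):
    print(row)
```

On the toy input, the seven N-terminal columns have a density of 0.25 and are removed, while the two gappy columns near the C-terminus (density 0.75) are kept.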
