I aligned 38 sequences with an average length of 72, including one outlier sequence of length 505 that shares homology only at its C-terminus.
The same thing happens with muscle or belvu. The alignment is long, but I expect the alignment to be short, because I expect gappy N-terminal columns to disappear the way gappy internal columns disappear.
But hmmbuild (two versions tried, the most recent being 3.4 from Aug 2023) reported this:
input alignment file: foo.afa
output HMM file: foo.HMM
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
idx   name                  nseq  alen  mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1     foo                     38   538   508     1.38  0.591
See how the model length (mlen) is 508 against an alignment length (alen) of 538? Only 30 columns were dropped.
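For comparison, here is a rough way to count how many alignment columns have residues in at least half of the sequences. This is only a minimal sketch: the function names (`read_aligned_fasta`, `column_occupancy`, `count_dense_columns`) and the toy alignment are hypothetical, it assumes aligned FASTA with `-` or `.` as gap characters, and it uses plain unweighted counts. hmmbuild's own column assignment may differ (for example through sequence weighting), so this is a sanity check and not a reimplementation of its rule.

```python
GAP_CHARS = set("-.")  # assumption: these two characters mark gaps


def read_aligned_fasta(text):
    """Parse aligned FASTA text into a list of aligned sequence strings."""
    seqs, chunks, in_record = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if in_record:
                seqs.append("".join(chunks))
            chunks, in_record = [], True
        elif in_record:
            chunks.append(line)
    if in_record:
        seqs.append("".join(chunks))
    return seqs


def column_occupancy(seqs):
    """Return, for each column, the unweighted fraction of sequences with a residue."""
    ncol = len(seqs[0])
    if any(len(s) != ncol for s in seqs):
        raise ValueError("sequences are not all the same aligned length")
    return [sum(s[i] not in GAP_CHARS for s in seqs) / len(seqs) for i in range(ncol)]


def count_dense_columns(seqs, threshold=0.5):
    """Count columns whose residue fraction is at least `threshold`."""
    return sum(frac >= threshold for frac in column_occupancy(seqs))


# Toy alignment: one long sequence with an N-terminal extension, two short ones.
toy = """>long
MKTAYIAKQRACDEF
>s1
---------RACDEF
>s2
---------RACD-F
"""
toy_seqs = read_aligned_fasta(toy)
dense = count_dense_columns(toy_seqs)
print(f"{dense} of {len(toy_seqs[0])} columns have >= 50% residues")
```

In the toy alignment only the six C-terminal columns pass the threshold, which is the kind of number I expected mlen to be close to for foo.afa.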
I attached an example of a file (multifasta format) that results in the incorrect HMM length.
I've been building HMMs since June 1, 1998. I think I always trim the N-terminus and C-terminus until column density exceeds 50%, so I don't think any model of mine has been corrupted.
But I have concerns for anyone who builds an alignment without trimming. A single long N-terminal extension can end up contributing 80 percent of the total score.
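The trimming I describe above can be sketched as follows. This is a minimal illustration and not a tool I am distributing: `trim_termini` is a hypothetical name, gaps are assumed to be `-` or `.`, and density is the unweighted fraction of sequences with a residue in a column. Only the two ends are trimmed; gappy internal columns are left for hmmbuild to handle.

```python
def trim_termini(seqs, min_density=0.5, gaps="-."):
    """Drop leading and trailing columns until column density exceeds min_density.

    seqs: list of equal-length aligned sequence strings.
    Returns the trimmed aligned sequences in the same order.
    """
    ncol = len(seqs[0])
    if any(len(s) != ncol for s in seqs):
        raise ValueError("sequences are not all the same aligned length")

    # Fraction of sequences with a residue (non-gap) in each column.
    density = [sum(s[i] not in gaps for s in seqs) / len(seqs) for i in range(ncol)]

    # Columns dense enough to anchor the two ends of the trimmed alignment.
    dense = [i for i, d in enumerate(density) if d > min_density]
    if not dense:
        return ["" for _ in seqs]  # nothing passes the threshold

    start, end = dense[0], dense[-1] + 1
    return [s[start:end] for s in seqs]


# Toy alignment with a long N-terminal extension on the first sequence.
aligned = [
    "MKTAYIAKQRACDEF",
    "---------RACDEF",
    "---------RACD-F",
]
trimmed = trim_termini(aligned)
for row in trimmed:
    print(row)
```

Writing the trimmed rows back out as multifasta and running hmmbuild on that file is what I do by hand before building a model.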