Implement HMMER's handling of X's (for protein) and N's (for DNA) #117

@ihh

HMMER weights IUPAC degenerate emissions using the reciprocal of the perplexity of the underlying match state (see the esl_abc_FExpectScore function in the HMMER3 source).

This has the effect that the "score" for those emissions is the expectation of what you'd get if you randomized X's using the underlying emission distribution - much to the chagrin of Roger Sewell, who argued they should be treated as missing data (Sean's counterargument is that treating them as missing data would reward aligning them to the model) - this is an old argument.
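To make the two positions concrete, here is a minimal sketch of both scoring rules for a single match column. It is not a port of esl_abc_FExpectScore: the function names, the toy DNA column, and the choice of weighting distributions are all assumptions for illustration, and which distribution HMMER actually passes as the weights should be checked against the source.

```python
import math

def expected_score(compatible, scores, weights):
    """Weighted mean of per-residue scores over the residues that a
    degenerate symbol is compatible with (hypothetical helper)."""
    num = sum(scores[i] * weights[i] for i in compatible)
    den = sum(weights[i] for i in compatible)
    return num / den

def marginal_score(compatible, emit, background):
    """Missing-data style log-odds: sum probabilities over the compatible
    residues in both the model and the null (hypothetical helper)."""
    model = sum(emit[i] for i in compatible)
    null = sum(background[i] for i in compatible)
    return math.log(model / null)

# Toy DNA column, residue order A, C, G, T (made-up numbers).
background = [0.25, 0.25, 0.25, 0.25]
emit = [0.7, 0.1, 0.1, 0.1]
scores = [math.log(e / b) for e, b in zip(emit, background)]

any_base = [0, 1, 2, 3]  # N: compatible with every residue

# Expected score, weighting by the background distribution.
n_expected_bg = expected_score(any_base, scores, background)
# Expected score, weighting by the match emission distribution.
n_expected_emit = expected_score(any_base, scores, emit)
# Missing-data treatment: N carries no information, so log-odds is ~0.
n_marginal = marginal_score(any_base, emit, background)

print(n_expected_bg, n_expected_emit, n_marginal)
```

For a fully degenerate symbol the background-weighted expectation is minus a KL divergence (so never positive), the emission-weighted expectation is a KL divergence (so never negative), and the missing-data score is zero. The choice of weights therefore decides whether aligning an X or N to a match state is penalized, rewarded, or neutral, which is the crux of the disagreement.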

Practically (as noted by @jordisr) this affects <1% of sequences, but for full HMMER compatibility we ought to include it.
