rafm: Reliable AlphaFold Measures

AlphaFold model and two crystal structures of calmodulin

rafm computes per-model measures, such as the expected global LDDT, that are associated with atomic-level accuracy in AlphaFold models, using the models' pLDDT confidence scores.
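AlphaFold writes its per-residue pLDDT confidence into the B-factor column of the PDB files it produces, so the raw input for the measures below can be recovered directly from a model file. The following is a minimal sketch, not rafm's own parser: the function names `plddts_from_pdb_lines` and `plddts_from_pdb` are hypothetical, and the code assumes exactly one CA atom per residue (no alternate locations), which is the case for AlphaFold models.

```python
"""Sketch: read per-residue pLDDT from an AlphaFold model in PDB format.

Assumes the AlphaFold convention of storing pLDDT in the B-factor column.
Function names are hypothetical, not part of rafm.
"""
from pathlib import Path
from typing import Iterable, List


def plddts_from_pdb_lines(lines: Iterable[str]) -> List[float]:
    """Return one pLDDT per residue, taken from each CA atom's B-factor."""
    values: List[float] = []
    for line in lines:
        # Fixed-column PDB format: record name in columns 1-6,
        # atom name in columns 13-16, B-factor in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def plddts_from_pdb(path: str) -> List[float]:
    """Read a PDB file from disk and return its per-residue pLDDTs."""
    with Path(path).open() as handle:
        return plddts_from_pdb_lines(handle)
```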

Installation

You can install rafm via pip from PyPI:

$ pip install rafm

Usage

rafm --help lists all commands. Current commands are:

plddt-stats
Calculate stats on bounded pLDDTs from a list of AlphaFold model files in PDB format.

Options:
- --criterion FLOAT
  
  The cutoff value on truncated pLDDT for possible utility. [default: 91.2]
- --min-length INTEGER
  
  The minimum sequence length for which to calculate truncated stats. [default: 20]
- --min-count INTEGER
  
  The minimum number of truncated pLDDT values for which to calculate stats. [default: 20]
- --lower-bound INTEGER
  
  The pLDDT value below which stats will not be calculated. [default: 80]
- --upper-bound INTEGER
  
  The pLDDT value above which stats will not be calculated. [default: 100]
- --file-stem TEXT
  
  Output file name stem. [default: rafm]
Output columns (where NN is the bounds specifier, default: 80):
- residues_in_pLDDT
  
  The number of residues in the AlphaFold model.
- pLDDT_mean
  
  The mean value of pLDDT over all residues.
- pLDDT_median
  
  The median value of pLDDT over all residues.
- pLDDTNN_count
  
  The number of residues within bounds.
- pLDDTNN_frac
  
  The fraction of pLDDT values within bounds, if the count is greater than the minimum.
- pLDDTNN_mean
  
  The mean of pLDDT values within bounds, if the count is greater than the minimum.
- pLDDTNN_median
  
  The median of pLDDT values within bounds, if the count is greater than the minimum.
- LDDT_expect
  
  The expectation value of global LDDT over the residues with pLDDT within bounds. Only produced if default bounds are used.
- passing
  
  True if the model passed the criterion, False otherwise. Only produced if default bounds are used.
- file
  
  The path to the model file.
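The per-model columns above can be sketched in plain Python. This is an illustration under stated assumptions, not rafm's implementation: `bounded_plddt_stats` is a hypothetical name, the bounds are treated as inclusive, NN is taken to be the lower bound, truncated stats are computed only when both the model length and the in-bounds count reach their minimums (the column descriptions say "greater than", so the exact comparison is also an assumption), and `passing` is assumed to compare the truncated median against the criterion, since the README does not say which truncated statistic the criterion applies to. LDDT_expect is omitted because its formula is not given here.

```python
from statistics import mean, median
from typing import Any, Dict, List

DEFAULT_BOUNDS = (80, 100)  # default --lower-bound and --upper-bound


def bounded_plddt_stats(
    plddts: List[float],
    criterion: float = 91.2,
    min_length: int = 20,
    min_count: int = 20,
    lower_bound: int = 80,
    upper_bound: int = 100,
) -> Dict[str, Any]:
    """Per-model stats on all pLDDTs and on those within [lower_bound, upper_bound]."""
    if not plddts:
        raise ValueError("model has no pLDDT values")
    nn = str(lower_bound)  # ASSUMPTION: "NN" in column names is the lower bound
    stats: Dict[str, Any] = {
        "residues_in_pLDDT": len(plddts),
        "pLDDT_mean": mean(plddts),
        "pLDDT_median": median(plddts),
    }
    in_bounds = [p for p in plddts if lower_bound <= p <= upper_bound]
    stats[f"pLDDT{nn}_count"] = len(in_bounds)
    # Truncated stats only for long-enough models with enough in-bounds values.
    enough = len(plddts) >= min_length and len(in_bounds) >= min_count
    stats[f"pLDDT{nn}_frac"] = len(in_bounds) / len(plddts) if enough else None
    stats[f"pLDDT{nn}_mean"] = mean(in_bounds) if enough else None
    stats[f"pLDDT{nn}_median"] = median(in_bounds) if enough else None
    if (lower_bound, upper_bound) == DEFAULT_BOUNDS:
        # ASSUMPTION: the criterion is applied to the truncated median.
        stats["passing"] = enough and stats[f"pLDDT{nn}_median"] >= criterion
    return stats
```

With default settings, a 40-residue model whose pLDDTs are thirty values of 95 and ten of 50 gets `pLDDT80_count` of 30, `pLDDT80_frac` of 0.75, and passes because its truncated median (95) is at least 91.2.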
plddt-select-residues
Write a tab-separated file of residues from passing models, using an input file of values selected by plddt-stats. Input options are the same as for plddt-stats.

Output columns:
- file
  
  Path to the model file.
- residue
  
  Residue number, starting from 0 and numbered sequentially. Note that all residues will be written, regardless of bounds set.
- pLDDT
  
  pLDDT value for that residue.
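The output layout above can be sketched with the standard `csv` module. This is a minimal sketch, assuming the pass/fail flags have already been read from the plddt-stats output; the layout of that input file is not described here, so a plain dict stands in for it. `write_selected_residues` and `selected_residues_tsv` are hypothetical names, not rafm functions.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_selected_residues(
    models: Dict[str, List[float]],
    passing: Dict[str, bool],
    out: TextIO,
) -> None:
    """Write file/residue/pLDDT rows for every residue of each passing model."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["file", "residue", "pLDDT"])
    for path, plddts in models.items():
        if not passing.get(path, False):
            continue  # only models flagged as passing are written
        # Residues are numbered sequentially from 0; bounds are not applied here.
        for residue, value in enumerate(plddts):
            writer.writerow([path, residue, value])


def selected_residues_tsv(models: Dict[str, List[float]], passing: Dict[str, bool]) -> str:
    """Convenience wrapper that returns the TSV text instead of writing a file."""
    buffer = io.StringIO()
    write_selected_residues(models, passing, buffer)
    return buffer.getvalue()
```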
plddt-plot-dists
Plot the distributions of bounded pLDDT values and of per-residue pLDDT values in models that pass the selection criteria.
Input options:
- out-file-type
  
  Plot file extension of a type that matplotlib understands (e.g., 'jpg', 'pdf'). [default: png]
- residue-criterion
  
  Per-residue cutoff on usability (for plot only).
Outputs:

When applied to a set of "dark" genomes with no previous PDB entries, the distributions of median pLDDT scores with a lower bound of 80 and of per-residue pLDDT scores with a minimum of 80 look like this:
stats

Produce a set of summary stats on the results of runs. See also the global stats file rafm_stats.json.

Statistical Basis

The default parameters were chosen to select for LDDT values greater than 80 on a set of crystal structures determined since AlphaFold was trained. The distributions of LDDT scores for the passing and non-passing sets, along with an (overlapping) set of PDB files at 100% sequence identity over at least 80% of the sequence, look like this:

Distribution of high-scoring, low-scoring, and high-similarity structures

The markers on the x-axis show the size of the conformational changes observed between pairs of crystal structures of various proteins:

CALM

Between calcium-bound and calcium-free calmodulin (depicted in the logo image above).
ERK2

Between unphosphorylated and doubly-phosphorylated ERK2 kinase.
HB

Between R- and T-state hemoglobin.
MB

Between carbonmonoxy- and deoxy-myoglobin.

We selected LDDT >= 80 as the minimum value likely to prove useful for virtual screening. The per-residue threshold of pLDDT >= 80 was likewise chosen as the minimum likely to give the correct side-chain rotamers for a surface defined by contacts between two residues. Choosing 91.2 as the criterion leads to the following confusion matrix versus a set of post-training crystal structures:

Confusion matrix of AlphaFold models vs. crystal structures

With a correlation coefficient of 0.71, this correlation isn't great, but it is enough to demonstrate usable sensitivity. After we fix a few problems with the alignments it may go a bit higher, though our feeling is that it will probably not exceed about 0.8. The support will get better, but the criterion on this metric seems unlikely to change.
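The confusion matrix compares a model-side call (truncated pLDDT score at or above the 91.2 criterion) with a structure-side call (LDDT against the crystal structure at or above 80). A minimal sketch of that tabulation follows. The README does not say which correlation coefficient is quoted, so the Matthews correlation coefficient, a common single-number summary of a 2x2 confusion matrix, is used here as an assumption; the function names and the input layout of (score, LDDT) pairs are likewise hypothetical.

```python
from math import sqrt
from typing import Dict, Iterable, Tuple


def confusion_counts(
    pairs: Iterable[Tuple[float, float]],
    criterion: float = 91.2,
    lddt_cutoff: float = 80.0,
) -> Dict[str, int]:
    """Tabulate (model score, crystal LDDT) pairs into a 2x2 confusion matrix."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for score, lddt in pairs:
        predicted = score >= criterion  # model passes the pLDDT-based criterion
        actual = lddt >= lddt_cutoff    # model agrees well with the crystal structure
        if predicted and actual:
            counts["TP"] += 1
        elif predicted:
            counts["FP"] += 1
        elif actual:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def matthews_corrcoef(counts: Dict[str, int]) -> float:
    """Matthews correlation coefficient; returns 0.0 when any margin is empty."""
    tp, fp, fn, tn = counts["TP"], counts["FP"], counts["FN"], counts["TN"]
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For six made-up pairs giving two true positives, one false positive, one false negative, and two true negatives, the coefficient is (2·2 − 1·1)/√(3·3·3·3) = 3/9 ≈ 0.33.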

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, rafm is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from the UNM Translational Informatics Python Cookiecutter template.

rafm was written by Joel Berendzen and Jessica Binder.
