Commit df6bd9e

Describe taxonomy/pairing DB change
1 parent a7a6e6e commit df6bd9e

1 file changed: +8 -74 lines changed

README.md

Lines changed: 8 additions & 74 deletions
@@ -4,6 +4,10 @@ For details of what was changed in v1.5, see [change log](https://github.com/sok
 
 <p align="center"><img src="https://github.com/sokrypton/ColabFold/raw/main/.github/ColabFold_Marv_Logo.png" height="250"/></p>
 
+```diff
++ 04Aug2025: We changed the taxonomy/pairing files for the UniRef100 database. This might affect multimer predictions. Check [the wiki entry](https://github.com/sokrypton/ColabFold/wiki/MSA-Server-Database-History) for details.
+```
+
 ### Making Protein folding accessible to all via Google Colab!
 
 | Notebooks | monomers | complexes | mmseqs2 | jackhmmer | templates |
@@ -30,10 +34,9 @@ Check the wiki page [old retired notebooks](https://github.com/sokrypton/ColabFo
 - Yes, but be **CAREFUL**, the bfactor column is populated with pLDDT confidence values (higher = better). Phenix.phaser expects a "real" bfactor, where (lower = better). See [post](https://twitter.com/cheshireminima/status/1423929241675120643) from Claudia Millán.
 - What is the maximum length?
 - Limits depends on free GPU provided by Google-Colab `fingers-crossed`
-- For GPU: `Tesla T4` or `Tesla P100` with ~16G the max length is ~2000
-- For GPU: `Tesla K80` with ~12G the max length is ~1000
+- For GPU: `Tesla T4` with ~16G the max length is ~2000
 - To check what GPU you got, open a new code cell and type `!nvidia-smi`
-- Is it okay to use the MMseqs2 MSA server (`cf.run_mmseqs2`) on a local computer?
+- Is it okay to use the MMseqs2 MSA server on a local computer?
 - You can access the server from a local computer if you queries are serial from a single IP. Please do not use multiple computers to query the server.
 - Where can I download the databases used by ColabFold?
 - The databases are available at [colabfold.mmseqs.com](https://colabfold.mmseqs.com)
@@ -80,7 +83,7 @@ colabfold_batch input_sequences.fasta out_dir
 
 First create a directory for the databases on a disk with sufficient storage (940GB (!)). Depending on where you are, this will take a couple of hours:
 
-Note: [MMseqs2 `71dd32ec43e3ac4dabf111bbc4b124f1c66a85f1` (May 28, 2023)](https://github.com/soedinglab/MMseqs2/archive/71dd32ec43e3ac4dabf111bbc4b124f1c66a85f1.zip) is used to create the databases and perform sequece search in the ColabFold MSA server. Please use this version if you want to obtain the same MSAs as the server.
+Note: [MMseqs2 Release 18](https://github.com/soedinglab/MMseqs2/releases/tag/18-8cc5c) is used to create the databases and perform sequece search in the ColabFold MSA server. Please use this version if you want to obtain the same MSAs as the server.
 
 ```shell
 MMSEQS_NO_INDEX=1 ./setup_databases.sh /path/to/db_folder
@@ -192,7 +195,7 @@ Important: Ensure that the `CUDA_VISIBLE_DEVICES` environment variable is set co
 
 Run searches using the GPU server:
 ```
-colabfold_search /path/to/bin/mmseqs input_sequences.fasta /path/to/db_folder msas --gpu 1 --gpu-server 1
+colabfold_search --mmseqs /path/to/bin/mmseqs input_sequences.fasta /path/to/db_folder msas --gpu 1 --gpu-server 1
 ```
 To stop the server(s) when done:
 ```
@@ -234,72 +237,3 @@ For more details, see [GPU-accelerated search](https://github.com/soedinglab/MMs
 Science (2021) doi: [10.1126/science.abj8754](https://doi.org/10.1126/science.abj8754)
 
 [![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.5123296.svg)](https://doi.org/10.5281/zenodo.5123296)
-
------------------
-**OLD Updates**
-```diff
-31Jul2023: 2023/07/31: The ColabFold MSA server is back to normal
-It was using older DB (UniRef30 2202/PDB70 220313) from 27th ~8:30 AM CEST to 31st ~11:10 AM CEST.
-27Jul2023: ColabFold MSA server issue:
-We are using the backup server with old databases
-(UniRef30 2202/PDB70 220313) starting from ~8:30 AM CEST until we resolve the issue.
-Resolved on 31Jul2023 ~11:10 CEST.
-12Jun2023: New databases! UniRef30 updated to 2302 and PDB to 230517.
-We now use PDB100 instead of PDB70 (see notes in the [main](https://colabfold.com) notebook).
-12Jun2023: We introduced a new default pairing strategy:
-Previously, for multimer predictions with more than 2 chains,
-we only pair if all sequences taxonomically match ("complete" pairing).
-The new default "greedy" strategy pairs any taxonomically matching subsets.
-30Apr2023: Amber is working again in our ColabFold Notebook
-29Apr2023: Amber is not working in our Notebook due to Colab update
-18Feb2023: v1.5.2 - fixing: fixing memory leak for large proteins
-- fixing: --use_dropout (random seed was not changing between recycles)
-06Feb2023: v1.5.1 - fixing: --save-all/--save-recycles
-04Feb2023: v1.5.0 - ColabFold updated to use AlphaFold v2.3.1!
-03Jan2023: The MSA server's faulty hardware from 12/26 was replaced.
-There were intermittent failures on 12/26 and 1/3. Currently,
-there are no known issues. Let us know if you experience any.
-10Oct2022: Bugfix: random_seed was not being used for alphafold-multimer.
-Same structure was returned regardless of defined seed. This
-has been fixed!
-13Jul2022: We have set up a new ColabFold MSA server provided by Korean
-Bioinformation Center. It provides accelerated MSA generation,
-we updated the UniRef30 to 2022_02 and PDB/PDB70 to 220313.
-11Mar2022: We use in default AlphaFold-multimer-v2 weights for complex modeling.
-We also offer the old complex modes "AlphaFold-ptm" or "AlphaFold-multimer-v1"
-04Mar2022: ColabFold now uses a much more powerful server for MSAs and searches through the ColabFoldDB instead of BFD/MGnify.
-Please let us know if you observe any issues.
-26Jan2022: AlphaFold2_mmseqs2, AlphaFold2_batch and colabfold_batch's multimer complexes predictions are
-now in default reranked by iptmscore*0.8+ptmscore*0.2 instead of ptmscore
-16Aug2021: WARNING - MMseqs2 API is undergoing upgrade, you may see error messages.
-17Aug2021: If you see any errors, please report them.
-17Aug2021: We are still debugging the MSA generation procedure...
-20Aug2021: WARNING - MMseqs2 API is undergoing upgrade, you may see error messages.
-To avoid Google Colab from crashing, for large MSA we did -diff 1000 to get
-1K most diverse sequences. This caused some large MSA to degrade in quality,
-as sequences close to query were being merged to single representive.
-We are working on updating the server (today) to fix this, by making sure
-that both diverse and sequences close to query are included in the final MSA.
-We'll post update here when update is complete.
-21Aug2021 The MSA issues should now be resolved! Please report any errors you see.
-In short, to reduce MSA size we filter (qsc > 0.8, id > 0.95) and take 3K
-most diverse sequences at different qid (sequence identity to query) intervals
-and merge them. More specifically 3K sequences at qid at (0→0.2),(0.2→0.4),
-(0.4→0.6),(0.6→0.8) and (0.8→1). If you submitted your sequence between
-16Aug2021 and 20Aug2021, we recommend submitting again for best results!
-21Aug2021 The use_templates option in AlphaFold2_mmseqs2 is not properly working. We are
-working on fixing this. If you are not using templates, this does not affect the
-the results. Other notebooks that do not use_templates are unaffected.
-21Aug2021 The templates issue is resolved!
-11Nov2021 [AlphaFold2_mmseqs2] now uses Alphafold-multimer for complex (homo/hetero-oligomer) modeling.
-Use [AlphaFold2_advanced] notebook for the old complex prediction logic.
-11Nov2021 ColabFold can be installed locally using pip!
-14Nov2021 Template based predictions works again in the Alphafold2_mmseqs2 notebook.
-14Nov2021 WARNING "Single-sequence" mode in AlphaFold2_mmseqs2 and AlphaFold2_batch was broken
-starting 11Nov2021. The MMseqs2 MSA was being used regardless of selection.
-14Nov2021 "Single-sequence" mode is now fixed.
-20Nov2021 WARNING "AMBER" mode in AlphaFold2_mmseqs2 and AlphaFold2_batch was broken
-starting 11Nov2021. Unrelaxed proteins were returned instead.
-20Nov2021 "AMBER" is fixed thanks to Kevin Pan
-```
------------------
