easy-linclust fails with "Ungapped alignment step died" #866

jackhu3301 · 2024-07-24T09:55:30Z

Expected Behavior

I want to use easy-linclust to cluster protein sequences.

Current Behavior

mmseqs easy-linclust all_seq.fasta clusterRes tmp --cov-mode 0 --min-seq-id 0.4

MMseqs Output (for bugs)

Create directory tmp
easy-linclust all_seq.fasta clusterRes tmp --cov-mode 1 --min-seq-id 0.4

MMseqs Version: a146887
Cluster mode 0
Max connected component depth 1000
Similarity type 2
Threads 64
Compressed 0
Verbosity 3
Weight file name
Cluster Weight threshold 0.9
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 0
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.4
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0.8
Coverage mode 1
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Alphabet size aa:21,nucl:5
k-mers per sequence 21
Spaced k-mers 0
Spaced k-mer pattern
Scale k-mers per sequence aa:0.000,nucl:0.200
Adjust k-mer length false
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
k-mer length 0
Shift hash 67
Split memory limit 0
Include only extendable false
Skip repeating k-mers false
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Remove temporary files true
Force restart with latest tmp false
MPI runner
Database type 0
Shuffle input database true
Createdb mode 1
Write lookup file 0
Offset of numeric ids 0

createdb all_seq.fasta tmp/8115150149931881526/input --dbtype 0 --shuffle 1 --createdb-mode 1 --write-lookup 0 --id-offset 0 --compressed 0 -v 3

Shuffle database cannot be combined with --createdb-mode 0
We recompute with --shuffle 0
Converting sequences
[Multiline fasta can not be combined with --createdb-mode 0
We recompute with --createdb-mode 1
Time for merging to input_h: 0h 0m 0s 3ms
Time for merging to input: 0h 0m 0s 3ms
[=======
Time for merging to input_h: 0h 0m 0s 2ms
Time for merging to input: 0h 0m 0s 2ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 102ms
Create directory tmp/8115150149931881526/clu_tmp
linclust tmp/8115150149931881526/input tmp/8115150149931881526/clu tmp/8115150149931881526/clu_tmp -e 0.001 --min-seq-id 0.4 -c 0.8 --cov-mode 1 --spaced-kmer-mode 0 --remove-tmp-files 1

Set cluster mode GREEDY MEM.
kmermatcher tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:13,nucl:5 --min-seq-id 0.4 --kmer-per-seq 21 --spaced-kmer-mode 0 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 0 --cov-mode 1 -k 0 -c 0.8 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 64 --compressed 0 -v 3 --cluster-weight-threshold 0.9

kmermatcher tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:13,nucl:5 --min-seq-id 0.4 --kmer-per-seq 21 --spaced-kmer-mode 0 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 0 --cov-mode 1 -k 0 -c 0.8 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 64 --compressed 0 -v 3 --cluster-weight-threshold 0.9

Database size: 77298 type: Aminoacid
Reduced amino acid alphabet: (A S T) (C) (D B N) (E Q Z) (F Y) (G) (H) (I V) (K R) (L J M) (P) (W) (X)

Generate k-mers list for 1 split
[=================================================================] 77.30K 0s 41ms
Sort kmer 0h 0m 0s 46ms
Sort by rep. sequence 0h 0m 0s 22ms
Time for fill: 0h 0m 0s 11ms
Time for merging to pref: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 225ms
rescorediagonal tmp/8115150149931881526/input tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/pref tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_rescore1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --rescore-mode 0 --wrapped-scoring 0 --filter-hits 0 -e 0.001 -c 0.8 -a 0 --cov-mode 1 --min-seq-id 0.5 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 64 --compressed 0 -v 3

[=================================================================] 77.30K 0s 71ms
Time for merging to pref_rescore1: 0h 0m 0s 102ms
Time for processing: 0h 0m 0s 429ms
clust tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_rescore1 tmp/8115150149931881526/clu_tmp/13790714163985984779/pre_clust --cluster-mode 3 --max-iterations 1000 --similarity-type 2 --threads 64 --compressed 0 -v 3 --cluster-weight-threshold 0.9

Clustering mode: Greedy Low Mem
Total time: 0h 0m 0s 91ms

Size of the sequence database: 77298
Size of the alignment database: 77298
Number of clusters: 31445

Writing results 0h 0m 0s 3ms
Time for merging to pre_clust: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 188ms
createsubdb tmp/8115150149931881526/clu_tmp/13790714163985984779/order_redundancy tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/input_step_redundancy -v 3 --subdb-mode 1

Time for merging to input_step_redundancy: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 22ms
createsubdb tmp/8115150149931881526/clu_tmp/13790714163985984779/order_redundancy tmp/8115150149931881526/clu_tmp/13790714163985984779/pref tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_filter1 -v 3 --subdb-mode 1

Time for merging to pref_filter1: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 23ms
filterdb tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_filter1 tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_filter2 --filter-file tmp/8115150149931881526/clu_tmp/13790714163985984779/order_redundancy --threads 64 --compressed 0 -v 3

Filtering using file(s)
[=================================================================] 31.44K 0s 20ms
Time for merging to pref_filter2: 0h 0m 0s 88ms
Time for processing: 0h 0m 0s 360ms
rescorediagonal tmp/8115150149931881526/clu_tmp/13790714163985984779/input_step_redundancy tmp/8115150149931881526/clu_tmp/13790714163985984779/input_step_redundancy tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_filter2 tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_rescore2 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --rescore-mode 1 --wrapped-scoring 0 --filter-hits 1 -e 0.001 -c 0.8 -a 0 --cov-mode 1 --min-seq-id 0.4 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 64 --compressed 0 -v 3

[=========Error: Ungapped alignment step died
Error: Search died
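
The log does not say why the second rescorediagonal step died, so one thing worth ruling out is a problem in the input FASTA itself. Below is a minimal sketch of a sanity check. It is not part of MMseqs2 and not a diagnosis of this crash. The function names `read_fasta` and `check_fasta` are hypothetical, the length limit is taken from the "Max sequence length 65535" line in the log, and the set of expected residue letters is an assumption (MMseqs2 may accept more or fewer characters).

```python
import sys
from collections import Counter

# Assumption: mirrors the "Max sequence length 65535" value printed in the log.
MAX_SEQ_LEN = 65535
# Assumption: letters treated as ordinary residue codes for this check only.
EXPECTED = set("ACDEFGHIKLMNPQRSTVWYBJZXUO*")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # multi-line sequences are joined
    if header is not None:
        yield header, "".join(chunks)


def check_fasta(lines):
    """Summarise records that might deserve a closer look."""
    report = {"records": 0, "empty": [], "too_long": [], "odd_chars": Counter()}
    for header, seq in read_fasta(lines):
        report["records"] += 1
        if not seq:
            report["empty"].append(header)
        if len(seq) > MAX_SEQ_LEN:
            report["too_long"].append(header)
        for ch in seq.upper():
            if ch not in EXPECTED:
                report["odd_chars"][ch] += 1
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_fasta.py all_seq.fasta
    with open(sys.argv[1]) as handle:
        print(check_fasta(handle))
```

If the report lists empty records, over-long sequences, or unexpected characters, those entries would be reasonable candidates to remove before rerunning easy-linclust. A clean report only means the input passes these particular checks.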

Your Environment

Include as many relevant details about the environment you experienced the bug in.

Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): a146887
Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): Source install from github
For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation: GNU Make 4.1
Server specifications (especially CPU support for AVX2/SSE and amount of system memory): SSE4
Operating system and version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
