easy-linclust fails with "Ungapped alignment step died" #866

Open
jackhu3301 opened this issue Jul 24, 2024 · 0 comments

Expected Behavior

I want to use easy-linclust to cluster protein sequences.

Current Behavior

mmseqs easy-linclust all_seq.fasta clusterRes tmp --cov-mode 0 --min-seq-id 0.4

MMseqs Output (for bugs)

Create directory tmp
easy-linclust all_seq.fasta clusterRes tmp --cov-mode 1 --min-seq-id 0.4

MMseqs Version: a146887
Cluster mode 0
Max connected component depth 1000
Similarity type 2
Threads 64
Compressed 0
Verbosity 3
Weight file name
Cluster Weight threshold 0.9
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 0
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.4
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0.8
Coverage mode 1
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Alphabet size aa:21,nucl:5
k-mers per sequence 21
Spaced k-mers 0
Spaced k-mer pattern
Scale k-mers per sequence aa:0.000,nucl:0.200
Adjust k-mer length false
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
k-mer length 0
Shift hash 67
Split memory limit 0
Include only extendable false
Skip repeating k-mers false
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Remove temporary files true
Force restart with latest tmp false
MPI runner
Database type 0
Shuffle input database true
Createdb mode 1
Write lookup file 0
Offset of numeric ids 0

createdb all_seq.fasta tmp/8115150149931881526/input --dbtype 0 --shuffle 1 --createdb-mode 1 --write-lookup 0 --id-offset 0 --compressed 0 -v 3

Shuffle database cannot be combined with --createdb-mode 0
We recompute with --shuffle 0
Converting sequences
[Multiline fasta can not be combined with --createdb-mode 0
We recompute with --createdb-mode 1
Time for merging to input_h: 0h 0m 0s 3ms
Time for merging to input: 0h 0m 0s 3ms
[=======
Time for merging to input_h: 0h 0m 0s 2ms
Time for merging to input: 0h 0m 0s 2ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 102ms
Create directory tmp/8115150149931881526/clu_tmp
linclust tmp/8115150149931881526/input tmp/8115150149931881526/clu tmp/8115150149931881526/clu_tmp -e 0.001 --min-seq-id 0.4 -c 0.8 --cov-mode 1 --spaced-kmer-mode 0 --remove-tmp-files 1

Set cluster mode GREEDY MEM.
kmermatcher tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:13,nucl:5 --min-seq-id 0.4 --kmer-per-seq 21 --spaced-kmer-mode 0 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 0 --cov-mode 1 -k 0 -c 0.8 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 64 --compressed 0 -v 3 --cluster-weight-threshold 0.9

kmermatcher tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:13,nucl:5 --min-seq-id 0.4 --kmer-per-seq 21 --spaced-kmer-mode 0 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 0 --cov-mode 1 -k 0 -c 0.8 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 64 --compressed 0 -v 3 --cluster-weight-threshold 0.9

Database size: 77298 type: Aminoacid
Reduced amino acid alphabet: (A S T) (C) (D B N) (E Q Z) (F Y) (G) (H) (I V) (K R) (L J M) (P) (W) (X)

Generate k-mers list for 1 split
[=================================================================] 77.30K 0s 41ms
Sort kmer 0h 0m 0s 46ms
Sort by rep. sequence 0h 0m 0s 22ms
Time for fill: 0h 0m 0s 11ms
Time for merging to pref: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 225ms
rescorediagonal tmp/8115150149931881526/input tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/pref tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_rescore1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --rescore-mode 0 --wrapped-scoring 0 --filter-hits 0 -e 0.001 -c 0.8 -a 0 --cov-mode 1 --min-seq-id 0.5 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 64 --compressed 0 -v 3

[=================================================================] 77.30K 0s 71ms
Time for merging to pref_rescore1: 0h 0m 0s 102ms
Time for processing: 0h 0m 0s 429ms
clust tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_rescore1 tmp/8115150149931881526/clu_tmp/13790714163985984779/pre_clust --cluster-mode 3 --max-iterations 1000 --similarity-type 2 --threads 64 --compressed 0 -v 3 --cluster-weight-threshold 0.9

Clustering mode: Greedy Low Mem
Total time: 0h 0m 0s 91ms

Size of the sequence database: 77298
Size of the alignment database: 77298
Number of clusters: 31445

Writing results 0h 0m 0s 3ms
Time for merging to pre_clust: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 188ms
createsubdb tmp/8115150149931881526/clu_tmp/13790714163985984779/order_redundancy tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/input_step_redundancy -v 3 --subdb-mode 1

Time for merging to input_step_redundancy: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 22ms
createsubdb tmp/8115150149931881526/clu_tmp/13790714163985984779/order_redundancy tmp/8115150149931881526/clu_tmp/13790714163985984779/pref tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_filter1 -v 3 --subdb-mode 1

Time for merging to pref_filter1: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 23ms
filterdb tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_filter1 tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_filter2 --filter-file tmp/8115150149931881526/clu_tmp/13790714163985984779/order_redundancy --threads 64 --compressed 0 -v 3

Filtering using file(s)
[=================================================================] 31.44K 0s 20ms
Time for merging to pref_filter2: 0h 0m 0s 88ms
Time for processing: 0h 0m 0s 360ms
rescorediagonal tmp/8115150149931881526/clu_tmp/13790714163985984779/input_step_redundancy tmp/8115150149931881526/clu_tmp/13790714163985984779/input_step_redundancy tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_filter2 tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_rescore2 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --rescore-mode 1 --wrapped-scoring 0 --filter-hits 1 -e 0.001 -c 0.8 -a 0 --cov-mode 1 --min-seq-id 0.4 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 64 --compressed 0 -v 3

[=========Error: Ungapped alignment step died
Error: Search died
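
The log does not say why the second rescorediagonal step died. One way to rule out the input file as a cause is to scan all_seq.fasta for records that look unusual. The sketch below is only a sanity check under that assumption, not a confirmed diagnosis: the function name `scan_fasta` and the set of accepted residue letters are hypothetical choices for illustration, not part of MMseqs2.

```python
import io
from collections import Counter

# Assumed set of acceptable letters: the 20 standard amino acids plus
# common ambiguity/rare codes (B, J, O, U, X, Z) and the stop symbol '*'.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYBJOUXZ*")


def scan_fasta(handle):
    """Summarise FASTA records that might deserve a closer look.

    Returns a dict with the record count, headers of empty records,
    a Counter of characters outside ALLOWED, and the longest sequence length.
    """
    report = {"records": 0, "empty": [], "odd_chars": Counter(), "longest": 0}
    header, length = None, 0

    def close_record():
        # Finalise the record that was being read, if there was one.
        if header is None:
            return
        report["records"] += 1
        if length == 0:
            report["empty"].append(header)
        report["longest"] = max(report["longest"], length)

    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            close_record()
            header, length = line[1:], 0
        elif header is not None:
            # Sequence lines may span several lines (multiline FASTA).
            seq = line.strip().upper()
            length += len(seq)
            report["odd_chars"].update(c for c in seq if c not in ALLOWED)
    close_record()
    return report
```

It can be run on the input with `with open("all_seq.fasta") as fh: print(scan_fasta(fh))`. Empty records, unexpected characters, or a longest sequence above the "Max sequence length 65535" shown in the log would be worth examining first, although none of these is known to trigger this error.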

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): a146887
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): Source install from github
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation: GNU Make 4.1
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory): SSE4
  • Operating system and version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)