Passing large numbers of files to createsetdb #5

SDmetagenomics · 2024-10-30T03:06:35Z

I would like to run spacedust on a plasmid database. This database has ~60k individual files, each representing a separate plasmid "genome". However, when I run the following command:

$ spacedust createsetdb /individual_faa/*.faa SpacedustDB tmp --threads 18

bash: /shared/software/bin/spacedust: Argument list too long

I receive a bash error that the argument list is too long. I have tried a number of workarounds, such as passing an environment variable that contains all the file names, but to no avail.

It would be useful if, instead of a file glob (*), spacedust createsetdb could take a single input file listing the paths to each of the .faa files needed for database creation. Alternatively, creating databases in batches and combining them could be another approach, but I am not sure whether that is supported. Finally, if you have any other suggestions I would be forever grateful.
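
For context, "Argument list too long" is raised by the kernel when the expanded glob exceeds the per-exec limit on arguments plus environment (`getconf ARG_MAX` prints it), which is also why stuffing the names into an environment variable does not help. Below is a minimal sketch of how a path list like the one proposed above could be built without expanding the glob on a command line. The function name `write_faa_list` and the output name `faa_paths.txt` are hypothetical, and whether createsetdb accepts such a list file is an assumption that has not been confirmed here.

```shell
# write_faa_list DIR OUT
# Write the path of every .faa file directly inside DIR to OUT, one per line.
# find walks the directory itself, so the shell never has to expand ~60k
# file names into a single argument list.
# (Hypothetical helper; not part of spacedust.)
write_faa_list() {
    find "$1" -maxdepth 1 -type f -name '*.faa' | sort > "$2"
}

# Example usage (paths are placeholders):
#   getconf ARG_MAX                                # show the exec size limit
#   write_faa_list /individual_faa faa_paths.txt   # build the list
#   wc -l < faa_paths.txt                          # check the file count
```

The same list could also be split into chunks (for example with `split -l`) if batch-wise database creation turns out to be supported.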

In terms of the total number of proteins, these plasmid "genomes" should be quite similar to the 9000 genomes you ran in the spacedust paper, since plasmids are much smaller. So I think it should be computationally manageable; the only trouble is getting all the files in :-)

My Environment

Linux
Using the statically compiled spacedust executable for the AVX2 instruction set
