Parallelize fixFasta #39

The initial fixFasta step of Pufferfish indexing is single-threaded, so it takes a long time when the reference contains many sequences. From the outside, it looks like this step could be parallelized: split the input reference FASTA into parts, run fixFasta on each part, and concatenate the fixed parts into one file. The split could be done with, for example, the fast SeqKit toolkit and its [split2](https://bioinf.shenwei.me/seqkit/usage/#split2) command, which reads a gzipped or plain reference FASTA and can write gzipped or plain split files (gzipped output saves disk space, for example).
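To make the proposal concrete, here is a minimal Python sketch of the split, process, concatenate pattern. It is not Pufferfish code: `read_fasta`, `split_records`, `fix_part`, and `parallel_fix` are hypothetical names, the in-memory split stands in for SeqKit `split2`, and `fix_part` is a placeholder (it only uppercases bases) for whatever fixFasta does to a single part. It also assumes each record can be fixed independently of the others; any step that compares records across the whole reference would need an extra merge pass.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA text, joining wrapped lines."""
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_records(records: List[Record], parts: int) -> List[List[Record]]:
    """Cut the record list into at most `parts` contiguous blocks."""
    size = max(1, -(-len(records) // parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def fix_part(part: List[Record]) -> List[Record]:
    """Hypothetical placeholder for the per-part fix: uppercase the bases."""
    return [(header, seq.upper()) for header, seq in part]


def parallel_fix(text: str, parts: int = 4, workers: int = 4,
                 fix: Callable[[List[Record]], List[Record]] = fix_part) -> str:
    """Split the input, fix each part concurrently, concatenate in input order."""
    splits = split_records(list(read_fasta(text)), parts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        fixed = list(pool.map(fix, splits))  # map() returns results in input order
    return "".join(f">{h}\n{s}\n" for part in fixed for h, s in part)


if __name__ == "__main__":
    print(parallel_fix(">a\nacgt\nac\n>b\nggnn\n>c\ntt\n", parts=2, workers=2), end="")
```

Threads are used here only because, in a real pipeline, each worker would presumably launch an external process per split file; pure-Python work like the placeholder would need a process pool to gain any speed. Splitting into contiguous blocks and relying on the ordered results of `map()` keeps the concatenated output in the same record order as the input.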
