Inconsistent Primer Removal in Different Amplicon Datasets Using Same fastp Command #642

@Rubadevis

I am using fastp to trim amplicon sequencing datasets for two different markers (16SAnimal and CO1) that were generated in the same sequencing run on the same flowcell. The trimming command and settings are identical for both datasets.

However, I observe the following differences:

For the 16SAnimal marker data, fastp removes all adapters and primers completely, reducing the data size significantly (from 10 GB to ~300 MB).

For the CO1 marker data, fastp fails to remove the reverse primers, even though they match exactly in the R2 reads, and it retains a large portion of the raw reads: the data size decreases only modestly (from 14 GB to 10 GB).

My question: why does fastp perform well on the 16SAnimal data but fail to remove exactly matching reverse primers in the CO1 data, when the command is exactly the same?

Additional information:

adapter_file_16sA.txt

adapter_file_co1.txt

Command used:

fastp -i "$R1" -I "$R2" -o "$TRIM_R1" -O "$TRIM_R2" --adapter_fasta "$adapter_file" --trim_poly_g --trim_poly_x -q 20 --disable_length_filtering --failed_out "$FAILED_OUT" --html "$OUT_DIR/${sample_name}_fastp.html" --json "$OUT_DIR/${sample_name}_fastp.json" >> "$LOGFILE" 2>&1

I have confirmed that the primers are present at the expected positions in the raw data, but they are only partially trimmed in the CO1 dataset.
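For anyone who wants to reproduce this kind of check, here is a minimal sketch of one way to count how many reads begin with a given primer. It is not part of fastp. The function name `count_primer_prefix` and the primer value in the usage comment are hypothetical placeholders, and the sketch assumes a standard 4-line FASTQ and an exact (non-degenerate) primer match at the very start of the read.

```shell
# Hypothetical helper: reads FASTQ records from stdin and prints
# "<reads starting with primer> <total reads>".
# Assumes 4 lines per record and an exact, case-sensitive prefix match.
count_primer_prefix() {
  primer="$1"
  awk -v p="$primer" '
    NR % 4 == 2 {                          # sequence line of each record
      total++
      if (substr($0, 1, length(p)) == p)   # primer at position 1
        hits++
    }
    END { printf "%d %d\n", hits + 0, total + 0 }
  '
}

# Example usage (assumes gzip-compressed input; the primer is a placeholder):
#   gzip -dc "$TRIM_R2" | count_primer_prefix "PRIMERSEQUENCE"
```

Running it on both the raw and the trimmed R2 files gives a before/after count that can be compared between the two markers.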

I would appreciate any guidance on improving primer removal in CO1 data with fastp or suggestions if other tools are better suited for complex primer trimming scenarios.

Thank you!
