Description
I am using fastp for trimming amplicon sequencing datasets generated in the same sequencing run, containing two different markers (16SAnimal and CO1), sequenced on the same flowcell. The trimming command and settings are identical for both datasets.
However, I observe the following differences:
For the 16SAnimal marker data, fastp removes all adapters and primers completely, reducing the data size significantly (from 10 GB to ~300 MB).
For the CO1 marker data, fastp fails to remove the reverse primers, even though they match the R2 reads exactly, and it retains a large portion of the raw reads: the data size decreases only modestly (from 14 GB to 10 GB).
My question: why does fastp perform well on the 16SAnimal data but fail to remove exactly matching reverse primers in the CO1 data when the command is identical?
Additional information:
Command used:
fastp -i "$R1" -I "$R2" -o "$TRIM_R1" -O "$TRIM_R2" --adapter_fasta "$adapter_file" --trim_poly_g --trim_poly_x -q 20 --disable_length_filtering --failed_out "$FAILED_OUT" --html "$OUT_DIR/${sample_name}_fastp.html" --json "$OUT_DIR/${sample_name}_fastp.json" >> "$LOGFILE" 2>&1
I have confirmed that the primers are present at the expected positions in the raw data, but they are only partially trimmed in the CO1 output.
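For anyone who wants to reproduce this kind of check, here is a minimal sketch of how the primer presence can be counted. It is only an illustration: the function name `count_primer_prefix` and the variable `REV_PRIMER` are hypothetical, the match is exact (IUPAC degenerate bases are not expanded), and only the start of each read is tested. It uses plain awk and is not part of fastp.

```shell
# Count reads whose sequence starts with an exact copy of a primer.
# Arguments: primer sequence, then a FASTQ file (defaults to stdin).
# Prints "<reads starting with primer> <total reads>".
# Assumes standard 4-line FASTQ records (no wrapped sequence lines).
count_primer_prefix() {
    local primer="$1"
    local fastq="${2:--}"
    awk -v p="$primer" '
        NR % 4 == 2 {                       # sequence line of each record
            total++
            if (substr($0, 1, length(p)) == p) hits++
        }
        END { printf "%d %d\n", hits, total }
    ' "$fastq"
}

# Hypothetical usage on gzipped reads, before and after trimming:
#   gzip -dc "$R2"      | count_primer_prefix "$REV_PRIMER"
#   gzip -dc "$TRIM_R2" | count_primer_prefix "$REV_PRIMER"
```

Comparing the two counts for the raw and trimmed R2 files of each marker gives a concrete number for how many primers survive trimming.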
I would appreciate any guidance on improving primer removal in CO1 data with fastp or suggestions if other tools are better suited for complex primer trimming scenarios.
Thank you!