Skip to content

Conversation

@nh13
Copy link
Member

@nh13 nh13 commented May 20, 2024

No description provided.

@nh13 nh13 temporarily deployed to github-actions May 20, 2024 16:53 — with GitHub Actions Inactive
@nh13 nh13 requested review from msto and tfenne May 20, 2024 16:53
@nh13 nh13 temporarily deployed to github-actions May 20, 2024 16:55 — with GitHub Actions Inactive
@nh13 nh13 temporarily deployed to github-actions May 20, 2024 16:55 — with GitHub Actions Inactive
@nh13 nh13 temporarily deployed to github-actions May 20, 2024 16:56 — with GitHub Actions Inactive
@nh13 nh13 temporarily deployed to github-actions May 20, 2024 16:56 — with GitHub Actions Inactive
@codecov
Copy link

codecov bot commented May 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.63%. Comparing base (ba0788e) to head (9856fe1).
Report is 12 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #989   +/-   ##
=======================================
  Coverage   95.62%   95.63%           
=======================================
  Files         126      126           
  Lines        7364     7380   +16     
  Branches      500      498    -2     
=======================================
+ Hits         7042     7058   +16     
  Misses        322      322           
Flag Coverage Δ
unittests 95.63% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +412 to +413
recs(0).apply[String]("RX") shouldBe "ACGT-CGTA-GG-CC"
recs(1).apply[String]("RX") shouldBe "TTGA-TAAT-TA-AA"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these suffixed with -GG-CC and -TA-AA?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the method docs, the UMIs may be extracted from the read names, the read sequences, or both. In this case, the read structure shows UMI bases in the read sequences themselves, as well as the comment in the read name header, so we get four (!) UMI segments, two from the read sequences, and two from the comment in the read header.

@nh13 nh13 requested a review from msto May 20, 2024 22:12
@nh13 nh13 assigned tfenne and unassigned tfenne Mar 3, 2025
@arg(flag='q', doc="Tag in which to store molecular barcode/UMI qualities.") val umiQualTag: Option[String] = None,
@arg(flag='Q', doc="Store the sample barcode qualities in the QT Tag.") val storeSampleBarcodeQualities: Boolean = false,
@arg(flag='n', doc="Extract UMI(s) from read names and prepend to UMIs from reads.") val extractUmisFromReadNames: Boolean = false,
@arg(flag='n', doc="Extract UMI(s) from read names and prepend to UMIs from reads.", mutex=Array("extractUmisFromReadComment"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about the mutex parameter to @arg!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are mutually exclusive args rendered in the CLI help doc? Is it worth noting that the two arguments are mutually exclusive in the usage above, or will that be automatically noted in the help for the respective flags?

/**
* Extracts the UMI from an Illumina fastq style read name. Illumina documents their FASTQ read names as:
* @<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos>:<UMI> <read>:<is filtered>:<control number>:<index>
* `@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos>:<UMI> <read>:<is filtered>:<control number>:<index>``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that an extra backtick at the end?

Comment on lines +92 to +93
* Extracts the UMI from an Illumina fastq style read name. Illumina documents their FASTQ read names as:
* `@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos>:<UMI> <read>:<is filtered>:<control number>:<index>``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be updated to reflect the expected structure of the read comment?

* If `strict` is false the last segment is returned so long as it appears to be a valid UMI.
*/
def extractUmisFromReadComment(comment: String, delimiter: Char = ':', strict: Boolean): Option[String] = {
// If strict, check that the read name actually has eight parts, which is expected
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If strict, check that the read name actually has eight parts, which is expected
// If strict, check that the read name actually has four parts, which is expected

Comment on lines +102 to +103
* If `strict` is true the comment _must_ contain either 4 colon-separated segments,
* with the UMI being the last in the case of 4.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm far enough removed from the original motivation for this PR that I don't recall - is there a convention for the comment to have four segments?

* field may contain multiple UMIs, in which case they will delimit them with `+` characters. Pluses will be
* translated to hyphens before returning.
*
* If `strict` is true the comment _must_ contain either 4 colon-separated segments,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* If `strict` is true the comment _must_ contain either 4 colon-separated segments,
* If `strict` is true the comment _must_ contain 4 colon-separated segments,

As far as I can tell, there's no other permissible condition when strict is true.

strict also controls the validation - should the details of what constitutes a valid UMI be included here as well?

Comment on lines +177 to +178
if (extractUmisFromReadNames) Umis.extractUmisFromReadName(fqs.head.name, strict=true)
else if (extractUmisFromReadComment) fqs.head.comment.flatMap(comment => Umis.extractUmisFromReadComment(comment, strict=true))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since strict is set to true and is not configurable at the CLI, should the CLI usage describe the constraints imposed by strict extraction?

@msto msto assigned nh13 and unassigned msto Mar 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants