Skip to content

Commit

Permalink
tutorial on code extensions
Browse files Browse the repository at this point in the history
  • Loading branch information
viq854 committed Aug 22, 2022
1 parent 89c520d commit bdd1ce7
Showing 1 changed file with 118 additions and 0 deletions.
118 changes: 118 additions & 0 deletions notebooks/extensions.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"In this tutorial we will explore how different combinations of image channels can be defined for image generation in Cue, as well as how to define\n",
"new SV channels and extend the framework to new sequencing platforms.\n",
"\n",
"### Creating new SV signal sets using existing SV channels\n",
"\n",
"SV-informative signals are defined in the ```SVSignals``` enum inside ```img/constants```. For example ```SVSignals.RD``` corresponds to the read-depth channel\n",
"and ```SVSignals.RD_CLIPPED``` is the clipped-read channel.\n",
"\n",
"To generate a specific combination of image channels we need to define a new ```SV_SIGNAL_SET``` type and provide its channel composition\n",
"as a new entry in ```SV_SIGNALS_BY_TYPE``` dictionary inside ```img/constants```. For example, we can create a new set called ```DEMO``` that includes\n",
"the read-depth and clipped-read signal as follows:\n",
"\n",
"```\n",
"SV_SIGNAL_SET = Enum(\"SV_SIGNAL_SET\", 'SHORT '\n",
" ...\n",
" *'DEMO'*)\n",
"SV_SIGNALS_BY_TYPE = {\n",
" SV_SIGNAL_SET.SHORT: [SVSignals.RD, SVSignals.RD_LOW, SVSignals.SR_RP, SVSignals.LLRR, SVSignals.RL,\n",
" SVSignals.LLRR_VS_LR],\n",
" ...\n",
" *SV_SIGNAL_SET.DEMO: [SVSignals.RD, SVSignals.RD_CLIPPED]*\n",
"}\n",
"```\n",
"Since both of these signal types are already supported by the framework, we just need to specify this new signal set in the data generation YAML\n",
"config in order to switch images to this channel configuration. In particular, we need to update the ```signal_set``` in ```generate.yaml``` as follows:\n",
"\n",
"```\n",
"signal_set: \"DEMO\"\n",
"```\n",
"\n",
"Providing the updated ```generate.yaml``` config file to the generate.py script will generate 2-channel images with the read-depth and clipped-read channels.\n",
"\n",
"### Creating custom SV signals\n",
"\n",
"In order to create a new SV channel, we need to extend the ```SVSignals``` definition and add this channel to a signal set (we'll use the ```DEMO``` set).\n",
"For example, we can define a channel ```RD_MAX``` that will compute the scalar maximum read depth across two loci as follows:\n",
"\n",
"```\n",
"class SVSignals(str, Enum):\n",
" RD_MAX = \"RD_MAX\"\n",
" ...\n",
"\n",
"SV_SIGNALS_BY_TYPE = {\n",
" SV_SIGNAL_SET.DEMO: [SVSignals.RD, SVSignals.RD_MAX, SVSignals.RD_CLIPPED]\n",
"}\n",
"```\n",
"\n",
"We can define the function that this channel should compute inside the ```make_image()``` function of the ```img/datasets``` file by providing\n",
"a conditional clause for this channel:\n",
"\n",
"```\n",
"if signal == constants.SVSignals.RD_MAX:\n",
" counts = self.aln_index.scalar_apply(SVSignals.RD, interval_pair.intervalA, interval_pair.intervalB, op=max)\n",
"```\n",
"\n",
"This definition will apply the ```max``` operator to the read count at each pair of loci represented in the image.\n",
"\n",
"### Collecting custom alignment features\n",
"\n",
"In order to generate channels from custom alignment features that are not yet supported by the framework, we need to add these features\n",
"to the BAM index file generated by Cue and used to create the image channels.\n",
"\n",
"The ```seq/aln_index.py``` file contains the logic for BAM indexing. In particular, the method ```add_by_signal(...read, signal...)``` implements\n",
"what/how read alignment properties should be extracted to support a specific signal. The ```read``` input is a pysam ```AlignedSegment``` object\n",
"which can be queried for various alignment properties of a particular read in the BAM file (this function is called on all the reads in the file).\n",
"\n",
"For example, in order to support linked-reads, we construct a channel from the barcodes associated with each read.\n",
"In particular, the split-molecule channel ```SVSignals.SM``` captures how many barcodes where shared by the reads across\n",
"each pair of loci in the image. In order to compute the intersection of barcodes across two loci, we store the barcodes observed\n",
"at each locus as follows:\n",
"```\n",
"bin_id = get_bin_id(...read...)\n",
"if signal == SVSignals.SM :\n",
" barcode = read.get_tag('BX')\n",
" self.bins[signal][chr_id][bin_id].add(barcode)\n",
"```\n",
"\n",
"In the code above, the read barcode is extracted from the ```BX``` tag in the BAM file and added to the set of barcodes associated with\n",
"the locus of this read. Once the information is collected in the index, the functions ```scalar_apply()``` or ```intersect()``` provided\n",
"by the ```AlnIndex``` class (in ```aln_index.py```) can be used implement the desired operation over the values stored in\n",
"the bins associated with a pair of loci. In the case of the ```SM``` signal, we can use ```intersect()``` to find the intersection of the barcode\n",
"sets at each pair of loci. Additional features specific to a given sequencing platform can be collected and processed using a similar approach."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

0 comments on commit bdd1ce7

Please sign in to comment.