tutorial on code extensions

PopicLab · Aug 22, 2022 · bdd1ce7 · bdd1ce7
1 parent 89c520d
commit bdd1ce7
Showing 1 changed file with 118 additions and 0 deletions.
diff --git a/notebooks/extensions.ipynb b/notebooks/extensions.ipynb
@@ -0,0 +1,118 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "source": [
+    "In this tutorial we will explore how different combinations of image channels can be defined for image generation in Cue, as well as how to define\n",
+    "new SV channels and extend the framework to new sequencing platforms.\n",
+    "\n",
+    "### Creating new SV signal sets using existing SV channels\n",
+    "\n",
+    "SV-informative signals are defined in the ```SVSignals``` enum inside ```img/constants```. For example ```SVSignals.RD``` corresponds to the read-depth channel\n",
+    "and ```SVSignals.RD_CLIPPED``` is the clipped-read channel.\n",
+    "\n",
+    "To generate a specific combination of image channels we need to define a new ```SV_SIGNAL_SET``` type and provide its channel composition\n",
+    "as a new entry in ```SV_SIGNALS_BY_TYPE``` dictionary inside ```img/constants```. For example, we can create a new set called ```DEMO``` that includes\n",
+    "the read-depth and clipped-read signal as follows:\n",
+    "\n",
+    "```\n",
+    "SV_SIGNAL_SET = Enum(\"SV_SIGNAL_SET\", 'SHORT '\n",
+    "                                      ...\n",
+    "                                      *'DEMO'*)\n",
+    "SV_SIGNALS_BY_TYPE = {\n",
+    "    SV_SIGNAL_SET.SHORT: [SVSignals.RD, SVSignals.RD_LOW, SVSignals.SR_RP, SVSignals.LLRR, SVSignals.RL,\n",
+    "                          SVSignals.LLRR_VS_LR],\n",
+    "    ...\n",
+    "    *SV_SIGNAL_SET.DEMO: [SVSignals.RD, SVSignals.RD_CLIPPED]*\n",
+    "}\n",
+    "```\n",
+    "Since both of these signal types are already supported by the framework, we just need to specify this new signal set in the data generation YAML\n",
+    "config in order to switch images to this channel configuration. In particular, we need to update the ```signal_set``` in ```generate.yaml``` as follows:\n",
+    "\n",
+    "```\n",
+    "signal_set: \"DEMO\"\n",
+    "```\n",
+    "\n",
+    "Providing the updated ```generate.yaml``` config file to the generate.py script will generate 2-channel images with the read-depth and clipped-read channels.\n",
+    "\n",
+    "### Creating custom SV signals\n",
+    "\n",
+    "In order to create a new SV channel, we need to extend the ```SVSignals``` definition and add this channel to a signal set (we'll use the ```DEMO``` set).\n",
+    "For example, we can define a channel ```RD_MAX``` that will compute the scalar maximum read depth across two loci as follows:\n",
+    "\n",
+    "```\n",
+    "class SVSignals(str, Enum):\n",
+    "    RD_MAX = \"RD_MAX\"\n",
+    "    ...\n",
+    "\n",
+    "SV_SIGNALS_BY_TYPE = {\n",
+    "    SV_SIGNAL_SET.DEMO: [SVSignals.RD, SVSignals.RD_MAX, SVSignals.RD_CLIPPED]\n",
+    "}\n",
+    "```\n",
+    "\n",
+    "We can define the function that this channel should compute inside the ```make_image()``` function of the ```img/datasets``` file by providing\n",
+    "a conditional clause for this channel:\n",
+    "\n",
+    "```\n",
+    "if signal == constants.SVSignals.RD_MAX:\n",
+    "   counts = self.aln_index.scalar_apply(SVSignals.RD, interval_pair.intervalA, interval_pair.intervalB, op=max)\n",
+    "```\n",
+    "\n",
+    "This definition will apply the ```max``` operator to the read count at each pair of loci represented in the image.\n",
+    "\n",
+    "### Collecting custom alignment features\n",
+    "\n",
+    "In order to generate channels from custom alignment features that are not yet supported by the framework, we need to add these features\n",
+    "to the BAM index file generated by Cue and used to create the image channels.\n",
+    "\n",
+    "The ```seq/aln_index.py``` file contains the logic for BAM indexing. In particular, the method ```add_by_signal(...read, signal...)``` implements\n",
+    "what/how read alignment properties should be extracted to support a specific signal. The ```read``` input is a pysam ```AlignedSegment``` object\n",
+    "which can be queried for various alignment properties of a particular read in the BAM file (this function is called on all the reads in the file).\n",
+    "\n",
+    "For example, in order to support linked-reads, we construct a channel from the barcodes associated with each read.\n",
+    "In particular, the split-molecule channel ```SVSignals.SM``` captures how many barcodes where shared by the reads across\n",
+    "each pair of loci in the image. In order to compute the intersection of barcodes across two loci, we store the barcodes observed\n",
+    "at each locus as follows:\n",
+    "```\n",
+    "bin_id = get_bin_id(...read...)\n",
+    "if signal == SVSignals.SM :\n",
+    "    barcode = read.get_tag('BX')\n",
+    "    self.bins[signal][chr_id][bin_id].add(barcode)\n",
+    "```\n",
+    "\n",
+    "In the code above, the read barcode is extracted from the ```BX``` tag in the BAM file and added to the set of barcodes associated with\n",
+    "the locus of this read. Once the information is collected in the index, the functions ```scalar_apply()``` or ```intersect()``` provided\n",
+    "by the ```AlnIndex``` class (in ```aln_index.py```) can be used implement the desired operation over the values stored in\n",
+    "the bins associated with a pair of loci. In the case of the ```SM``` signal, we can use ```intersect()``` to find the intersection of the barcode\n",
+    "sets at each pair of loci. Additional features specific to a given sequencing platform can be collected and processed using a similar approach."
+   ],
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   }
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}