@@ -1262,7 +1262,6 @@ <h1 class="title">Polyglot programming for single-cell analysis</h1>
1262
1262
1263
1263
<p class="date">2024-09-12</p>
1264
1264
</section>
1265
- <section>
1266
1265
<section id="introduction" class="title-slide slide level1 center">
1267
1266
<h1>Introduction</h1>
1268
1267
<ol type="1">
@@ -1271,14 +1270,16 @@ <h1>Introduction</h1>
1271
1270
</ol>
1272
1271
<p>We will be focusing on R & Python</p>
1273
1272
</section>
1274
- <section id="summary" class="slide level2">
1275
- <h2>Summary</h2>
1273
+
1274
+ <section id="summary" class="title-slide slide level1 center">
1275
+ <h1>Summary</h1>
1276
1276
<p><strong>Interoperability</strong> between languages allows analysts to take advantage of the strengths of different ecosystems</p>
1277
1277
<p><strong>On-disk</strong> interoperability uses standard file formats to transfer data and is typically more reliable</p>
1278
1278
<p><strong>In-memory</strong> interoperability transfers data directly between parallel sessions and is convenient for interactive analysis</p>
1279
1279
<p>While interoperability is currently possible, developers continue to improve the experience</p>
1280
1280
<p><a href="https://www.sc-best-practices.org/introduction/interoperability.html">Single-cell best practices: Interoperability</a></p>
1281
- </section></section>
1281
+ </section>
1282
+
1282
1283
<section id="how-do-you-interact-with-a-package-in-another-language" class="title-slide slide level1 center">
1283
1284
<h1>How do you interact with a package in another language?</h1>
1284
1285
<ol type="1">
@@ -1410,7 +1411,7 @@ <h1>Rpy2: basics</h1>
1410
1411
<li><code>rpy2.robjects</code>, the high-level interface</li>
1411
1412
</ul></li>
1412
1413
</ul>
1413
- <div id="ea68de58 " class="cell" data-execution_count="1">
1414
+ <div id="d6998f0a " class="cell" data-execution_count="1">
1414
1415
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href></a><span class="im">import</span> rpy2</span>
1415
1416
<span id="cb4-2"><a href></a><span class="im">import</span> rpy2.robjects <span class="im">as</span> robjects</span>
1416
1417
<span id="cb4-3"><a href></a></span>
@@ -1437,7 +1438,7 @@ <h1>Rpy2: basics</h1>
1437
1438
1438
1439
<section id="rpy2-basics-1" class="title-slide slide level1 center">
1439
1440
<h1>Rpy2: basics</h1>
1440
- <div id="5572dd32 " class="cell" data-execution_count="2">
1441
+ <div id="f6fb7846 " class="cell" data-execution_count="2">
1441
1442
<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href></a>str_vector <span class="op">=</span> robjects.StrVector([<span class="st">'abc'</span>, <span class="st">'def'</span>, <span class="st">'ghi'</span>])</span>
1442
1443
<span id="cb5-2"><a href></a>flt_vector <span class="op">=</span> robjects.FloatVector([<span class="fl">0.3</span>, <span class="fl">0.8</span>, <span class="fl">0.7</span>])</span>
1443
1444
<span id="cb5-3"><a href></a>int_vector <span class="op">=</span> robjects.IntVector([<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>])</span>
@@ -1457,7 +1458,7 @@ <h1>Rpy2: basics</h1>
1457
1458
1458
1459
<section id="rpy2-numpy" class="title-slide slide level1 center">
1459
1460
<h1>Rpy2: numpy</h1>
1460
- <div id="5a5d076d " class="cell" data-execution_count="3">
1461
+ <div id="84dfd14d " class="cell" data-execution_count="3">
1461
1462
<div class="sourceCode cell-code" id="cb7"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
1462
1463
<span id="cb7-2"><a href></a></span>
1463
1464
<span id="cb7-3"><a href></a><span class="im">from</span> rpy2.robjects <span class="im">import</span> numpy2ri</span>
@@ -1469,18 +1470,18 @@ <h1>Rpy2: numpy</h1>
1469
1470
<span id="cb7-9"><a href></a> mtx <span class="op">=</span> robjects.r.matrix(rd_m, nrow <span class="op">=</span> <span class="dv">5</span>)</span>
1470
1471
<span id="cb7-10"><a href></a> <span class="bu">print</span>(mtx)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1471
1472
<div class="cell-output cell-output-stdout">
1472
- <pre><code>[[0.69525594 0.29780005 0.41267065 0.25871805 ]
1473
- [0.88313251 0.79471121 0.5369112 0.24752835 ]
1474
- [0.68812232 0.24265455 0.51419239 0.80029227 ]
1475
- [0.43218943 0.37441082 0.05505875 0.23599726 ]
1476
- [0.58236939 0.34859652 0.14651556 0.24370712 ]]</code></pre>
1473
+ <pre><code>[[0.73294749 0.55953375 0.69944132 0.52744075 ]
1474
+ [0.09756794 0.39535684 0.80669803 0.10540606 ]
1475
+ [0.35662206 0.70148737 0.12002733 0.28026677 ]
1476
+ [0.19947608 0.84421019 0.82702188 0.82531633 ]
1477
+ [0.56938249 0.04640811 0.34178679 0.3285883 ]]</code></pre>
1477
1478
</div>
1478
1479
</div>
1479
1480
</section>
1480
1481
1481
1482
<section id="rpy2-pandas" class="title-slide slide level1 center">
1482
1483
<h1>Rpy2: pandas</h1>
1483
- <div id="477fe152 " class="cell" data-execution_count="4">
1484
+ <div id="f47e193f " class="cell" data-execution_count="4">
1484
1485
<div class="sourceCode cell-code" id="cb9"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
1485
1486
<span id="cb9-2"><a href></a></span>
1486
1487
<span id="cb9-3"><a href></a><span class="im">from</span> rpy2.robjects <span class="im">import</span> pandas2ri</span>
@@ -1503,7 +1504,7 @@ <h1>Rpy2: pandas</h1>
1503
1504
1504
1505
<section id="rpy2-sparse-matrices" class="title-slide slide level1 center">
1505
1506
<h1>Rpy2: sparse matrices</h1>
1506
- <div id="7513f866 " class="cell" data-execution_count="5">
1507
+ <div id="fd0cc8dd " class="cell" data-execution_count="5">
1507
1508
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href></a><span class="im">import</span> scipy <span class="im">as</span> sp</span>
1508
1509
<span id="cb11-2"><a href></a></span>
1509
1510
<span id="cb11-3"><a href></a><span class="im">from</span> anndata2ri <span class="im">import</span> scipy2ri</span>
@@ -1515,12 +1516,12 @@ <h1>Rpy2: sparse matrices</h1>
1515
1516
<span id="cb11-9"><a href></a> <span class="bu">print</span>(sp_r)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1516
1517
<div class="cell-output cell-output-stdout">
1517
1518
<pre><code>5 x 4 sparse Matrix of class "dgCMatrix"
1518
-
1519
- [1,] 0.6952559 0.2978000 0.41267065 0.2587180
1520
- [2,] 0.8831325 0.7947112 0.53691120 0.2475283
1521
- [3,] 0.6881223 0.2426546 0.51419239 0.8002923
1522
- [4,] 0.4321894 0.3744108 0.05505875 0.2359973
1523
- [5,] 0.5823694 0.3485965 0.14651556 0.2437071
1519
+
1520
+ [1,] 0.73294749 0.55953375 0.6994413 0.5274408
1521
+ [2,] 0.09756794 0.39535684 0.8066980 0.1054061
1522
+ [3,] 0.35662206 0.70148737 0.1200273 0.2802668
1523
+ [4,] 0.19947608 0.84421019 0.8270219 0.8253163
1524
+ [5,] 0.56938249 0.04640811 0.3417868 0.3285883
1524
1525
</code></pre>
1525
1526
</div>
1526
1527
</div>
@@ -1641,10 +1642,33 @@ <h1>Reticulate scanpy</h1>
1641
1642
<span id="cb20-14"><a href></a><span class="co"># obsp: 'connectivities', 'distances'</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1642
1643
</section>
1643
1644
1644
- <section>
1645
1645
<section id="disk-based-interoperability" class="title-slide slide level1 center">
1646
1646
<h1>Disk-based interoperability</h1>
1647
+ <p>Disk-based interoperability is a strategy for achieving interoperability between tools written in different programming languages by <strong>storing intermediate results in standardized, language-agnostic file formats</strong>.</p>
1648
+ <ul>
1649
+ <li>Upside:
1650
+ <ul>
1651
+ <li>Simple, just add reading and writing lines</li>
1652
+ <li>Modular scripts</li>
1653
+ </ul></li>
1654
+ <li>Downside:
1655
+ <ul>
1656
+ <li>Increased disk usage</li>
1657
+ <li>Less direct interaction, debugging…</li>
1658
+ </ul></li>
1659
+ </ul>
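<p>A minimal sketch of the idea, assuming a plain CSV file as the language-agnostic format. The file name <code>pseudobulk.csv</code>, the column names and the counts are hypothetical and only serve as illustration; the helper functions are not part of any library.</p>

```python
import csv
import tempfile
from pathlib import Path


def write_table(path, header, rows):
    """Write rows to a plain CSV file that any language can parse."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_table(path):
    """Read the CSV back; values arrive as strings, so types are restored by hand."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        return [
            {"gene": row["gene"], "sample": row["sample"], "count": int(row["count"])}
            for row in reader
        ]


# Hypothetical pseudobulk counts, invented for illustration only.
header = ["gene", "sample", "count"]
rows = [("GeneA", "s1", 12), ("GeneA", "s2", 7), ("GeneB", "s1", 0)]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pseudobulk.csv"
    write_table(path, header, rows)  # step 1: the producing tool writes the file
    restored = read_table(path)      # step 2: the consuming tool reads it back

print(restored[0])
```

<p>On the R side the same file could be loaded with <code>read.csv("pseudobulk.csv")</code>. Note that the reader has to restore the integer type by hand: plain-text formats do not carry type information, which is one reason dedicated formats are usually preferred for larger objects.</p>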
1660
+ </section>
1647
1661
1662
+ <section>
1663
+ <section id="important-features-of-interoperable-file-formats" class="title-slide slide level1 center">
1664
+ <h1>Important features of interoperable file formats</h1>
1665
+ <ul>
1666
+ <li>Compression</li>
1667
+ <li>Sparse matrix support</li>
1668
+ <li>Large images</li>
1669
+ <li>Lazy chunk loading</li>
1670
+ <li>Remote storage</li>
1671
+ </ul>
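<p>To make the first two features concrete, here is a small sketch, using only the Python standard library, that stores the non-zero entries of a matrix in Matrix Market coordinate format and gzip-compresses the file. The function names and the toy matrix are assumptions made for illustration; in practice a library reader and writer would normally be used instead.</p>

```python
import gzip
import tempfile
from pathlib import Path


def write_sparse_mtx_gz(path, dense):
    """Store only non-zero entries in Matrix Market coordinate format, gzip-compressed."""
    n_rows, n_cols = len(dense), len(dense[0])
    entries = [
        (i + 1, j + 1, value)  # Matrix Market indices are 1-based
        for i, row in enumerate(dense)
        for j, value in enumerate(row)
        if value != 0
    ]
    with gzip.open(path, "wt") as handle:
        handle.write("%%MatrixMarket matrix coordinate real general\n")
        handle.write(f"{n_rows} {n_cols} {len(entries)}\n")
        for i, j, value in entries:
            handle.write(f"{i} {j} {value}\n")
    return len(entries)


def read_sparse_mtx_gz(path):
    """Rebuild the dense matrix from the compressed coordinate file."""
    with gzip.open(path, "rt") as handle:
        lines = [line for line in handle if not line.startswith("%")]
    n_rows, n_cols, _ = (int(x) for x in lines[0].split())
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for line in lines[1:]:
        i, j, value = line.split()
        dense[int(i) - 1][int(j) - 1] = float(value)
    return dense


# Toy matrix, invented for illustration: only 2 of 9 entries are non-zero.
matrix = [[0.0, 0.0, 3.0], [1.5, 0.0, 0.0], [0.0, 0.0, 0.0]]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "counts.mtx.gz"
    n_stored = write_sparse_mtx_gz(path, matrix)
    restored = read_sparse_mtx_gz(path)

print(n_stored, restored == matrix)
```

<p>Only the non-zero entries are written, so the file size grows with the number of stored values rather than with the full matrix dimensions.</p>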
1648
1672
</section>
1649
1673
<section id="general-single-cell-file-formats-of-interest-for-python-and-r" class="slide level2">
1650
1674
<h2>General single cell file formats of interest for Python and R</h2>
@@ -1871,9 +1895,69 @@ <h2>Specialized single cell file formats of interest for Python and R</h2>
1871
1895
</tbody>
1872
1896
</table>
1873
1897
</section></section>
1898
+ <section>
1899
+ <section id="disk-based-pipelines" class="title-slide slide level1 center">
1900
+ <h1>Disk-based pipelines</h1>
1901
+ <p>Script pipeline:</p>
1902
+ <div class="sourceCode" id="cb21"><pre class="sourceCode numberSource bash number-lines code-with-copy"><code class="sourceCode bash"><span id="cb21-1"><a href></a><span class="co">#!/bin/bash</span></span>
1903
+ <span id="cb21-2"><a href></a></span>
1904
+ <span id="cb21-3"><a href></a><span class="fu">bash</span> scripts/1_load_data.sh</span>
1905
+ <span id="cb21-4"><a href></a><span class="ex">python</span> scripts/2_compute_pseudobulk.py</span>
1906
+ <span id="cb21-5"><a href></a><span class="ex">Rscript</span> scripts/3_analysis_de.R</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1907
+ <p>Notebook pipeline:</p>
1908
+ <div class="sourceCode" id="cb22"><pre class="sourceCode numberSource bash number-lines code-with-copy"><code class="sourceCode bash"><span id="cb22-1"><a href></a><span class="co"># Every step can be a new notebook execution with inspectable output</span></span>
1909
+ <span id="cb22-2"><a href></a><span class="ex">jupyter</span> nbconvert <span class="at">--to</span> notebook <span class="at">--execute</span> my_notebook.ipynb <span class="at">--allow-errors</span> <span class="at">--output-dir</span> outputs/</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1910
+ </section>
1911
+ <section id="just-stay-in-your-language-and-call-scripts" class="slide level2">
1912
+ <h2>Just stay in your language and call scripts</h2>
1913
+ <div class="sourceCode" id="cb23"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb23-1"><a href></a><span class="im">import</span> subprocess</span>
1914
+ <span id="cb23-2"><a href></a></span>
1915
+ <span id="cb23-3"><a href></a>subprocess.run(<span class="st">"bash scripts/1_load_data.sh"</span>, shell<span class="op">=</span><span class="va">True</span>)</span>
1916
+ <span id="cb23-4"><a href></a><span class="co"># Alternatively you can run Python code here instead of calling a Python script</span></span>
1917
+ <span id="cb23-5"><a href></a>subprocess.run(<span class="st">"python scripts/2_compute_pseudobulk.py"</span>, shell<span class="op">=</span><span class="va">True</span>)</span>
1918
+ <span id="cb23-6"><a href></a>subprocess.run(<span class="st">"Rscript scripts/3_analysis_de.R"</span>, shell<span class="op">=</span><span class="va">True</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
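<p>A possible variant of the calls above, sketched under the assumption that a failed step should stop the whole pipeline. It passes each command as an argument list with <code>check=True</code> instead of <code>shell=True</code>. The helper <code>run_pipeline</code> is hypothetical, and the demo steps are stand-ins so that the example runs without the project scripts.</p>

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each command in order and stop at the first failure."""
    completed = []
    for argv in steps:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on incomplete intermediate files.
        subprocess.run(argv, check=True)
        completed.append(argv)
    return completed


# Stand-in steps so the sketch runs anywhere; in a real pipeline these would be
# e.g. ["bash", "scripts/1_load_data.sh"] or ["Rscript", "scripts/3_analysis_de.R"].
demo_steps = [
    [sys.executable, "-c", "print('load data')"],
    [sys.executable, "-c", "print('compute pseudobulk')"],
]
finished = run_pipeline(demo_steps)

# A failing first step prevents the second one from running.
failing_steps = [
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c", "print('never reached')"],
]
try:
    run_pipeline(failing_steps)
    stopped_early = False
except subprocess.CalledProcessError:
    stopped_early = True

print(len(finished), stopped_early)
```

<p>Without <code>check=True</code>, <code>subprocess.run</code> returns normally even when a script exits with an error, and the next step would read stale or missing files from disk.</p>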
1919
+ </section></section>
1920
+ <section>
1921
+ <section id="pipelines-with-different-environments" class="title-slide slide level1 center">
1922
+ <h1>Pipelines with different environments</h1>
1923
+ <ol type="1">
1924
+ <li>interleave with environment (de)activation functions</li>
1925
+ <li>use rvenv</li>
1926
+ <li>use Pixi</li>
1927
+ </ol>
1928
+ </section>
1929
+ <section id="pixi-to-manage-different-environments" class="slide level2">
1930
+ <h2>Pixi to manage different environments</h2>
1931
+ <div class="sourceCode" id="cb24"><pre class="sourceCode numberSource bash number-lines code-with-copy"><code class="sourceCode bash"><span id="cb24-1"><a href></a><span class="ex">pixi</span> run <span class="at">-e</span> bash bash scripts/1_load_data.sh</span>
1932
+ <span id="cb24-2"><a href></a><span class="ex">pixi</span> run <span class="at">-e</span> scverse python scripts/2_compute_pseudobulk.py</span>
1933
+ <span id="cb24-3"><a href></a><span class="ex">pixi</span> run <span class="at">-e</span> rverse Rscript scripts/3_analysis_de.R</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
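<p>The same per-step environments can also be driven from Python. The sketch below only builds and prints the <code>pixi run -e</code> commands, so it runs without Pixi installed. The environment names and script paths come from the commands above, the explicit interpreter in each step is an assumption, and <code>pixi_command</code> is a hypothetical helper.</p>

```python
import shlex

# Environment names and script paths are taken from the commands above;
# the explicit interpreter in each step is an assumption of this sketch.
STEPS = [
    ("bash", ["bash", "scripts/1_load_data.sh"]),
    ("scverse", ["python", "scripts/2_compute_pseudobulk.py"]),
    ("rverse", ["Rscript", "scripts/3_analysis_de.R"]),
]


def pixi_command(environment, command):
    """Build the argv for running `command` inside a named Pixi environment."""
    return ["pixi", "run", "-e", environment, *command]


commands = [pixi_command(env, cmd) for env, cmd in STEPS]
for argv in commands:
    # Printed rather than executed so the sketch runs without Pixi installed;
    # pass argv to subprocess.run(argv, check=True) to execute a step.
    print(shlex.join(argv))
```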
1934
+ </section>
1935
+ <section id="define-tasks-in-pixi" class="slide level2">
1936
+ <h2>Define tasks in Pixi</h2>
1937
+ <div class="sourceCode" id="cb25"><pre class="sourceCode numberSource bash number-lines code-with-copy"><code class="sourceCode bash"><span id="cb25-1"><a href></a><span class="ex">...</span></span>
1938
+ <span id="cb25-2"><a href></a><span class="ex">[feature.bash.tasks]</span></span>
1939
+ <span id="cb25-3"><a href></a><span class="ex">load_data</span> = <span class="st">"bash book/disk_based/scripts/1_load_data.sh"</span></span>
1940
+ <span id="cb25-4"><a href></a><span class="ex">...</span></span>
1941
+ <span id="cb25-5"><a href></a><span class="ex">[feature.scverse.tasks]</span></span>
1942
+ <span id="cb25-6"><a href></a><span class="ex">compute_pseudobulk</span> = <span class="st">"python book/disk_based/scripts/2_compute_pseudobulk.py"</span></span>
1943
+ <span id="cb25-7"><a href></a><span class="ex">...</span></span>
1944
+ <span id="cb25-8"><a href></a><span class="ex">[feature.rverse.tasks]</span></span>
1945
+ <span id="cb25-9"><a href></a><span class="ex">analysis_de</span> = <span class="st">"Rscript --no-init-file book/disk_based/scripts/3_analysis_de.R"</span></span>
1946
+ <span id="cb25-10"><a href></a><span class="ex">...</span></span>
1947
+ <span id="cb25-11"><a href></a><span class="ex">[tasks]</span></span>
1948
+ <span id="cb25-12"><a href></a><span class="ex">pipeline</span> = { depends-on = [<span class="st">"load_data"</span>, <span class="st">"compute_pseudobulk"</span>, <span class="st">"analysis_de"</span>] }</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1949
+ <div class="sourceCode" id="cb26"><pre class="sourceCode numberSource bash number-lines code-with-copy"><code class="sourceCode bash"><span id="cb26-1"><a href></a><span class="ex">pixi</span> run pipeline</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1950
+ </section>
1951
+ <section id="also-possible-to-use-containers" class="slide level2">
1952
+ <h2>Also possible to use containers</h2>
1953
+ <div class="sourceCode" id="cb27"><pre class="sourceCode numberSource bash number-lines code-with-copy"><code class="sourceCode bash"><span id="cb27-1"><a href></a><span class="ex">docker</span> pull berombau/polygloty-docker:latest</span>
1954
+ <span id="cb27-2"><a href></a><span class="ex">docker</span> run <span class="at">-it</span> <span class="at">-v</span> <span class="va">$(</span><span class="bu">pwd</span><span class="va">)</span>/usecase:/app/usecase <span class="at">-v</span> <span class="va">$(</span><span class="bu">pwd</span><span class="va">)</span>/book:/app/book berombau/polygloty-docker:latest pixi run pipeline</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1955
+ <p>Another approach is to use multi-package containers to create custom combinations of packages:</p>
+ <ul>
+ <li><a href="https://midnighter.github.io/mulled/">Multi-Package BioContainers</a></li>
+ <li><a href="https://seqera.io/containers/">Seqera Containers</a></li>
+ </ul>
1956
+ </section></section>
1874
1957
<section id="workflows" class="title-slide slide level1 center">
1875
1958
<h1>Workflows</h1>
1876
-
1959
+ <p>You can go a long way with a folder of notebooks or scripts and the right tools. But as your project grows more bespoke, it can be worth the effort to use a <strong><a href="../workflow_frameworks">workflow framework</a></strong> like Viash, Nextflow or Snakemake to manage the pipeline for you.</p>
1960
+ <p>See <a href="https://saeyslab.github.io/polygloty/book/workflow_frameworks/">https://saeyslab.github.io/polygloty/book/workflow_frameworks/</a></p>
1877
1961
</section>
1878
1962
1879
1963
<section id="takeaways" class="title-slide slide level1 center">