Commit 3e8429b

add slides
1 parent 2b52247 commit 3e8429b

4 files changed

+113
-29
lines changed

assets/slides.pdf

616 KB
Binary file not shown.

slides/slides.html

+106-22
Original file line number | Diff line number | Diff line change
@@ -1262,7 +1262,6 @@ <h1 class="title">Polyglot programming for single-cell analysis</h1>
12621262

12631263
<p class="date">2024-09-12</p>
12641264
</section>
1265-
<section>
12661265
<section id="introduction" class="title-slide slide level1 center">
12671266
<h1>Introduction</h1>
12681267
<ol type="1">
@@ -1271,14 +1270,16 @@ <h1>Introduction</h1>
12711270
</ol>
12721271
<p>We will be focusing on R &amp; Python</p>
12731272
</section>
1274-
<section id="summary" class="slide level2">
1275-
<h2>Summary</h2>
1273+
1274+
<section id="summary" class="title-slide slide level1 center">
1275+
<h1>Summary</h1>
12761276
<p><strong>Interoperability</strong> between languages allows analysts to take advantage of the strengths of different ecosystems</p>
12771277
<p><strong>On-disk</strong> interoperability uses standard file formats to transfer data and is typically more reliable</p>
12781278
<p><strong>In-memory</strong> interoperability transfers data directly between parallel sessions and is convenient for interactive analysis</p>
12791279
<p>While interoperability is currently possible, developers continue to improve the experience</p>
12801280
<p><a href="https://www.sc-best-practices.org/introduction/interoperability.html">Single-cell best practices: Interoperability</a></p>
1281-
</section></section>
1281+
</section>
1282+
12821283
<section id="how-do-you-interact-with-a-package-in-another-language" class="title-slide slide level1 center">
12831284
<h1>How do you interact with a package in another language?</h1>
12841285
<ol type="1">
@@ -1410,7 +1411,7 @@ <h1>Rpy2: basics</h1>
14101411
<li><code>rpy2.robjects</code>, the high-level interface</li>
14111412
</ul></li>
14121413
</ul>
1413-
<div id="ea68de58" class="cell" data-execution_count="1">
1414+
<div id="d6998f0a" class="cell" data-execution_count="1">
14141415
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href></a><span class="im">import</span> rpy2</span>
14151416
<span id="cb4-2"><a href></a><span class="im">import</span> rpy2.robjects <span class="im">as</span> robjects</span>
14161417
<span id="cb4-3"><a href></a></span>
@@ -1437,7 +1438,7 @@ <h1>Rpy2: basics</h1>
14371438

14381439
<section id="rpy2-basics-1" class="title-slide slide level1 center">
14391440
<h1>Rpy2: basics</h1>
1440-
<div id="5572dd32" class="cell" data-execution_count="2">
1441+
<div id="f6fb7846" class="cell" data-execution_count="2">
14411442
<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href></a>str_vector <span class="op">=</span> robjects.StrVector([<span class="st">&#39;abc&#39;</span>, <span class="st">&#39;def&#39;</span>, <span class="st">&#39;ghi&#39;</span>])</span>
14421443
<span id="cb5-2"><a href></a>flt_vector <span class="op">=</span> robjects.FloatVector([<span class="fl">0.3</span>, <span class="fl">0.8</span>, <span class="fl">0.7</span>])</span>
14431444
<span id="cb5-3"><a href></a>int_vector <span class="op">=</span> robjects.IntVector([<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>])</span>
@@ -1457,7 +1458,7 @@ <h1>Rpy2: basics</h1>
14571458

14581459
<section id="rpy2-numpy" class="title-slide slide level1 center">
14591460
<h1>Rpy2: numpy</h1>
1460-
<div id="5a5d076d" class="cell" data-execution_count="3">
1461+
<div id="84dfd14d" class="cell" data-execution_count="3">
14611462
<div class="sourceCode cell-code" id="cb7"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
14621463
<span id="cb7-2"><a href></a></span>
14631464
<span id="cb7-3"><a href></a><span class="im">from</span> rpy2.robjects <span class="im">import</span> numpy2ri</span>
@@ -1469,18 +1470,18 @@ <h1>Rpy2: numpy</h1>
14691470
<span id="cb7-9"><a href></a> mtx <span class="op">=</span> robjects.r.matrix(rd_m, nrow <span class="op">=</span> <span class="dv">5</span>)</span>
14701471
<span id="cb7-10"><a href></a> <span class="bu">print</span>(mtx)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
14711472
<div class="cell-output cell-output-stdout">
1472-
<pre><code>[[0.69525594 0.29780005 0.41267065 0.25871805]
1473-
[0.88313251 0.79471121 0.5369112 0.24752835]
1474-
[0.68812232 0.24265455 0.51419239 0.80029227]
1475-
[0.43218943 0.37441082 0.05505875 0.23599726]
1476-
[0.58236939 0.34859652 0.14651556 0.24370712]]</code></pre>
1473+
<pre><code>[[0.73294749 0.55953375 0.69944132 0.52744075]
1474+
[0.09756794 0.39535684 0.80669803 0.10540606]
1475+
[0.35662206 0.70148737 0.12002733 0.28026677]
1476+
[0.19947608 0.84421019 0.82702188 0.82531633]
1477+
[0.56938249 0.04640811 0.34178679 0.3285883 ]]</code></pre>
14771478
</div>
14781479
</div>
14791480
</section>
14801481

14811482
<section id="rpy2-pandas" class="title-slide slide level1 center">
14821483
<h1>Rpy2: pandas</h1>
1483-
<div id="477fe152" class="cell" data-execution_count="4">
1484+
<div id="f47e193f" class="cell" data-execution_count="4">
14841485
<div class="sourceCode cell-code" id="cb9"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
14851486
<span id="cb9-2"><a href></a></span>
14861487
<span id="cb9-3"><a href></a><span class="im">from</span> rpy2.robjects <span class="im">import</span> pandas2ri</span>
@@ -1503,7 +1504,7 @@ <h1>Rpy2: pandas</h1>
15031504

15041505
<section id="rpy2-sparse-matrices" class="title-slide slide level1 center">
15051506
<h1>Rpy2: sparse matrices</h1>
1506-
<div id="7513f866" class="cell" data-execution_count="5">
1507+
<div id="fd0cc8dd" class="cell" data-execution_count="5">
15071508
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href></a><span class="im">import</span> scipy <span class="im">as</span> sp</span>
15081509
<span id="cb11-2"><a href></a></span>
15091510
<span id="cb11-3"><a href></a><span class="im">from</span> anndata2ri <span class="im">import</span> scipy2ri</span>
@@ -1515,12 +1516,12 @@ <h1>Rpy2: sparse matrices</h1>
15151516
<span id="cb11-9"><a href></a> <span class="bu">print</span>(sp_r)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
15161517
<div class="cell-output cell-output-stdout">
15171518
<pre><code>5 x 4 sparse Matrix of class &quot;dgCMatrix&quot;
1518-
1519-
[1,] 0.6952559 0.2978000 0.41267065 0.2587180
1520-
[2,] 0.8831325 0.7947112 0.53691120 0.2475283
1521-
[3,] 0.6881223 0.2426546 0.51419239 0.8002923
1522-
[4,] 0.4321894 0.3744108 0.05505875 0.2359973
1523-
[5,] 0.5823694 0.3485965 0.14651556 0.2437071
1519+
1520+
[1,] 0.73294749 0.55953375 0.6994413 0.5274408
1521+
[2,] 0.09756794 0.39535684 0.8066980 0.1054061
1522+
[3,] 0.35662206 0.70148737 0.1200273 0.2802668
1523+
[4,] 0.19947608 0.84421019 0.8270219 0.8253163
1524+
[5,] 0.56938249 0.04640811 0.3417868 0.3285883
15241525
</code></pre>
15251526
</div>
15261527
</div>
@@ -1641,10 +1642,33 @@ <h1>Reticulate scanpy</h1>
16411642
<span id="cb20-14"><a href></a><span class="co"># obsp: &#39;connectivities&#39;, &#39;distances&#39;</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
16421643
</section>
16431644

1644-
<section>
16451645
<section id="disk-based-interoperability" class="title-slide slide level1 center">
16461646
<h1>Disk-based interoperability</h1>
1647+
<p>Disk-based interoperability is a strategy for achieving interoperability between tools written in different programming languages by <strong>storing intermediate results in standardized, language-agnostic file formats</strong>.</p>
1648+
<ul>
1649+
<li>Upside:
1650+
<ul>
1651+
<li>Simple, just add reading and writing lines</li>
1652+
<li>Modular scripts</li>
1653+
</ul></li>
1654+
<li>Downside:
1655+
<ul>
1656+
<li>increased disk usage</li>
1657+
<li>less direct interaction, debugging…</li>
1658+
</ul></li>
1659+
</ul>
1660+
</section>
16471661

1662+
<section>
1663+
<section id="important-features-of-interoperable-file-formats" class="title-slide slide level1 center">
1664+
<h1>Important features of interoperable file formats</h1>
1665+
<ul>
1666+
<li>Compression</li>
1667+
<li>Sparse matrix support</li>
1668+
<li>Large images</li>
1669+
<li>Lazy chunk loading</li>
1670+
<li>Remote storage</li>
1671+
</ul>
16481672
</section>
16491673
<section id="general-single-cell-file-formats-of-interest-for-python-and-r" class="slide level2">
16501674
<h2>General single cell file formats of interest for Python and R</h2>
@@ -1871,9 +1895,69 @@ <h2>Specialized single cell file formats of interest for Python and R</h2>
18711895
</tbody>
18721896
</table>
18731897
</section></section>
1898+
<section>
1899+
<section id="disk-based-pipelines" class="title-slide slide level1 center">
1900+
<h1>Disk-based pipelines</h1>
1901+
<p>Script pipeline:</p>
1902+
<div class="sourceCode" id="cb21"><pre class="sourceCode numberSource bash number-lines code-with-copy"><code class="sourceCode bash"><span id="cb21-1"><a href></a><span class="co">#!/bin/bash</span></span>
1903+
<span id="cb21-2"><a href></a></span>
1904+
<span id="cb21-3"><a href></a><span class="fu">bash</span> scripts/1_load_data.sh</span>
1905+
<span id="cb21-4"><a href></a><span class="ex">python</span> scripts/2_compute_pseudobulk.py</span>
1906+
<span id="cb21-5"><a href></a><span class="ex">Rscript</span> scripts/3_analysis_de.R</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1907+
<p>Notebook pipeline:</p>
1908+
<div class="sourceCode" id="cb22"><pre class="sourceCode numberSource bash number-lines code-with-copy"><code class="sourceCode bash"><span id="cb22-1"><a href></a><span class="co"># Every step can be a new notebook execution with inspectable output</span></span>
1909+
<span id="cb22-2"><a href></a><span class="ex">jupyter</span> nbconvert <span class="at">--to</span> notebook <span class="at">--execute</span> my_notebook.ipynb <span class="at">--allow-errors</span> <span class="at">--output-dir</span> outputs/</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1910+
</section>
1911+
<section id="just-stay-in-your-language-and-call-scripts" class="slide level2">
1912+
<h2>Just stay in your language and call scripts</h2>
1913+
<div class="sourceCode" id="cb23"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb23-1"><a href></a><span class="im">import</span> subprocess</span>
1914+
<span id="cb23-2"><a href></a></span>
1915+
<span id="cb23-3"><a href></a>subprocess.run(<span class="st">&quot;bash scripts/1_load_data.sh&quot;</span>, shell<span class="op">=</span><span class="va">True</span>)</span>
1916+
<span id="cb23-4"><a href></a><span class="co"># Alternatively you can run Python code here instead of calling a Python script</span></span>
1917+
<span id="cb23-5"><a href></a>subprocess.run(<span class="st">&quot;python scripts/2_compute_pseudobulk.py&quot;</span>, shell<span class="op">=</span><span class="va">True</span>)</span>
1918+
<span id="cb23-6"><a href></a>subprocess.run(<span class="st">&quot;Rscript scripts/3_analysis_de.R&quot;</span>, shell<span class="op">=</span><span class="va">True</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1919+
</section></section>
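A minimal sketch of the same idea with failure handling, assuming you want the pipeline to stop at the first failing step. The `run_pipeline` helper and the placeholder commands are hypothetical and not part of the slides; `check=True` makes `subprocess.run` raise `CalledProcessError` on a non-zero exit code, and passing the command as a list of arguments avoids the need for `shell=True`.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each command in order and stop at the first non-zero exit code.

    `steps` is a list of argument lists. Returns the commands that completed.
    """
    completed = []
    for args in steps:
        # check=True raises subprocess.CalledProcessError on failure,
        # so later steps never run on missing or stale intermediate files.
        subprocess.run(args, check=True)
        completed.append(args)
    return completed


if __name__ == "__main__":
    # Placeholder steps standing in for the scripts/ files shown above.
    steps = [
        [sys.executable, "-c", "print('load data')"],
        [sys.executable, "-c", "print('compute pseudobulk')"],
    ]
    run_pipeline(steps)
```

In a real pipeline the placeholder entries would be replaced by commands such as `["Rscript", "scripts/3_analysis_de.R"]`.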
1920+
<section>
1921+
<section id="pipelines-with-different-environments" class="title-slide slide level1 center">
1922+
<h1>Pipelines with different environments</h1>
1923+
<ol type="1">
1924+
<li>interleave with environment (de)activation functions</li>
1925+
<li>use rvenv</li>
1926+
<li>use Pixi</li>
1927+
</ol>
1928+
</section>
1929+
<section id="pixi-to-manage-different-environments" class="slide level2">
1930+
<h2>Pixi to manage different environments</h2>
1931+
<div class="sourceCode" id="cb24"><pre class="sourceCode numberSource bash number-lines code-with-copy"><code class="sourceCode bash"><span id="cb24-1"><a href></a><span class="ex">pixi</span> run <span class="at">-e</span> bash scripts/1_load_data.sh</span>
1932+
<span id="cb24-2"><a href></a><span class="ex">pixi</span> run <span class="at">-e</span> scverse scripts/2_compute_pseudobulk.py</span>
1933+
<span id="cb24-3"><a href></a><span class="ex">pixi</span> run <span class="at">-e</span> rverse scripts/3_analysis_de.R</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1934+
</section>
1935+
<section id="define-tasks-in-pixi" class="slide level2">
1936+
<h2>Define tasks in Pixi</h2>
1937+
<div class="sourceCode" id="cb25"><pre class="sourceCode numberSource bash number-lines code-with-copy"><code class="sourceCode bash"><span id="cb25-1"><a href></a><span class="ex">...</span></span>
1938+
<span id="cb25-2"><a href></a><span class="ex">[feature.bash.tasks]</span></span>
1939+
<span id="cb25-3"><a href></a><span class="ex">load_data</span> = <span class="st">&quot;bash book/disk_based/scripts/1_load_data.sh&quot;</span></span>
1940+
<span id="cb25-4"><a href></a><span class="ex">...</span></span>
1941+
<span id="cb25-5"><a href></a><span class="ex">[feature.scverse.tasks]</span></span>
1942+
<span id="cb25-6"><a href></a><span class="ex">compute_pseudobulk</span> = <span class="st">&quot;python book/disk_based/scripts/2_compute_pseudobulk.py&quot;</span></span>
1943+
<span id="cb25-7"><a href></a><span class="ex">...</span></span>
1944+
<span id="cb25-8"><a href></a><span class="ex">[feature.rverse.tasks]</span></span>
1945+
<span id="cb25-9"><a href></a><span class="ex">analysis_de</span> = <span class="st">&quot;Rscript --no-init-file book/disk_based/scripts/3_analysis_de.R&quot;</span></span>
1946+
<span id="cb25-10"><a href></a><span class="ex">...</span></span>
1947+
<span id="cb25-11"><a href></a><span class="ex">[tasks]</span></span>
1948+
<span id="cb25-12"><a href></a><span class="ex">pipeline</span> = { depends-on = [<span class="st">&quot;load_data&quot;</span>, <span class="st">&quot;compute_pseudobulk&quot;</span>, <span class="st">&quot;analysis_de&quot;</span>] }</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1949+
<div class="sourceCode" id="cb26"><pre class="sourceCode numberSource bash number-lines code-with-copy"><code class="sourceCode bash"><span id="cb26-1"><a href></a><span class="ex">pixi</span> run pipeline</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1950+
</section>
1951+
<section id="also-possible-to-use-containers" class="slide level2">
1952+
<h2>Also possible to use containers</h2>
1953+
<div class="sourceCode" id="cb27"><pre class="sourceCode numberSource bash number-lines code-with-copy"><code class="sourceCode bash"><span id="cb27-1"><a href></a><span class="ex">docker</span> pull berombau/polygloty-docker:latest</span>
1954+
<span id="cb27-2"><a href></a><span class="ex">docker</span> run <span class="at">-it</span> <span class="at">-v</span> <span class="va">$(</span><span class="bu">pwd</span><span class="va">)</span>/usecase:/app/usecase <span class="at">-v</span> <span class="va">$(</span><span class="bu">pwd</span><span class="va">)</span>/book:/app/book berombau/polygloty-docker:latest pixi run pipeline</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
1955+
<p>Another approach is to use multi-package containers to create custom combinations of packages:</p>
<ul>
<li><a href="https://midnighter.github.io/mulled/">Multi-Package BioContainers</a></li>
<li><a href="https://seqera.io/containers/">Seqera Containers</a></li>
</ul>
1956+
</section></section>
18741957
<section id="workflows" class="title-slide slide level1 center">
18751958
<h1>Workflows</h1>
1876-
1959+
<p>You can go a long way with a folder of notebooks or scripts and the right tools. But as your project grows more bespoke, it can be worth the effort to use a <strong><a href="../workflow_frameworks">workflow framework</a></strong> like Viash, Nextflow or Snakemake to manage the pipeline for you.</p>
1960+
<p>See https://saeyslab.github.io/polygloty/book/workflow_frameworks/</p>
18771961
</section>
18781962

18791963
<section id="takeaways" class="title-slide slide level1 center">

slides/slides.pdf

349 KB
Binary file not shown.

slides/slides.qmd

+7-7
Original file line number | Diff line number | Diff line change
@@ -29,7 +29,7 @@ execute:
2929

3030
We will be focusing on R & Python
3131

32-
## Summary
32+
# Summary
3333

3434
**Interoperability** between languages allows analysts to take advantage of the strengths of different ecosystems
3535

@@ -342,13 +342,13 @@ adata
342342

343343
Disk-based interoperability is a strategy for achieving interoperability between tools written in different programming languages by **storing intermediate results in standardized, language-agnostic file formats**.
344344

345-
Upside:
346-
- Simple, just add reading and writing lines
347-
- Modular scripts
345+
- Upside:
346+
- Simple, just add reading and writing lines
347+
- Modular scripts
348348

349-
Downside:
350-
- increased disk usage
351-
- less direct interaction, debugging...
349+
- Downside:
350+
- increased disk usage
351+
- less direct interaction, debugging...
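A minimal sketch of the pattern described above, using CSV as a stand-in for a standardized, language-agnostic format (single-cell data would more typically use a dedicated format). The file name, column names and values are illustrative assumptions; the reading step could equally be an R script calling `read.csv`.

```python
import csv
import tempfile
from pathlib import Path


def write_counts(path, rows):
    """Step 1: store an intermediate result in a language-agnostic file."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["gene", "count"])
        writer.writeheader()
        writer.writerows(rows)


def read_counts(path):
    """Step 2: a separate script (in any language) reads the file back."""
    with open(path, newline="") as handle:
        return [
            {"gene": row["gene"], "count": int(row["count"])}
            for row in csv.DictReader(handle)
        ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "pseudobulk.csv"
        write_counts(path, [{"gene": "GeneA", "count": 12}, {"gene": "GeneB", "count": 0}])
        print(read_counts(path))
```

The only coupling between the two steps is the file on disk, which is what makes the scripts modular and also what costs the extra disk usage.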
352352

353353
# Important features of interoperable file formats
354354
