Add figure captions.
Adibvafa committed Sep 15, 2024
1 parent 2313a15 commit 3e913c1
Showing 2 changed files with 23 additions and 9 deletions.
32 changes: 23 additions & 9 deletions index.html
@@ -188,7 +188,7 @@ <h2 class="title is-3">Abstract</h2>
<div class="container is-max-desktop">
<h2 class="title is-3">Home</h2>
<div class="content is-size-5 has-text-justified">
<p> Welcome to the CodonTransformer project page! </p>
<p> Welcome to the CodonTransformer project page!</p>
<p>
<strong>CodonTransformer</strong> is a cutting-edge, multispecies deep learning model designed for state-of-the-art codon optimization. Trained on over 1 million gene-protein pairs from 164 organisms spanning all kingdoms of life, CodonTransformer leverages advanced neural network architectures to generate host-specific DNA sequences with natural-like codon usage patterns and minimized negative cis-regulatory elements for any protein sequence. </p>
</div>
@@ -212,7 +212,11 @@ <h2 class="title is-3">Overview</h2>
</div>
<div class="columns is-centered" style="margin-top:15px;text-align:center;">
<div class="column is-nine-tenth">
<img src="static/images/model.png" alt="Data" style="width:65%;">
<img src="static/images/codon.png" alt="Data" style="width:95%;">
<div class="image-caption" style="margin-top: 5pt; text-align: left;">
<p><strong>Fig. 1: CodonTransformer multispecies model with combined organism-amino acid-codon embedding.</strong></p>
<p> <strong>a.</strong> An encoder-only BigBird Transformer model trained on combined amino acid-codon tokens along with organism encoding for host-specific codon usage representation. <strong>b.</strong> CodonTransformer was trained on ~1 million genes from 164 organisms across all kingdoms of life and fine-tuned on highly expressed genes (top 10% codon similarity index, CSI) of 13 organisms and two chloroplast genomes.</p>
</div>
</div>
</div>
</div>
@@ -266,6 +270,11 @@ <h2 class="title is-3">Learning Codon Patterns Across Organisms</h2>
<img src="static/images/Fig. 2.jpg" alt="Data" style="width:95%;">
</div>
</div>
<div class="image-caption" style="margin-top: -10pt; text-align: left;">
<p><strong>Fig. 2: CodonTransformer learned codon patterns across organisms.</strong></p>
<p>Codon similarity index (CSI) for all original genes, for the top 10% CSI original genes, and for DNA sequences generated by CodonTransformer (base and fine-tuned models) for all original proteins, shown for 9 of the 15 genomes used for fine-tuning in this study. See Supplementary Figs. 2-16 for all 15 genomes and for the additional metrics of GC content and codon distribution frequency (CDF). Source data for Fig. 2 and Supplementary Figs. 2-16 are available at <a href="https://zenodo.org/records/13262517" target="_blank">https://zenodo.org/records/13262517</a>.</p>

</div>
</div>
</section>

@@ -282,7 +291,10 @@ <h2 class="title is-3">Generating Natural-Like Codon Distributions</h2>
<img src="static/images/Fig. 3.jpg" alt="Data" style="width:95%;">
</div>
</div>
</div>
<div class="image-caption" style="margin-top: -10pt; text-align: left;">
<p><strong>Fig. 3: CodonTransformer generates natural-like codon distributions.</strong></p>
<p> <strong>a.</strong> Schematic representation of %MinMax and dynamic time warping (DTW). %MinMax represents the proportion of common and rare codons in a sliding window of 18 codons. The DTW algorithm computes the minimal distance between two %MinMax profiles by finding the matching positions (Methods). <strong>b.</strong> %MinMax profiles for sequences generated by different models for genes yahG (E. coli), SER33 (S. cerevisiae), AT4G12540 (A. thaliana), Csad (M. musculus), and ZBTB7C (H. sapiens). <strong>c.</strong> DTW distances between %MinMax profiles of model-generated sequences and their genomic counterparts for 50 random genes selected among the top 10% codon similarity index (CSI). For each organism, the gene whose %MinMax profile is shown in (b) is highlighted in grey. <strong>d.</strong> Mean and standard deviation of DTW distances, normalized by sequence length, between sequences for the 5 organisms (for organism-specific DTW distances, see Supplementary Fig. 17). Data underlying this figure are provided in Supplementary Data 1.</p>
</div>
</section>

<section class="section" id="Results4">
@@ -298,20 +310,23 @@ <h2 class="title is-3">Benchmarking with Real World Proteins</h2>
<img src="static/images/Fig. 4.jpg" alt="Data" style="width:95%;">
</div>
</div>
<div class="image-caption" style="margin-top: -10pt; text-align: left;">
<p><strong>Fig. 4: Model benchmark with proteins of biotechnological interest.</strong></p>
<p> Mean and standard deviation of Jaccard index <strong>(a)</strong>, sequence similarity <strong>(b)</strong>, and dynamic time warping distance <strong>(c)</strong> between corresponding sequences for the 52 benchmark proteins across the 5 organisms (for organism-specific results, see Supplementary Figs. 19, 20, and 21, respectively). <strong>(d)</strong> Number of negative cis-elements in the sequences generated by different tools (✕ shows the mean). Data underlying this figure are provided in Supplementary Data 2.</p>
</div>
</div>
</section>

<section class="section" id="getting-started">
<div class="container is-max-desktop">
<h2 class="title is-3">Getting Started</h2>


<h3 class="title is-4" style="margin-top: 60px;">Installation</h3>
<div class="content is-size-5">
<p>Install CodonTransformer via pip:</p>
<pre><code class="language-sh">pip install CodonTransformer</code></pre>
<pre><code class="language-sh" style="font-size: 0.9em; line-height: 1;">pip install CodonTransformer</code></pre>
<p>Or clone the repository:</p>
<pre><code class="language-sh">git clone https://github.com/adibvafa/CodonTransformer.git
<pre><code class="language-sh" style="font-size: 0.9em; line-height: 1;">git clone https://github.com/adibvafa/CodonTransformer.git
cd CodonTransformer
pip install -r requirements.txt</code></pre>
<p>The package requires <code>python>=3.9</code>. The requirements are <a href="https://github.com/Adibvafa/CodonTransformer/blob/main/requirements.txt" target="_blank">available here</a>.</p>
@@ -320,7 +335,7 @@ <h3 class="title is-4" style="margin-top: 60px;">Installation</h3>
<h3 class="title is-4" style="margin-top: 60px;">Use Case</h3>
<div class="content is-size-5">
<p>After installing CodonTransformer, you can use:</p>
<pre><code class="language-python">import torch
<pre><code class="language-python" style="font-size: 0.85em; line-height: 1.2;">import torch
from transformers import AutoTokenizer, BigBirdForMaskedLM
from CodonTransformer.CodonPrediction import predict_dna_sequence
from CodonTransformer.CodonJupyter import format_model_output
@@ -346,7 +361,7 @@ <h3 class="title is-4" style="margin-top: 60px;">Use Case</h3>
)
print(format_model_output(output))</code></pre>

<pre><code class="language-python">-----------------------------
<pre><code class="language-python" style="font-size: 0.85em; line-height: 1.2;">-----------------------------
| Organism |
-----------------------------
Escherichia coli general
@@ -365,7 +380,6 @@ <h3 class="title is-4" style="margin-top: 60px;">Use Case</h3>
| Predicted DNA |
-----------------------------
ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCGTTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTCTTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA</code></pre>
<p>The output will show the organism, input protein, processed input, and predicted DNA sequence.</p>
</div>

<h3 class="title is-4" style="margin-top: 60px;">Key Features</h3>
Binary file added static/images/codon.png
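
The Fig. 3 caption added in this commit describes dynamic time warping (DTW) as computing the minimal distance between two %MinMax profiles by finding matching positions. As a rough illustration only, and not the CodonTransformer implementation, a textbook DTW over two numeric profiles could be sketched in Python as below. The function name `dtw_distance`, the absolute-difference local cost, and the toy profiles are assumptions made for this sketch.

```python
from math import inf


def dtw_distance(profile_a, profile_b):
    """Textbook dynamic time warping between two numeric sequences.

    Hypothetical helper for illustration; the local cost (absolute
    difference) is an assumption, not taken from the project.
    """
    n, m = len(profile_a), len(profile_b)
    if n == 0 or m == 0:
        raise ValueError("both profiles must be non-empty")

    # cost[i][j] holds the minimal cumulative cost of aligning
    # profile_a[:i] with profile_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(profile_a[i - 1] - profile_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in profile_a only
                cost[i][j - 1],      # advance in profile_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


if __name__ == "__main__":
    # Toy profiles (made-up values, not real %MinMax data).
    print(dtw_distance([10.0, 20.0, 30.0], [10.0, 20.0, 20.0, 30.0]))
```

Because DTW may match one position to several, two profiles that differ only by a locally stretched segment can have distance zero, which is why it suits comparing profiles of sequences whose features are shifted relative to each other. Panel d of the caption reports distances normalized by sequence length; that step is not shown in this sketch.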
