From b2fed73f7c35148bbd79bbd4bff2e3a917f86ee5 Mon Sep 17 00:00:00 2001
From: Adibvafa Fallahpour <90617686+Adibvafa@users.noreply.github.com>
Date: Thu, 26 Sep 2024 19:05:01 -0400
Subject: [PATCH] Update README.md
---
README.md | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/README.md b/README.md
index fc8ce0d..ec56f5c 100644
--- a/README.md
+++ b/README.md
@@ -177,6 +177,29 @@ To finetune CodonTransformer on your own data, follow these steps:
For an example of a SLURM job request, see the `slurm` directory in the repository.
+## Handling Ambiguous Amino Acids
+
+CodonTransformer handles ambiguous amino acids through the `ProteinConfig` class. By default, `CodonUtils` includes a [predefined mapping for ambiguous amino acids](https://github.com/Adibvafa/CodonTransformer/blob/main/CodonTransformer/CodonUtils.py#L45), but users can customize this behavior:
+
+```python
+from CodonTransformer.CodonUtils import ProteinConfig
+
+# Configure protein preprocessing
+config = ProteinConfig()
+config.set('ambiguous_aminoacid_behavior', 'standardize_random')
+config.set('ambiguous_aminoacid_map_override', {'X': ['A', 'G', 'S']})
+
+# Run CodonTransformer
+...
+```
+
+Options for `ambiguous_aminoacid_behavior`:
+- `standardize_random` (default): Selects an amino acid at random from the mapping list.
+- `standardize_deterministic`: Selects the first amino acid from the mapping list.
+- `raise_error`: Treats ambiguous amino acids as invalid and raises an error.
+
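+Users can override the default mapping with `ambiguous_aminoacid_map_override`, a dictionary that maps each ambiguous amino acid to a list of candidate replacements (as with `'X'` in the example above).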
+
## Key Features
- **CodonData**