From b2fed73f7c35148bbd79bbd4bff2e3a917f86ee5 Mon Sep 17 00:00:00 2001
From: Adibvafa Fallahpour <90617686+Adibvafa@users.noreply.github.com>
Date: Thu, 26 Sep 2024 19:05:01 -0400
Subject: [PATCH] Update README.md

---
 README.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/README.md b/README.md
index fc8ce0d..ec56f5c 100644
--- a/README.md
+++ b/README.md
@@ -177,6 +177,29 @@ To finetune CodonTransformer on your own data, follow these steps:
 
 For an example of a SLURM job request, see the `slurm` directory in the repository.

+## Handling Ambiguous Amino Acids
+
+CodonTransformer provides a flexible system for handling ambiguous amino acids through the `ProteinConfig` class. By default, CodonUtils includes a [predefined mapping for ambiguous amino acids](https://github.com/Adibvafa/CodonTransformer/blob/main/CodonTransformer/CodonUtils.py#L45), but users can customize this behavior:
+
+```python
+from CodonTransformer.CodonUtils import ProteinConfig
+
+# Configure protein preprocessing
+config = ProteinConfig()
+config.set('ambiguous_aminoacid_behavior', 'standardize_random')
+config.set('ambiguous_aminoacid_map_override', {'X': ['A', 'G', 'S']})
+
+# Run CodonTransformer
+...
+```
+
+Options for `ambiguous_aminoacid_behavior`:
+- `standardize_random` (default): Randomly selects an amino acid from the mapping list.
+- `standardize_deterministic`: Selects the first amino acid from the mapping list.
+- `raise_error`: Treats ambiguous amino acids as invalid.
+
+Users can override the default mapping with `ambiguous_aminoacid_map_override`, which maps each ambiguous symbol to a list of candidate amino acids.
+

 ## Key Features
 
 - **CodonData**