ParallelSubSearch is a Python project designed to illustrate and explore the algorithmic design and parallelization of functions to find the longest contiguous sub-sequence within a sequence. This project includes a performance comparison between serial and parallel execution approaches.
The project consists of two main parts:
- Algorithmic Design: Implementing the functions longest_contiguous and all_longest to find specific types of sub-sequences within the provided data.
- Parallel Execution: Enhancing performance by processing large datasets in parallel with Python's concurrent.futures module.
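As a rough illustration of the algorithmic part, the sketch below computes the length of the longest contiguous run of each letter with itertools.groupby. The function name longest_runs and its letter-to-length dictionary return value are hypothetical. It ignores the stop_codon and gap options of the project's functions and is not the repository's implementation. Its docstring contains a doctest, so pytest's --doctest-modules flag would collect it.

```python
from itertools import groupby
from typing import Dict


def longest_runs(seq: str, case_sensitive: bool = True) -> Dict[str, int]:
    """Return the length of the longest contiguous run of each letter.

    Hypothetical helper for illustration; not part of longest_sub.py.

    >>> longest_runs("AACCCGTTAAAA")
    {'A': 4, 'C': 3, 'G': 1, 'T': 2}
    """
    if not case_sensitive:
        # Fold case so that 'a' and 'A' count as the same letter
        seq = seq.upper()
    best: Dict[str, int] = {}
    # groupby yields one (letter, group) pair per maximal run of equal letters
    for letter, run in groupby(seq):
        length = sum(1 for _ in run)
        if length > best.get(letter, 0):
            best[letter] = length
    return best
```

Because groupby yields one group per maximal run, a single pass over the sequence is enough.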
Requirements:
- Python 3.7 or higher
- NumPy and Biopython
- pytest for running unit tests
- Clone the repository to your local machine:
git clone https://github.com/sapirmardan/ParallelSubSearch.git
cd ParallelSubSearch
- Install the required packages:
pip install numpy biopython pytest
- To run the tests and validate functionality:
python3 -m pytest --doctest-modules -v
- To execute the program and compare the serial vs parallel performance:
python3 longest_sub.py
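A minimal sketch of the kind of serial-versus-parallel comparison longest_sub.py performs, assuming the workload is a list of independent sequences. The names run_serial, run_parallel and longest_run_length are hypothetical and do not correspond to run_large_test_serial or run_test_parallel. ProcessPoolExecutor is the default because CPU-bound pure-Python work does not speed up with threads under the GIL; executor_cls can be swapped for ThreadPoolExecutor for quick functional checks.

```python
import random
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from itertools import groupby
from typing import List, Optional, Type


def longest_run_length(seq: str) -> int:
    """Length of the longest run of identical letters (hypothetical workload)."""
    return max((sum(1 for _ in run) for _, run in groupby(seq)), default=0)


def run_serial(seqs: List[str]) -> List[int]:
    # Process every sequence one after another in the current process
    return [longest_run_length(s) for s in seqs]


def run_parallel(seqs: List[str],
                 executor_cls: Type[Executor] = ProcessPoolExecutor,
                 workers: Optional[int] = None) -> List[int]:
    # max_workers=None lets the executor choose a default from the CPU count;
    # executor.map preserves input order, so results line up with run_serial
    with executor_cls(max_workers=workers) as executor:
        return list(executor.map(longest_run_length, seqs, chunksize=8))


if __name__ == "__main__":
    random.seed(0)
    # 100 random DNA strings of 10,000 bases each as a synthetic workload
    data = ["".join(random.choices("ACGT", k=10_000)) for _ in range(100)]

    t0 = time.perf_counter()
    serial = run_serial(data)
    t1 = time.perf_counter()
    parallel = run_parallel(data)
    t2 = time.perf_counter()

    assert serial == parallel  # both strategies must agree
    print(f"serial: {t1 - t0:.3f}s, parallel: {t2 - t1:.3f}s")
```

For small inputs the parallel run may well be slower, since starting worker processes and pickling arguments has a fixed cost; the benefit only appears once each task carries enough work.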
Main functions:
- longest_contiguous_version1(seq, stop_codon="", gap="", case_sensitive=True): Calculates the longest contiguous sub-sequence for each unique letter in the sequence. It is case-sensitive by default.
- longest_contiguous_version2(seq, stop_codon="", gap="", case_sensitive=True): Similar to version 1, but optimized for performance on large sequences using NumPy.
- all_longest: Finds all sub-sequences that are longest by length and not necessarily contiguous.
- run_large_test_serial: Executes the data processing serially.
- run_test_parallel: Executes the data processing in parallel using concurrent.futures to improve performance.
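To illustrate how NumPy can remove the per-character Python loop, here is a hedged sketch of a vectorized run-length computation. It assumes ASCII input (as in DNA or protein strings) and returns a hypothetical letter-to-length dictionary. It is not the code of longest_contiguous_version2 and ignores the stop_codon, gap and case_sensitive options.

```python
from typing import Dict

import numpy as np


def longest_runs_numpy(seq: str) -> Dict[str, int]:
    """Longest contiguous run of each letter, vectorized with NumPy.

    Hypothetical sketch; the real longest_contiguous_version2 may differ.
    """
    if not seq:
        return {}
    # View the ASCII string as an array of byte codes
    codes = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)
    # A run starts at position 0 and wherever the letter changes
    starts = np.flatnonzero(np.concatenate(([True], codes[1:] != codes[:-1])))
    # Run lengths are the gaps between consecutive run starts
    lengths = np.diff(np.append(starts, codes.size))
    run_letters = codes[starts]
    best: Dict[str, int] = {}
    # Only loop over distinct letters (a small alphabet), not characters
    for code in np.unique(run_letters):
        best[chr(int(code))] = int(lengths[run_letters == code].max())
    return best
```

The only remaining Python loop runs once per distinct letter, so the cost is dominated by a few array passes over the sequence.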
Project files:
- longest_sub.py: Contains the main logic for sub-sequence identification.
- test_longest_sub.py: Contains unit tests for the functions implemented in longest_sub.py.
Here is a quick example of how to use longest_contiguous_version1:
from Bio.Seq import Seq
from longest_sub import longest_contiguous_version1

# Find the longest contiguous run of each letter in a DNA sequence
sequence = Seq("AACCGGTTAACCGGTT")
result = longest_contiguous_version1(sequence)
print(result)