ParallelSubSearch

ParallelSubSearch is a Python project that explores the design and parallelization of functions for finding the longest contiguous sub-sequence within a sequence. It includes a performance comparison between serial and parallel execution.

Description

The project consists of two main parts:

Algorithmic Design: Implementing the longest_contiguous functions (longest_contiguous_version1 and longest_contiguous_version2) and all_longest to find specific types of sub-sequences within the provided data.
Parallel Execution: Processing large datasets in parallel with Python's concurrent.futures module to reduce run time.
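
As a rough illustration of the parallel part, the sketch below maps a worker function over many sequences with concurrent.futures. It is not the code in longest_sub.py: the names process_serial and process_parallel, the executor_cls parameter, and the choice of a process pool as the default are all assumptions made for this example.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, Iterable, List, Optional, Type, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_serial(worker: Callable[[T], R], items: Iterable[T]) -> List[R]:
    """Baseline: apply the worker to every item, one at a time."""
    return [worker(item) for item in items]


def process_parallel(
    worker: Callable[[T], R],
    items: Iterable[T],
    max_workers: Optional[int] = None,
    executor_cls: Type[Executor] = ProcessPoolExecutor,  # assumed default
    chunksize: int = 64,
) -> List[R]:
    """Same results as process_serial, computed by a pool of workers."""
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map keeps results in input order. chunksize batches items
        # per task for process pools; thread pools ignore it.
        return list(pool.map(worker, items, chunksize=chunksize))
```

With a process pool, the worker must be a picklable, module-level function, and the call should sit under an if __name__ == "__main__": guard on platforms that start workers by spawning a new interpreter.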

Getting Started

Dependencies

Python 3.7 or higher
NumPy (used by longest_contiguous_version2)
Biopython (provides the Seq type used in the example)
pytest for running unit tests

Installing

Clone the repository to your local machine:

```
git clone https://github.com/sapirmardan/ParallelSubSearch.git
```

Install the required packages:
```
pip install numpy biopython pytest
```

Executing Program

To run the tests and validate functionality:
```
python3 -m pytest --doctest-modules -v
```
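
The --doctest-modules flag makes pytest also collect the interactive examples embedded in docstrings. The function below is a hypothetical illustration of what such a docstring looks like. It is not taken from longest_sub.py.

```python
def longest_run(seq):
    """Return the length of the longest run of one repeated character.

    >>> longest_run("AACCCGT")
    3
    >>> longest_run("")
    0
    """
    best = current = 0
    previous = None
    for ch in seq:
        # Extend the current run on a repeat, otherwise start a new run.
        current = current + 1 if ch == previous else 1
        best = max(best, current)
        previous = ch
    return best
```

Each line starting with >>> is executed, and the line after it is compared with the actual output.
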
To execute the program and compare the serial vs parallel performance:
```
python3 longest_sub.py
```

Implementation Details

Functions:

longest_contiguous_version1(seq, stop_codon="", gap="", case_sensitive=True): Calculates the longest contiguous sub-sequence for each unique letter in the sequence. It is case-sensitive by default.
longest_contiguous_version2(seq, stop_codon="", gap="", case_sensitive=True): Similar to version 1, but uses NumPy to speed up processing of large sequences.
all_longest: Finds all sub-sequences of maximal length, which are not necessarily contiguous.
run_large_test_serial: Executes the data processing in a serial manner.
run_test_parallel: Executes the data processing in parallel using concurrent.futures to improve performance.
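
To make "longest contiguous sub-sequence for each unique letter" concrete, here is a minimal pure-Python sketch. It is an assumption about the behavior, not the project's implementation: the name longest_run_per_letter and the dictionary return type are hypothetical, and the stop_codon and gap parameters are left out because their semantics are not described here.

```python
from itertools import groupby
from typing import Dict


def longest_run_per_letter(seq, case_sensitive: bool = True) -> Dict[str, int]:
    """Map each distinct letter to the length of its longest contiguous run."""
    text = str(seq)  # str() also accepts sequence objects such as Biopython's Seq
    if not case_sensitive:
        text = text.upper()  # fold case so "a" and "A" count as the same letter
    longest: Dict[str, int] = {}
    # groupby yields one group per maximal run of identical characters.
    for letter, run in groupby(text):
        length = sum(1 for _ in run)
        if length > longest.get(letter, 0):
            longest[letter] = length
    return longest


print(longest_run_per_letter("AACCGGTTAACCGGTT"))
# {'A': 2, 'C': 2, 'G': 2, 'T': 2}
print(longest_run_per_letter("aAAbB", case_sensitive=False))
# {'A': 3, 'B': 2}
```

The scan is a single pass over the sequence, so the cost grows linearly with its length.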

Files

longest_sub.py: Contains the main logic for sub-sequence identification.
test_longest_sub.py: Contains unit tests for the functions implemented in longest_sub.py.

Example

Here is a quick example of how to use longest_contiguous_version1:

```
from Bio.Seq import Seq  # Seq comes from Biopython
from longest_sub import longest_contiguous_version1

sequence = Seq("AACCGGTTAACCGGTT")
result = longest_contiguous_version1(sequence)
print(result)
```
