GitHub - Danigro12/align_validation: Validation of different seeds (or k-mers) for window-alignment algorithms.

Alignment Validation of SORTMERNA

Script developed for testing the SortMeRNA alignment algorithm. It is especially useful for datasets of short sequences, such as miRNA. In the future, it could be adapted to test other alignment algorithms.

What do you need to run it?

SortMeRNA installed (https://github.com/sortmerna)
Python v.3+
Biopython (https://biopython.org)

Run the .sh file in the terminal with the following inputs:

batch_align.sh < path_to_sortme_rna_bin -- min_len_of_read -- max_len_of_read -- cp_database -- cn_database

path_to_sortme_rna_bin = path to sortmerna binary file
min_len_of_read = the minimum length of the sequence to be tested
max_len_of_read = the maximum length of the sequence to be tested
cp_database = database used as positive control
cn_database = database used as negative control

The script does the following steps:

From a database specified by the user, take random sequences and create a smaller reference (this is useful for huge databases; to use the complete database, just enter a number higher than the total number of sequences in it). The script creates 2 databases: one that we know beforehand should align (cp, positive control) and another that should not align (cn, negative control).
Create a .fasta file with the following composition:

100 positive control samples from the positive db created. Sequence length: user-specified minimum length.
100 positive control samples from the positive db created. Sequence length: user-specified maximum length.
100 reverse-complement positive control samples from the positive db created. Sequence length: user-specified minimum length.
100 reverse-complement positive control samples from the positive db created. Sequence length: user-specified maximum length.
100 negative control samples from the negative db created. Sequence length: user-specified minimum length.
100 negative control samples from the negative db created. Sequence length: user-specified maximum length.
100 reverse-complement negative control samples from the negative db created. Sequence length: user-specified minimum length.
100 reverse-complement negative control samples from the negative db created. Sequence length: user-specified maximum length.
100 random samples (negative control). Sequence length: user-specified minimum length.
100 random samples (negative control). Sequence length: user-specified maximum length.
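
The composition above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in build_file.py. The function names, the read ID scheme (`cp_fwd_min_0`, and so on) and the choice to cut fragments at random positions are all hypothetical. Sequences are handled as plain strings here rather than Biopython records, to keep the sketch self-contained.

```python
import random

# Complement table for DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def subsample(records, n, rng):
    """Pick n random records; if n exceeds the database size, keep them all."""
    if n >= len(records):
        return list(records)
    return rng.sample(records, n)


def random_fragment(seq, length, rng):
    """Cut a window of `length` bases from a random position in seq."""
    start = rng.randrange(len(seq) - length + 1)
    return seq[start:start + length]


def random_sequence(length, rng):
    """Generate a uniformly random DNA sequence."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_reads(cp, cn, min_len, max_len, per_group=100, seed=0):
    """Build the 10 read groups: (cp, cn) x (forward, reverse complement)
    x (min, max length), plus random reads at both lengths."""
    rng = random.Random(seed)
    lengths = (("min", min_len), ("max", max_len))
    reads = []
    for label, db in (("cp", cp), ("cn", cn)):
        for strand in ("fwd", "rc"):
            for tag, length in lengths:
                # Only sequences long enough can yield a fragment.
                eligible = [s for s in db if len(s) >= length]
                for i in range(per_group):
                    frag = random_fragment(rng.choice(eligible), length, rng)
                    if strand == "rc":
                        frag = reverse_complement(frag)
                    reads.append((f"{label}_{strand}_{tag}_{i}", frag))
    for tag, length in lengths:
        for i in range(per_group):
            reads.append((f"random_{tag}_{i}", random_sequence(length, rng)))
    return reads
```

Each `(name, sequence)` pair can then be written as a FASTA record (`>name` on one line, the sequence on the next). With the default `per_group=100`, the result has 1000 reads.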

Run the alignment program (SortMeRNA) with seed lengths 8, 10, 12, 14, 16 and 18.
Generate a report with the percentage of reads aligned.
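
The reporting step could look roughly like the sketch below. It is not the code in build_results.py. It assumes a hypothetical convention where each read ID ends in `_<index>` and the rest of the ID names its group, and that the IDs of aligned reads for each seed length have already been extracted from the aligner output into a set.

```python
from collections import Counter


def group_of(read_id):
    """Strip the trailing _<index> from a read ID to get its group name."""
    return read_id.rsplit("_", 1)[0]


def alignment_report(read_ids, aligned_by_seed):
    """Return {seed: {group: percentage of that group's reads that aligned}}.

    read_ids: all read IDs in the test .fasta
    aligned_by_seed: {seed length: set of read IDs that aligned}
    """
    totals = Counter(group_of(r) for r in read_ids)
    report = {}
    for seed, aligned in aligned_by_seed.items():
        hits = Counter(group_of(r) for r in aligned)
        report[seed] = {g: 100.0 * hits[g] / totals[g] for g in totals}
    return report
```

With a report in this shape, a well-chosen seed length would show percentages near 100 for the positive control groups and near 0 for the negative control and random groups.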
