v2.1.9 #92
Refactored pool_post_analisi_snp.py to use subprocess for error handling and improved chromosome extraction. Enhanced post_analisi_snp.sh to clean malformed lines and ensure consistent per-chromosome processing. In crisprme.py, relaxed the output folder non-empty check to allow reuse, commenting out the previous error-raising behavior.
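As an illustration of the subprocess-based error handling described in this commit, here is a minimal sketch. It is not the code in pool_post_analisi_snp.py: the helper name `run_step` and the stand-in command are hypothetical, and only the standard-library `subprocess.run` call is assumed.

```python
import subprocess
import sys


def run_step(cmd):
    """Run one pipeline step; raise with the captured stderr if it fails.

    `run_step` is a hypothetical helper, not a function from CRISPRme.
    """
    try:
        # check=True turns a non-zero exit status into CalledProcessError
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        raise RuntimeError(
            f"step failed (exit code {err.returncode}): {err.stderr.strip()}"
        ) from err
    return result.stdout


# Hypothetical usage: a trivial command stands in for the real shell script.
print(run_step([sys.executable, "-c", "print('chr1 done')"]).strip())
```

Passing the command as a list avoids shell quoting issues, and surfacing stderr in the exception keeps a failing per-chromosome job from being silently ignored.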
Introduces a CI/CD test mode to crisprme.py and submit_job_automated_new_multiple_vcfs.sh, allowing early exit for automated testing. Adds test/benchmark/benchmark.sh to validate CRISPRme's SNP-aware off-target detection using 1000 Genomes data. Minor formatting update in pool_post_analisi_snp.py.
Restores output folder non-empty check in crisprme.py, updates benchmark.sh to use 'python crisprme.py', and adds check_sites.py to compare CRISPRme and brute-force offtarget sites for consistency in CI/CD tests.
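The consistency check between CRISPRme and brute-force sites can be pictured as a set comparison. The sketch below is only an assumption about the general idea, not the logic of check_sites.py: the function name `compare_sites`, the `(chromosome, position, strand)` key, and the sample sites are all made up for illustration.

```python
def compare_sites(crisprme_sites, brute_force_sites):
    """Return (missing, unexpected) sites relative to the brute-force truth.

    Each site is assumed to be a hashable key such as
    (chromosome, position, strand); this layout is hypothetical.
    """
    found = set(crisprme_sites)
    expected = set(brute_force_sites)
    missing = sorted(expected - found)      # in ground truth, not reported
    unexpected = sorted(found - expected)   # reported, not in ground truth
    return missing, unexpected


# Illustrative data only, not real off-target sites.
truth = [("chr1", 100, "+"), ("chr1", 250, "-"), ("chrX", 42, "+")]
reported = [("chr1", 100, "+"), ("chrX", 42, "+"), ("chr2", 7, "-")]
missing, unexpected = compare_sites(reported, truth)
print(missing)     # [('chr1', 250, '-')]
print(unexpected)  # [('chr2', 7, '-')]
```

A test along these lines would pass only when both lists are empty.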
Introduce README for the benchmark pipeline in test/benchmark, detailing the purpose, setup, execution, and interpretation of the CRISPRme CI/CD validation process against a brute-force ground truth. This documentation guides users and developers on running and understanding the automated correctness checks for variant-aware off-target site retrieval.
Improved chromosome-specific target extraction and validation in post_analisi_indel.sh, ensuring malformed lines are removed and headers are handled correctly. Enabled removal of temporary chrom-specific files in post_analisi_snp.sh. Updated submit_job_automated_new_multiple_vcfs.sh to match file patterns more robustly when cleaning up result files after errors.
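The actual scripts are shell, but the idea of chromosome-specific extraction with malformed-line removal can be sketched in Python. This sketch assumes, without confirmation from the source, that a line is "malformed" when its number of tab-separated fields differs from the header's and that the chromosome sits in the first column; `extract_chromosome` and `chrom_col` are hypothetical names.

```python
def extract_chromosome(lines, chrom, chrom_col=0):
    """Keep the header plus well-formed rows belonging to `chrom`.

    Assumption: a row is malformed if its tab-separated field count
    differs from the header's field count.
    """
    if not lines:
        return []
    header, *rows = lines
    n_fields = len(header.split("\t"))
    kept = [header]  # the header is always written first
    for row in rows:
        fields = row.split("\t")
        if len(fields) != n_fields:
            continue  # drop malformed (e.g. truncated) lines
        if fields[chrom_col] == chrom:
            kept.append(row)
    return kept


# Illustrative input only; the real target files have a different layout.
table = [
    "chrom\tpos\tseq",
    "chr1\t10\tACGT",
    "chr1\t20",          # truncated line, dropped
    "chr2\t30\tTTTT",
    "chr1\t40\tGGGG",
]
print(extract_chromosome(table, "chr1"))
```

Comparing the chromosome field exactly, rather than by substring, keeps `chr1` from also matching `chr10` through `chr19`.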
Improved comments and English translations in analisi_indels_NNN.sh. Refactored post_analisi_indel.sh and post_analisi_snp.sh to streamline chromosome-specific target extraction and header handling, replacing awk with grep for efficiency. Removed the unused convert_gnomAD.py script. Updated submit_job_automated_new_multiple_vcfs.sh to adjust CI/CD test handling and conditional directory cleanup.
Refactored PostProcess/analisi_indels_NNN.sh for clarity and variable naming consistency. Improved parallelization and error handling in pool_post_analisi_indel.py. Added BFTARGETSMD5 to utils.py for validation. Moved and updated check_sites.py to PostProcess/validate.py, adjusting paths and imports. In crisprme.py, removed debug restriction for CI/CD test, added validate-test functionality with help and execution logic, and updated help messages. Minor update to complete_test.py to pass --ci-cd-test flag.
Introduces a check to prevent rerunning the complete-test if output already exists in complete_test.py, and adds a check for required CRISPRitz target files in validate.py. Fixes a conditional in post_analisi_snp.sh, improves error messaging, and adds the 'validate-test' command to crisprme.py with debug output for validation script execution.
Enhanced PostProcess/validate.py by adding dataset consistency checks, support for chrX, and refactoring validation logic into a dedicated function. Updated PostProcess/complete_test.py to fix output directory handling and exit code. Improved error message clarity in crisprme.py for complete test failures.
Refactored PostProcess/validate.py to modularize and clarify the validation workflow, including improved error handling, explicit file checks, and detailed diagnostics for mismatches. The brute-force targets are now downloaded and verified with an MD5 checksum. Updated crisprme.py to actually run the validation script and raise an error if validation fails, instead of just printing the command.
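Verifying a downloaded file against a known MD5 digest can be done with the standard-library `hashlib` module. The following is a minimal sketch under that assumption, not the code in validate.py; `md5sum` and `verify_md5` are hypothetical names, and the expected digest would play the role of a constant such as BFTARGETSMD5.

```python
import hashlib
import os
import tempfile


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Raise ValueError if the file's digest differs from `expected`."""
    observed = md5sum(path)
    if observed != expected.lower():
        raise ValueError(
            f"MD5 mismatch for {path}: expected {expected}, got {observed}"
        )


# Hypothetical usage with a temporary file standing in for the download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"chr1\t100\t+\n")
verify_md5(tmp.name, hashlib.md5(b"chr1\t100\t+\n").hexdigest())
os.remove(tmp.name)
```

Reading in chunks keeps memory use flat for large target files, and failing loudly on a mismatch prevents validation against a corrupted or partial download.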
Added stderr progress messages to PostProcess/validate.py for better visibility during brute-force target download and validation. Removed the test/benchmark/README.md and test/benchmark/benchmark.sh files, effectively deleting the CI/CD benchmark pipeline and its documentation.
Expanded and clarified docstrings throughout validate.py for all functions, improving maintainability and user understanding. Updated README to document the new Off-target Sites Validation Test functionality, including usage, requirements, and output details, and reorganized the Table of Contents and section numbering for clarity.
Summary
This PR introduces changes that will be part of CRISPRme v2.1.9. The changes cover enhancements to testing workflows, post-analysis pipelines, validation checks, and infrastructure support, aimed at improving developer experience and analysis correctness.
Key Improvements
Post-Analysis Refactors & Robustness
Refactored SNP post-analysis (pool_post_analisi_snp.py) to use safer subprocess error handling and ensure consistent chromosome extraction.
Improved post-analysis scripts for INDEL handling and temporary file cleanup to avoid malformed entries and redundant artifacts.
Removed unused Python scripts (e.g., convert_gnomAD.py) to streamline the codebase and reduce maintenance overhead.
Validation & Workflow Enhancements
Added validation logic for the complete-test workflow to check dataset consistency and verify required files prior to execution.
Integrated an off-target sites check by adding the validate-test functionality, which validates CRISPRme results against brute-force ground-truth off-target sites.
Improved error messaging, progress reporting during validation, and integration of brute-force target MD5 checks for deterministic comparisons.
Workflow Behavior Adjustments
Tweaked output directory checks to conditionally allow reuse where appropriate during automated tests.
Minor commits to add or update helper files (e.g., brute_force_1000G.tsv) supporting validation workflows.
Post-Merge / Follow-Up
Update version references in packaging workflows (e.g., Conda/bioconda, Docker) to v2.1.9.
Ensure documentation reflects new test and validation features for contributors.