ChromOptimise is a pipeline that identifies the optimal number of states that should be used with ChromHMM's LearnModel command for a particular genomic dataset.
For more specific information, please head over to the wiki.
When using ChromHMM to learn hidden Markov models for genomic data, it is often difficult to determine how many states to include:
- Including too many states overfits your data and introduces redundant states
- Including too few states underfits your data, lowering model accuracy
This pipeline identifies the optimal number of states to use by finding a model that avoids both of these problems.
After using this pipeline, the user will have a better understanding of their dataset in the context of ChromHMM, allowing them to make more informed decisions in further downstream analysis.
- Clone this repository
- Ensure all required software is installed
- If using LDSC, download 1000 genomes files (or similar) from this repository
- Copy the configuration files to a memorable location (recommended: next to
your data) and then fill them in using the templates provided. DO NOT CHANGE
THE NAMES OF THESE FILES.
- If you are feeling lazy, you can just edit the files where they already are. The suggestion to move them is to accommodate having multiple configs for different projects.
- Run the setup executable, providing the path to the directory containing the config files as the first argument:
./setup path/to/configuration/directory
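In practice, the steps above might look like the following sketch. All paths here are illustrative, and the location of the template files within the cloned repository is an assumption; the key point is that the copied files keep their original names:

```shell
# Illustrative only -- substitute your own paths. The template location
# ("configuration/") is an assumption; check the repository/wiki for the
# actual location of the config templates.
mkdir -p ~/my_project/configuration
cp configuration/* ~/my_project/configuration/   # keep the file names unchanged
# ... fill in the copied configuration files for your data ...
./setup ~/my_project/configuration
```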
After completing 'getting started', run the master script (ChromOptimise.sh) in the command line with:
bash ChromOptimise.sh path/to/your/configuration/directory
Alternatively, you can run each of the shell scripts in JobSubmission sequentially.
sbatch 1_BinarizeFiles.sh path/to/your/configuration/directory
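If you do run the JobSubmission scripts individually, SLURM job dependencies can chain them so each script waits for the previous one to succeed. This is a sketch: only `1_BinarizeFiles.sh` is named in this README, so the glob assumes the remaining scripts follow the same numbered naming convention:

```shell
# Submit each numbered JobSubmission script in order, each one waiting for
# the previous job to finish successfully. The glob pattern is an assumption
# based on the naming of 1_BinarizeFiles.sh -- list JobSubmission/ to confirm.
config=path/to/your/configuration/directory
previous=""
for script in JobSubmission/[0-9]*_*.sh; do
    if [[ -z "$previous" ]]; then
        jobid=$(sbatch --parsable "$script" "$config")
    else
        jobid=$(sbatch --parsable --dependency=afterok:"$previous" "$script" "$config")
    fi
    previous="$jobid"
done
```

`sbatch --parsable` prints just the job ID, which makes it easy to feed into the next job's `--dependency=afterok:` flag.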
For further information please see the pipeline explanation.
Supplementary scripts also exist that provide further information on your chosen dataset. Most importantly, the thresholds used in redundancy analysis can be inferred from the results of Redundancy_Threshold_Optimisation. Further details on these scripts can be found in the wiki.
This pipeline requires a unix-flavoured OS with the following software installed:
- Bash (>=4.2.46(2))
- SLURM Workload Manager (>=20.02.3)
- conda (>=23.10.0)
- ChromHMM (>=1.23)
- sed (>=4.2.2)
- LDSC (>=aa33296)
- gzip (>=1.5)
- awk (>=4.0.2)
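A quick way to confirm the versions available on your system (a sketch; ChromHMM, SLURM and LDSC are omitted here because how they are invoked depends on your cluster setup):

```shell
# Print the version of each core command-line requirement.
bash --version | head -n 1   # expect >= 4.2
sed --version  | head -n 1   # expect >= 4.2.2
gzip --version | head -n 1   # expect >= 1.5
awk --version 2>/dev/null | head -n 1   # expect >= 4.0.2 (GNU awk)
```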
Additionally, conda environments are created for you, providing:
- R v4.4.1
- java-jdk v8.0.112
- bedtools v2.27.1
This study makes use of data generated by the Blueprint Consortium. A full list of the investigators who contributed to the generation of the data is available from www.blueprint-epigenome.eu. Funding for the project was provided by the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no 282510 – BLUEPRINT.
For any further enquiries, please open an issue or contact Sam Fletcher:
[email protected]