Welcome to the Galaxy HiCExplorer -- a webserver to process, analyse and visualize Hi-C data.
Are you new to Galaxy, or returning after a long time, and looking for help to get started? Take a guided tour through Galaxy's user interface.
Take a guided tour for an introduction to Galaxy HiCExplorer and Hi-C data analysis. This tour guides you through the Hi-C tutorial on the Galaxy Training Network, in which you analyse Hi-C data of Drosophila melanogaster. Follow the tutorial to understand the analysis steps better or as a guide to which parameters are useful.
A precomputed history of the tutorial can be viewed here.
A more advanced tutorial is hosted on readthedocs.io. It is designed for the shell-based version of HiCExplorer but can easily be adapted to Galaxy HiCExplorer. In this tutorial, mouse stem cells from Marks et al. (2015) are analysed. We provide the input FASTQ files in our data library.
We recommend following the tutorial on FastQC for quality checks.
The Galaxy Training Network tutorial uses Hi-C data from Drosophila melanogaster and is hosted on Zenodo:
Additionally, we provide the data in the shared data library of Galaxy HiCExplorer. In contrast to the data hosted on Zenodo, it contains preprocessed intermediate files.
Galaxy HiCExplorer can process large Hi-C data sets. We processed Hi-C data with around 750 million reads from Rosa-Garrido et al. Have a look at the preprocessed files.
(A) Galaxy HiCExplorer workflows and tools. Quality control tools: (B) Output of hicCorrelate comparing two wild-type samples and one knockdown sample. (C) Output of hicPlotDistVsCounts showing changes in the number of contacts for different conditions. Analysis tools: (D) hicPlotMatrix of the Pearson correlation matrix derived from a contact matrix for chromosome 6 in mouse, computed with hicTransform. The optional data track at the bottom shows the first eigenvector for A/B compartments obtained using hicPCA. (E) The pixel difference between a corrected Hi-C matrix for the wild-type condition and a knockdown was computed using hicCompareMatrices, and a 7 Mb region is visualized using hicPlotMatrix. Visualization tools: (F) Contact matrix plot of an 80 to 105 Mb region of chromosome 2 in log scale. (G) Example output of hicPlotViewpoint showing the corrected number of Hi-C contacts for a single bin in chromosome 5 (output similar to 4C-seq) (Andrey 2017). (H) A Hi-C matrix was converted into an observed vs. expected matrix using hicTransform, and this matrix, together with the locations of high-affinity sites from (Ramirez 2015), was used to run hicAggregateContacts. (I) 85 Mb to 110 Mb region from human chromosome 2 visualized using hicPlotTADs. TADs were computed by hicFindTADs. The additional tracks correspond to: TAD-separation score (as reported by hicFindTADs), chromatin state, principal component 1 (A/B compartment) computed using hicPCA, ChIP-seq coverage for the H3K27ac mark, DNA methylation, and a gene track. Hi-C data for B, C, E and H from Drosophila melanogaster S2 cells (Ramirez 2018). Hi-C data for D, F and I from mouse cardiac myocytes (Nothjunge 2017). Additional tracks in I from (Nothjunge 2017).
To automate consecutive steps, we provide workflows in three categories: from scratch (FASTQ files); from scratch (FASTQ files) with summing up of replicates; and starting from an existing contact matrix. Many workflows require collections of FASTQ files as input; it is shown here how to create a collection. Please do not forget to check the quality of the FASTQ files with FastQC.
Please keep in mind that all workflows need additional input from the user. All mapping steps are done with BWA-MEM, and the correct reference genome needs to be defined by the user. The correct restriction site and the bin size for hicBuildMatrix need to be defined too. The correction of the matrix is done with the default parameters of -1.5 and 5; change these if necessary. Furthermore, the correct region and/or chromosome needs to be defined for plotting the matrix, TADs or PCA.
These workflows expect collections of FASTQ files as input. The first collection needs to contain all forward-strand FASTQ files and the second one all reverse-strand FASTQ files. Please make sure that the order of the FASTQ files is identical in both collections. The order is important because it associates each forward read file with its reverse read file.
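As a rough checklist, the user-supplied settings mentioned above can be written down and sanity-checked before a workflow is started. The sketch below is only an illustration: the class `WorkflowInputs`, its field names and the example values are hypothetical and are not part of Galaxy or HiCExplorer.

```python
from dataclasses import dataclass


@dataclass
class WorkflowInputs:
    """Hypothetical container for the settings a user has to supply."""
    reference_genome: str                        # chosen in the BWA-MEM steps, e.g. "dm6"
    restriction_site: str                        # e.g. "GATC", the DpnII recognition site
    bin_size: int                                # bin size for hicBuildMatrix, in base pairs
    correction_thresholds: tuple = (-1.5, 5.0)   # workflow defaults for the matrix correction
    plot_region: str = ""                        # region and/or chromosome to plot

    def problems(self):
        """Return a list of human-readable issues; an empty list means all set."""
        issues = []
        if not self.reference_genome:
            issues.append("reference genome not set")
        if not self.restriction_site:
            issues.append("restriction site not set")
        if self.bin_size <= 0:
            issues.append("bin size must be a positive number of base pairs")
        low, high = self.correction_thresholds
        if low >= high:
            issues.append("lower correction threshold must be below the upper one")
        if not self.plot_region:
            issues.append("plot region or chromosome not set")
        return issues
```

Calling `problems()` on a filled-in instance returns an empty list when every setting has been provided.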
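The ordering requirement can be checked before uploading. The following minimal sketch assumes that forward and reverse files differ only in a `_R1`/`_R2` (or `_1`/`_2`) tag directly before the `.fastq` or `.fq` extension; this naming convention and the function names are assumptions for illustration, so adapt the pattern to your own file names.

```python
import re


def pair_key(filename):
    """Remove the read-direction tag so both mates map to the same key.

    Assumes names such as sample_R1.fastq.gz / sample_R2.fastq.gz
    or sample_1.fq / sample_2.fq (an illustrative convention only).
    """
    return re.sub(r"_R?[12](?=\.f(ast)?q)", "", filename)


def mismatched_pairs(forward, reverse):
    """Return the list positions at which forward and reverse files do not match."""
    if len(forward) != len(reverse):
        raise ValueError("the two collections differ in length")
    return [
        index
        for index, (fwd, rev) in enumerate(zip(forward, reverse))
        if pair_key(fwd) != pair_key(rev)
    ]
```

An empty result means that every position in the forward list lines up with its mate in the reverse list.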
The following workflows are provided:
- From scratch to a contact matrix
- From scratch to PCA
- From scratch to TAD
- From scratch to PCA and plotting
- From scratch to TAD and plotting
These workflows take collections of FASTQ files for the forward and reverse strand as input. For each pair a contact matrix is built, and all created contact matrices are summed up into one contact matrix. Use these workflows if you want to use replicates to increase the statistical power of your contact matrix and the replicates have been checked to be correct.
- From scratch to a contact matrix (summing up replicates)
- From scratch to PCA (summing up replicates)
- From scratch to TAD (summing up replicates)
- From scratch to PCA and plot (summing up replicates)
- From scratch to TAD and plot (summing up replicates)
- From scratch to TADs, PCA and plot (summing up replicates)
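Conceptually, summing up replicates is an element-wise addition of contact matrices. The sketch below illustrates the idea on a toy sparse representation, a dictionary mapping a pair of bin indices to a contact count. This representation and the function name are assumptions made for illustration, not the file format or code used by the workflows, and it presumes that all replicates share the same genome and bin size.

```python
from collections import Counter


def sum_contact_matrices(matrices):
    """Add up sparse contact matrices given as {(bin_i, bin_j): count} dicts."""
    total = Counter()
    for matrix in matrices:
        # Counter.update adds the counts of matching keys instead of replacing them.
        total.update(matrix)
    return dict(total)


replicate_1 = {(0, 0): 5, (0, 1): 2}
replicate_2 = {(0, 1): 3, (1, 1): 4}
summed = sum_contact_matrices([replicate_1, replicate_2])
```

Bin pairs that are present in only one replicate keep their count, while shared bin pairs receive the sum of the counts.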
Use the following workflows if you have already created a contact matrix.
Preprocessed SAM/BAM files: To build the contact matrix, the SAM/BAM files need to be generated using the --reorder option of bowtie2 / hisat2, so that the SAM/BAM files are written in exactly the same order as the reads in the FASTQ files. For the same reason, the SAM/BAM files should not be sorted. Please make sure your preprocessed SAM/BAM files fulfil these requirements; otherwise the creation of a contact matrix with hicBuildMatrix will fail.
We recommend using BWA-MEM with the Hi-C specific parameters, as shown in our tutorials.
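A quick plausibility check of these requirements can be sketched on plain-text SAM lines. The helper names below are hypothetical, and the sketch is simplified: it assumes exactly one record per read in each file (no secondary or supplementary alignments) and identical read names for both mates. BAM files would first have to be converted to SAM text.

```python
def sort_order(sam_lines):
    """Return the SO value of the @HD header line, or None if it is absent."""
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header section is over
        if line.startswith("@HD"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def read_names(sam_lines):
    """Collect the read name (first column) of every alignment record."""
    return [
        line.split("\t", 1)[0]
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    ]


def mates_in_same_order(forward_sam, reverse_sam):
    """True if both files list the same read names in the same order."""
    return read_names(forward_sam) == read_names(reverse_sam)
```

A `sort_order` result of `coordinate` or `queryname` indicates a sorted file, which does not meet the requirement above, and a `False` result from `mates_in_same_order` indicates that the two mate files are out of step.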
Joachim Wolff, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach, Thomas Manke, Rolf Backofen, Fidel Ramírez, Björn A Grüning. "Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization", Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W11–W16, doi: 10.1093/nar/gky504