meeting_notes
Erica edited this page Oct 22, 2017 · 6 revisions
- We had an introductory meeting to discuss the scope of the project and our team workflow, and to generate some initial issues.
- We agreed on the general workflow as described in workflow notes
- The HackSeq organizers have asked us to try to write up a small manuscript de
- We reviewed some of the initial data sets, and some of the possible stories to start working on. These were broken down like so:
- We agreed that converting output from disparate tools to a common format would be useful, and would make for easily separable issues
- For each input tool, we want to write a simple parser that converts the output to BEDPE format
- I have created issues for each of the initial input tools
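A minimal sketch of what one of these per-tool parsers could look like, assuming a hypothetical caller that writes a tab-separated file with a header row. The input column names (`gene1`, `pos1`, `support`, and so on) and the `caller_row_to_bedpe` / `convert` helpers are invented for illustration and do not correspond to any particular tool. The output uses the ten core BEDPE columns from the bedtools format description, with 1-based breakpoint positions converted to 0-based, half-open intervals.

```python
import csv
import io

# The ten core BEDPE columns, in order.
BEDPE_COLUMNS = ["chrom1", "start1", "end1", "chrom2", "start2", "end2",
                 "name", "score", "strand1", "strand2"]


def caller_row_to_bedpe(row):
    """Convert one record from the hypothetical caller into core BEDPE fields."""
    pos1 = int(row["pos1"])  # assumed to be a 1-based breakpoint position
    pos2 = int(row["pos2"])
    return [
        row["chrom1"], pos1 - 1, pos1,      # 0-based start, exclusive end
        row["chrom2"], pos2 - 1, pos2,
        f'{row["gene1"]}--{row["gene2"]}',  # name: gene pair
        row.get("support", "."),            # score: supporting reads, if reported
        row.get("strand1", "."),
        row.get("strand2", "."),
    ]


def convert(in_handle, out_handle):
    """Read the caller's TSV from in_handle and write BEDPE lines to out_handle."""
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(caller_row_to_bedpe(row))


# Illustrative input only; these are not real fusion calls.
example = (
    "gene1\tgene2\tchrom1\tpos1\tstrand1\tchrom2\tpos2\tstrand2\tsupport\n"
    "GENEA\tGENEB\tchr1\t100\t+\tchr2\t200\t-\t12\n"
)
out = io.StringIO()
convert(io.StringIO(example), out)
print(out.getvalue(), end="")
```

Keeping the row conversion separate from the file handling means each tool's parser only needs its own version of `caller_row_to_bedpe`.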
- We agreed that it would be useful to annotate predicted fusion events (from the BEDPE files generated above) against existing fusion databases
- To that end, I have created a set of issues along the lines of 'Document Data Source', to check whether these data sources have available APIs
- I have also stubbed out an issue to research additional data sources, as necessary
- As a first pass at this, we thought that having simple scripts (either in something like Jupyter notebooks or RMarkdown documents) to provide general overview summaries of reported events would be useful.
- As a potential stretch goal, these could be implemented in the MultiQC framework
- Note that since the BEDPE fields will differ between different tools, there may need to be a 'common core', with extensions for different tools
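One way to picture the 'common core with extensions' idea is a record type that always carries the ten core BEDPE fields and keeps anything tool-specific in a separate mapping. This is only a sketch: the `FusionRecord` class, its `extras` field, and the example column names `spanning_reads` and `junction_reads` are hypothetical and were not agreed on in the meeting.

```python
from dataclasses import dataclass, field


@dataclass
class FusionRecord:
    # Common core: the ten standard BEDPE columns.
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    name: str = "."
    score: str = "."
    strand1: str = "."
    strand2: str = "."
    # Extensions: tool-specific fields, keyed by column name.
    extras: dict = field(default_factory=dict)

    def to_line(self, extra_columns=()):
        """Render the core columns, then the requested extension columns.

        Extension values missing from this record are written as '.'.
        """
        core = [self.chrom1, self.start1, self.end1,
                self.chrom2, self.start2, self.end2,
                self.name, self.score, self.strand1, self.strand2]
        ext = [self.extras.get(col, ".") for col in extra_columns]
        return "\t".join(str(value) for value in core + ext)


# Illustrative record; the values are made up.
rec = FusionRecord("chr1", 99, 100, "chr2", 199, 200,
                   name="GENEA--GENEB", extras={"spanning_reads": 7})
print(rec.to_line(["spanning_reads", "junction_reads"]))
```

With this layout, downstream code that only needs the core columns can ignore `extras` entirely, while tool-aware summaries can still reach the additional fields.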
- For aggregation of fusions between tools, we thought we'd start by looking at tools like bedtools intersect.
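To make the aggregation question concrete, here is a rough pure-Python sketch of the kind of comparison involved: two calls are treated as the same fusion when both breakpoints overlap, in either order, within some tolerance. This is not how bedtools works internally, and the `same_fusion` helper and its `slop` parameter are assumptions for illustration. bedtools also provides a `pairtopair` subcommand for comparing BEDPE files, which may be worth checking alongside `intersect`.

```python
def intervals_overlap(chrom_a, start_a, end_a, chrom_b, start_b, end_b, slop=0):
    """True if two 0-based, half-open intervals overlap once padded by `slop` bases."""
    return (chrom_a == chrom_b
            and start_a - slop < end_b
            and start_b - slop < end_a)


def same_fusion(a, b, slop=0):
    """Compare two calls given as (chrom1, start1, end1, chrom2, start2, end2).

    The calls match if both ends overlap, allowing for the two breakpoints
    to be reported in the opposite order by different tools.
    """
    direct = (intervals_overlap(*a[0:3], *b[0:3], slop=slop)
              and intervals_overlap(*a[3:6], *b[3:6], slop=slop))
    swapped = (intervals_overlap(*a[0:3], *b[3:6], slop=slop)
               and intervals_overlap(*a[3:6], *b[0:3], slop=slop))
    return direct or swapped


# Illustrative coordinates only.
call_a = ("chr1", 99, 100, "chr2", 199, 200)
call_b = ("chr1", 104, 105, "chr2", 199, 200)   # first breakpoint 5 bp away
call_c = ("chr2", 199, 200, "chr1", 99, 100)    # same as call_a, ends swapped

print(same_fusion(call_a, call_b))            # exact overlap required
print(same_fusion(call_a, call_b, slop=10))   # 10 bp tolerance
print(same_fusion(call_a, call_c))            # order-insensitive match
```

How much tolerance to allow is a judgment call that depends on how precisely each caller reports breakpoints.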
- Deferred at present, depending on how things go elsewhere
- Deferred at present, depending on how things go elsewhere
- We met to discuss progress on Day 1, and potential actions to kick off Day 2
- The main things we accomplished on Day 1 were:
- Writing parsers to convert the input from a whole bunch of fusion callers to a common BEDPE file format.
- We've got parsers now for seven different tools already merged, with a few more incoming
- Reviewing available online fusion databases, and determining which ones are suitable for downstream use
- We also discussed the next steps of the project. Things that we're confident about at this point:
- We should focus on a few of the annotation sources that have high-quality, comparable data, and convert them to BEDPE format as well, so we can more easily intersect them with the fusion calls
- We should start writing some scripts to generate simple summary statistics for the existing fusion calls
- We should generate a revised overview diagram of what the overall workflow looks like, so we have a better idea of what 'done' looks like
- Things we (or at least I, @rdocking) need to think about a bit more are:
- What does the merge/intersect tool look like? Should this be in Python, in R, or both?
- Are we going to worry about comparing both calls from different callers and calls from different replicates? (Most likely just calls from different callers)
- I will submit some small issues for the obvious actionable things, and we'll reconvene tomorrow to flesh out the rest
- We met briefly to start Day 2, and discussed what next steps we'd like to take
- My brief notes: we want to wrap up the process of writing importers/converters fairly soon, so we can move on to the next aspects of the project. Some of that will be QC (like the validator @hirak is working on) and exploratory analysis (like what @stef is working on). The next main tasks will be merging and annotating results from different data sources. We thought we’d begin by using bedtools commands to intersect the various annotation data sources.
- After lunch, we should start closing down the remaining 'convert formats' issues and move on to the next stage of the pipeline
- We met briefly to start Day 3, and discussed merging strategies for the BEDPE files. We also discussed generating a matrix of 0s and 1s recording whether a gene fusion was detected by each tool and whether it was annotated in each of the databases. These matrices will be used to generate UpSet plots for data visualization
- Later in the afternoon, @rdocking created slides for the presentation at the end of the day, and there was some focus on wrapping up documentation
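The 0/1 matrix discussed at the start of Day 3 could be built along these lines. This is a sketch under assumptions: fusions are identified by a gene-pair name, each tool or database is represented as a set of those names, and the `presence_matrix` helper and the source names below are hypothetical.

```python
def presence_matrix(calls_by_source):
    """Build a fusion-by-source matrix of 0s and 1s.

    calls_by_source maps a source name (a caller or a database) to the set
    of fusion names it reports. Returns the sorted source names and a dict
    mapping each fusion to its row of 0/1 flags, in the same source order.
    """
    sources = sorted(calls_by_source)
    fusions = sorted(set().union(*calls_by_source.values()))
    rows = {
        fusion: [int(fusion in calls_by_source[src]) for src in sources]
        for fusion in fusions
    }
    return sources, rows


# Illustrative input only; these are not real calls or annotations.
calls = {
    "caller_a": {"GENEA--GENEB", "GENEC--GENED"},
    "caller_b": {"GENEA--GENEB"},
    "database_x": {"GENEC--GENED"},
}
sources, rows = presence_matrix(calls)
print("fusion", *sources, sep="\t")
for fusion, flags in rows.items():
    print(fusion, *flags, sep="\t")
```

A table in this shape, with one row per fusion and one 0/1 column per source, is the usual starting point for UpSet plotting tools.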