Meeting Notes

We had an introductory meeting to discuss the scope of the project, our team workflow, and to generate some initial issues.
We agreed on the general workflow, as described in the workflow notes.
The HackSeq organizers have asked us to try to write up a small manuscript de
We reviewed some of the initial data sets, and some of the possible stories to start working on. These were broken down like so:

We agreed that converting output from disparate tools to a common format would be useful, and would make for easily separable issues
For each input tool, we want to write a simple parser that converts the output to BEDPE format
I have created issues for each of the initial input tools
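As a rough illustration of what one of these parsers might look like, here is a minimal sketch. The input column names (`chrom1`, `pos1`, `gene1`, and so on) and the 1-based breakpoint convention are assumptions for an imaginary caller, not the layout of any real tool; the output uses the ten standard BEDPE columns, with 0-based starts.

```python
import csv
import io


def caller_row_to_bedpe(row):
    """Convert one parsed row (a dict) into the 10 standard BEDPE fields.

    The input keys are hypothetical; each real caller needs its own mapping.
    """
    pos1 = int(row["pos1"])  # assumed to be a 1-based breakpoint position
    pos2 = int(row["pos2"])
    return [
        row["chrom1"], str(pos1 - 1), str(pos1),  # BEDPE starts are 0-based
        row["chrom2"], str(pos2 - 1), str(pos2),
        f'{row["gene1"]}--{row["gene2"]}',        # name column
        row.get("score") or ".",                  # "." when no score is given
        row.get("strand1") or ".",
        row.get("strand2") or ".",
    ]


def convert(tsv_text):
    """Parse tab-delimited caller output (with a header) into BEDPE lines."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return ["\t".join(caller_row_to_bedpe(row)) for row in reader]


# Made-up input, for illustration only.
example = (
    "chrom1\tpos1\tstrand1\tchrom2\tpos2\tstrand2\tgene1\tgene2\tscore\n"
    "chr1\t1000\t+\tchr2\t5000\t-\tGENE_A\tGENE_B\t57\n"
)

for line in convert(example):
    print(line)
```

Keeping the per-tool logic in one small mapping function should make it easy to split the work into one issue per tool.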

We agreed that it would be useful to be able to annotate predicted fusion events (from the BEDPE files generated above) against existing fusion databases
To that end, I have created a set of issues along the lines of 'Document Data Source', to check whether these data sources have available APIs
I have also stubbed out an issue to research additional data sources, as necessary
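One simple way the annotation step could work is a lookup by gene pair. The sketch below assumes each database can be reduced to a list of gene pairs and that partner order does not matter; both are simplifying assumptions, and the database names are placeholders rather than real sources.

```python
def pair_key(gene_a, gene_b):
    """Order-independent, case-insensitive key so A--B and B--A match."""
    return frozenset((gene_a.upper(), gene_b.upper()))


def annotate(calls, databases):
    """Annotate predicted fusions with the databases that list them.

    calls: list of (gene1, gene2) tuples from the BEDPE name column.
    databases: dict mapping a database name to an iterable of (gene1, gene2).
    Returns a list of (gene1, gene2, [matching database names]).
    """
    index = {
        name: {pair_key(a, b) for a, b in pairs}
        for name, pairs in databases.items()
    }
    annotated = []
    for gene1, gene2 in calls:
        key = pair_key(gene1, gene2)
        hits = sorted(name for name, keys in index.items() if key in keys)
        annotated.append((gene1, gene2, hits))
    return annotated


# Placeholder data, for illustration only.
known = {
    "db_one": [("GENE_B", "GENE_A")],
    "db_two": [("gene_a", "gene_b"), ("GENE_X", "GENE_Y")],
}
print(annotate([("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")], known))
```

If the 5'/3' orientation of the partners turns out to matter, the key would need to be an ordered tuple instead of a `frozenset`.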

As a first pass at this, we thought that having simple scripts (either in something like Jupyter notebooks or RMarkdown documents) to provide general overview summaries of reported events would be useful.
As a potential stretch goal, these could be implemented in the MultiQC framework
Note that since the BEDPE fields will differ between different tools, there may need to be a 'common core', with extensions for different tools
For aggregation of fusions between tools, we thought we'd start by looking at tools like bedtools intersect.
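A summary script of the kind discussed above might start out like the sketch below. It reads only the standard BEDPE columns (the 'common core'), so any tool-specific extension columns are ignored; the function name and the choice of statistics are assumptions for illustration.

```python
from collections import Counter


def summarize(bedpe_lines):
    """Count events overall, per chromosome pair, and intra- vs inter-chromosomal."""
    pair_counts = Counter()
    kind_counts = Counter()
    for line in bedpe_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom1, chrom2 = fields[0], fields[3]  # core BEDPE columns 1 and 4
        pair_counts[(chrom1, chrom2)] += 1
        kind = "intrachromosomal" if chrom1 == chrom2 else "interchromosomal"
        kind_counts[kind] += 1
    return {
        "total": sum(pair_counts.values()),
        "by_chrom_pair": dict(pair_counts),
        "by_kind": dict(kind_counts),
    }


# Made-up records; the second carries an extra tool-specific column.
records = [
    "chr1\t0\t1\tchr1\t10\t11\tA--B\t.\t+\t+",
    "chr1\t0\t1\tchr2\t10\t11\tA--C\t.\t+\t-\textra_field",
]
print(summarize(records))
```

The same function could be called from a Jupyter notebook cell once per tool to get a quick side-by-side overview.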

We met to discuss progress on Day 1, and potential actions to kick off Day 2
The main things we accomplished on Day 1 were:
- Writing parsers to convert the output of a range of fusion callers to a common BEDPE file format.
- We've got parsers now for seven different tools already merged, with a few more incoming
- Reviewing the available online fusion databases, and determining which ones are suitable for downstream use

We also discussed the next steps of the project. Things that we're confident about at this point:
- We should focus on a few of the annotation sources that have high-quality, comparable data, and convert them to BEDPE format as well, so we can more easily intersect them with the fusion calls
- We should start writing some scripts to generate simple summary statistics for the existing fusion calls
- We should generate a revised overview diagram of what the overall workflow looks like, so we have a better idea of what 'done' looks like
Things we (or at least I, @rdocking) need to think about a bit more are:
- What does the merge/intersect tool look like? Should this be in Python, in R, both?
- Are we going to worry about comparing both calls from different callers and calls from different replicates? (Most likely just calls from different callers)
I will submit some small issues for the obvious actionable things, and we'll reconvene tomorrow to flesh out the rest

We met briefly to start Day 2, and discussed what next steps we'd like to take
My brief notes: we want to wrap up the process of writing importers/converters fairly soon, so we can move on to the next aspects of the project. Some of that will be QC (like the validator @hirak is working on) and some will be exploratory analysis (like what @stef is working on). The next main task will be merging and annotating results from different data sources. We thought we'd start by exploring bedtools commands for intersecting the various annotation data sources.
After lunch, we should start closing down the remaining 'convert formats' issues and move on to the next stage of the pipeline
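To give a sense of what QC on the converted files could check, here is a minimal sketch of a line-level BEDPE check. It is not the validator @hirak is working on; the rules (all ten standard columns present, integer coordinates with start not after end, strands of `+`, `-`, or `.`) are assumptions about what the project would want to enforce.

```python
VALID_STRANDS = {"+", "-", "."}


def validate_bedpe_line(line):
    """Return a list of problems found in one BEDPE line (empty means OK)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return [f"expected at least 10 tab-separated fields, found {len(fields)}"]

    problems = []
    # Check the two coordinate pairs: columns 2-3 and 5-6.
    for label, start, end in (("1", fields[1], fields[2]),
                              ("2", fields[4], fields[5])):
        try:
            start_pos, end_pos = int(start), int(end)
        except ValueError:
            problems.append(f"start{label}/end{label} are not both integers")
            continue
        if start_pos > end_pos:
            problems.append(f"start{label} is greater than end{label}")

    # Check the two strand columns: 9 and 10.
    for label, strand in (("1", fields[8]), ("2", fields[9])):
        if strand not in VALID_STRANDS:
            problems.append(f"strand{label} has unexpected value {strand!r}")
    return problems


print(validate_bedpe_line("chr1\t0\t1\tchr2\t10\t11\tA--B\t.\t+\t-"))
print(validate_bedpe_line("chr1\t5\t1\tchr2\t10\t11\tA--B\t.\t+\t?"))
```

Running a check like this on every converter's output would catch format problems before the merge/annotate stage.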

We met briefly to start Day 3 and discussed merging strategies for the BEDPE files. We also discussed generating a matrix of 0s and 1s recording whether each gene fusion was detected by each tool, and whether it was annotated in each of the databases. These matrices will be used to generate UpSet plots for data visualization.
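The 0/1 matrix described above could be built along these lines. This is only a sketch: it assumes each tool or database has already been reduced to a list of fusion names (for example `GENE_A--GENE_B`), and the source names used here are placeholders.

```python
def presence_matrix(calls_by_source):
    """Build a presence/absence matrix of fusions across sources.

    calls_by_source: dict mapping a source name (tool or database) to an
    iterable of fusion names.
    Returns (sources, rows): the sorted source names, and a dict mapping each
    fusion name to a list of 0/1 values in that source order.
    """
    sources = sorted(calls_by_source)
    sets = {source: set(calls_by_source[source]) for source in sources}
    fusions = sorted(set().union(*sets.values()))
    rows = {
        fusion: [int(fusion in sets[source]) for source in sources]
        for fusion in fusions
    }
    return sources, rows


# Placeholder input, for illustration only.
sources, rows = presence_matrix({
    "tool_a": ["GENE_A--GENE_B", "GENE_C--GENE_D"],
    "tool_b": ["GENE_A--GENE_B"],
    "db_x": [],
})
print("fusion", *sources, sep="\t")
for fusion, flags in rows.items():
    print(fusion, *flags, sep="\t")
```

Writing the result out as a tab-delimited table with one column per source should give a binary matrix that can be loaded for plotting.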
Later in the afternoon, @rdocking created slides for the presentation at the end of the day, and there was some focus on wrapping up documentation