meeting_notes

Erica edited this page Oct 22, 2017 · 6 revisions

Meeting Notes

Meeting - 2017-10-20 - 9:30am

General

  • We had an introductory meeting to discuss the scope of the project, our team workflow, and to generate some initial issues.
  • We agreed on the general workflow as described in workflow notes
  • The HackSeq organizers have asked us to try to write up a small manuscript de
  • We reviewed some of the initial data sets, and some of the possible stories to start working on. These were broken down like so:

1. Import

  • We agreed that converting output from disparate tools to a common format would be useful, and would make for easily separable issues
  • For each input tool, we want to write a simple parser that converts the output to BEDPE format
  • I have created issues for each of the initial input tools
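A minimal sketch of what one such parser might look like. The input layout here (a tab-delimited file with a header row and columns named `gene1`, `chrom1`, `pos1`, `gene2`, `chrom2`, `pos2`) is hypothetical and does not correspond to any particular caller, and the example record is made up. The ten output columns follow the BEDPE layout documented by bedtools.

```python
import csv
import io

# The ten standard BEDPE columns, as documented by bedtools.
BEDPE_COLUMNS = ["chrom1", "start1", "end1", "chrom2", "start2", "end2",
                 "name", "score", "strand1", "strand2"]


def fusion_row_to_bedpe(row):
    """Convert one row of a hypothetical caller's output to a BEDPE record.

    Assumes 1-based breakpoint positions in `pos1`/`pos2`. BEDPE starts are
    0-based and ends are exclusive, so a single base at position p becomes
    (p - 1, p).
    """
    pos1, pos2 = int(row["pos1"]), int(row["pos2"])
    return [
        row["chrom1"], pos1 - 1, pos1,
        row["chrom2"], pos2 - 1, pos2,
        f"{row['gene1']}--{row['gene2']}",  # fusion name
        ".",                                # score not provided
        ".", ".",                           # strands unknown
    ]


def convert(in_handle, out_handle):
    """Read tab-delimited caller output and write BEDPE lines."""
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(fusion_row_to_bedpe(row))


# Made-up input record, for illustration only.
example = ("gene1\tchrom1\tpos1\tgene2\tchrom2\tpos2\n"
           "GENE_A\tchr1\t1000\tGENE_B\tchr2\t2000\n")
out = io.StringIO()
convert(io.StringIO(example), out)
bedpe_text = out.getvalue()
```

Keeping the per-row conversion in its own function means each tool's parser only has to supply its own version of `fusion_row_to_bedpe`, which fits the one-issue-per-tool split.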

2. Annotation

  • We agreed that it would be useful to annotate predicted fusion events (from the BEDPE files generated above) against existing fusion databases
  • To that end, I have created a set of issues along the lines of 'Document Data Source', to check whether these data sources have available APIs
  • I have also stubbed out an issue to research additional data sources, as necessary

3. Aggregation

  • As a first pass at this, we thought that having simple scripts (either in something like Jupyter notebooks or RMarkdown documents) to provide general overview summaries of reported events would be useful.
  • As a potential stretch goal, these could be implemented in the MultiQC framework
  • Note that since the BEDPE fields will differ between different tools, there may need to be a 'common core', with extensions for different tools
  • For aggregation of fusions between tools, we thought we'd start by looking at tools like bedtools intersect.
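One way to picture the 'common core plus extensions' idea, together with a naive stand-in for comparing calls between tools. This is a sketch only: the `FusionCall` class, the `extras` dictionary, and the `slop` tolerance are all hypothetical names, and the pairwise matching below is a plain-Python substitute for a bedtools-based comparison, not a description of how bedtools itself behaves.

```python
from dataclasses import dataclass, field


@dataclass
class FusionCall:
    # Common core: the coordinate fields shared by every BEDPE record.
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    name: str = "."
    # Tool-specific columns (read counts, confidence, ...) live here.
    extras: dict = field(default_factory=dict)


def _near(chrom_a, start_a, end_a, chrom_b, start_b, end_b, slop):
    """True if the intervals share a chromosome and overlap after padding
    one of them by `slop` bp on each side."""
    return (chrom_a == chrom_b
            and start_a - slop < end_b
            and start_b - slop < end_a)


def same_event(a, b, slop=0):
    """True if both ends of `a` match both ends of `b`, in either order."""
    direct = (
        _near(a.chrom1, a.start1, a.end1, b.chrom1, b.start1, b.end1, slop)
        and _near(a.chrom2, a.start2, a.end2, b.chrom2, b.start2, b.end2, slop)
    )
    swapped = (
        _near(a.chrom1, a.start1, a.end1, b.chrom2, b.start2, b.end2, slop)
        and _near(a.chrom2, a.start2, a.end2, b.chrom1, b.start1, b.end1, slop)
    )
    return direct or swapped
```

The `swapped` check matters because two tools may report the same fusion with the partner ends in opposite order.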

4. Filter/Review

  • Deferred at present, depending on how things go elsewhere

5. Visualize

  • Deferred at present, depending on how things go elsewhere

Meeting Notes - 2017-10-20 - 4:00pm

Day 1 Progress

  • We met to discuss progress on Day 1, and potential actions to kick off Day 2
  • The main things we accomplished on Day 1 were:
    • Writing parsers to convert the input from a whole bunch of fusion callers to a common BEDPE file format.
    • We've got parsers now for seven different tools already merged, with a few more incoming
    • Reviewing the available online fusion databases, and determining which ones are suitable for downstream use

Next Steps

  • We also discussed the next steps of the project. Things that we're confident about at this point:
    • We should focus on a few annotation sources that have high-quality, comparable data, and convert them to BEDPE format as well, so we can more easily intersect them with the fusion calls
    • We should start writing scripts to generate simple summary statistics for the existing fusion calls
    • We should generate a revised overview diagram of what the overall workflow looks like, so we have a better idea of what 'done' looks like
  • Things we (or at least I, @rdocking) need to think about a bit more are:
    • What does the merge/intersect tool look like? Should this be in Python, in R, both?
    • Are we going to compare calls from different callers, calls from different replicates, or both? (Most likely just calls from different callers)
  • I will submit some small issues for the obvious actionable things, and we'll reconvene tomorrow to flesh out the rest
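As a rough idea of what a first summary script could report, the sketch below counts calls per tool and splits them into intra- and inter-chromosomal events. The input structure (a mapping from a tool label to its BEDPE rows) and the choice of statistics are assumptions for illustration, not the project's actual layout.

```python
from collections import Counter


def summarize(calls_by_tool):
    """Return per-tool counts of total, intra- and inter-chromosomal calls.

    `calls_by_tool` maps a tool label to a list of BEDPE rows, where each row
    is a sequence whose items 0 and 3 are chrom1 and chrom2.
    """
    summary = {}
    for tool, rows in calls_by_tool.items():
        counts = Counter(
            "intrachromosomal" if row[0] == row[3] else "interchromosomal"
            for row in rows
        )
        summary[tool] = {
            "total": len(rows),
            "intrachromosomal": counts["intrachromosomal"],
            "interchromosomal": counts["interchromosomal"],
        }
    return summary
```

The same function would work inside a Jupyter notebook, with the returned dictionary turned into a table for display.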

Meeting Notes - 2017-10-21 - 9:00am

  • We met briefly to start Day 2, and discussed what next steps we'd like to take
  • My brief notes: we want to wrap up the process of writing importers/converters fairly soon, so we can move on to the next aspects of the project. Some of that will be QC (like the validator @hirak is working on) and exploratory analysis (like what @stef is working on). The next main tasks will be merging and annotating results from different data sources. We thought we'd start by using bedtools commands to intersect the various annotation data sources.
  • After lunch, we should start closing down the remaining 'convert formats' issues and move on to the next stage of the pipeline
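A validator of the kind mentioned above might check something like the following. This is an independent sketch, not the project's validator: the function names and the specific checks are assumptions, based only on the ten-column BEDPE layout documented by bedtools.

```python
def _as_int(text):
    """Return `text` as an int, or None if it is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None


def validate_bedpe_line(line):
    """Return a list of problems found in one BEDPE line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return [f"expected at least 10 columns, found {len(fields)}"]

    problems = []
    # Columns 2-3 and 5-6 hold the start/end coordinates of each end.
    for label, start, end in (("end 1", fields[1], fields[2]),
                              ("end 2", fields[4], fields[5])):
        start_i, end_i = _as_int(start), _as_int(end)
        if start_i is None or end_i is None:
            problems.append(f"{label}: non-integer coordinate")
        elif start_i > end_i:
            problems.append(f"{label}: start is greater than end")

    # Strands should be '+', '-', or '.' for unknown.
    for label, strand in (("strand1", fields[8]), ("strand2", fields[9])):
        if strand not in {"+", "-", "."}:
            problems.append(f"{label}: unexpected value {strand!r}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a converter's output in a single pass.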

Meeting Notes - 2017-10-22 - 9:00am

  • We met briefly to start Day 3, and discussed merging strategies for the BEDPE files. We also discussed generating a matrix of 0's and 1's recording whether each gene fusion was detected by each tool, and whether it was annotated in each of the databases. These matrices will be used to generate UpSet plots for data visualization
  • Later in the afternoon, @rdocking created slides for the presentation at the end of the day, and there was some focus on wrapping up documentation
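The 0/1 matrix discussed at the start of Day 3 can be sketched as follows. The source labels and fusion names are placeholders, and keying fusions by a gene-pair name is an assumption about how calls would be matched across tools and databases.

```python
def presence_matrix(fusions_by_source):
    """Build a 0/1 matrix: one row per fusion, one column per source.

    `fusions_by_source` maps a source label (a caller or a database) to the
    set of fusion names it reports. Returns the sorted column labels and a
    dict mapping each fusion name to its row of 0's and 1's.
    """
    sources = sorted(fusions_by_source)
    fusions = sorted(set().union(*fusions_by_source.values()))
    rows = {
        fusion: [int(fusion in fusions_by_source[source]) for source in sources]
        for fusion in fusions
    }
    return sources, rows


# Placeholder labels and names, for illustration only.
sources, rows = presence_matrix({
    "tool_a": {"GENE_A--GENE_B", "GENE_C--GENE_D"},
    "tool_b": {"GENE_A--GENE_B"},
    "db_x": {"GENE_C--GENE_D"},
})
```

Each row records which callers and databases support a fusion, which is the membership information an UpSet plot is drawn from.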