Peaks2genes: modifications #1030

Merged
merged 13 commits into from
Sep 30, 2018

mblue9
Contributor

@mblue9 mblue9 commented Sep 28, 2018

In response to this feedback comment on the peaks to genes tutorial:

What could be improved?: More information on the flanking region tool (how it works)

In this PR I've tried to clarify in the text what the Get Flanks tool is doing, and I've added a small diagram (made with PowerPoint 😳; if there's a better way to make one, please let me know).

I've made a few changes here:

  • Tried to clarify Get Flanks step
  • I get chromosome 11 having the highest no. of genes (not chr 7) using the provided files (?) so have changed that in the text
  • Added steps to add tags to the inputs (peaks and genes) as that makes it easier to see (for me anyway) which data the Replace/Cut etc outputs are
  • Removed the Convert Genomic Intervals to Bed step as it doesn't seem necessary: Intersect gives the same output without that step. Converting the peaks file to bed also looks weird to me, as it's not bed format, it's just a table (e.g. there's no Strand column, and when you convert to bed a numeric column ends up being called "Strand"), so Interval looks better I think
  • Added some screenshots to clarify some expected outputs
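
The chromosome count mentioned in the list above could also be checked outside Galaxy. A minimal sketch, assuming a BED-like genes file whose first tab-separated column is the chromosome name; the function name `genes_per_chromosome` and the sample lines are made up for illustration and are not taken from the tutorial data:

```python
from collections import Counter


def genes_per_chromosome(bed_lines):
    """Count records per chromosome (first tab-separated column)."""
    counts = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        chrom = line.split("\t")[0]
        counts[chrom] += 1
    return counts


# Hypothetical records, only to show the shape of the input.
example = [
    "chr11\t100\t900\tgeneA\t0\t+",
    "chr11\t2000\t3500\tgeneB\t0\t-",
    "chr11\t5000\t7200\tgeneC\t0\t+",
    "chr7\t150\t800\tgeneD\t0\t+",
    "chr7\t4000\t4600\tgeneE\t0\t-",
]

counts = genes_per_chromosome(example)
top_chrom, top_count = counts.most_common(1)[0]
print(top_chrom, top_count)
```

With a real file, the lines could come from iterating over an open file handle instead of the hard-coded list.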

Member

@bgruening bgruening left a comment

Thanks @mblue9!

can skip the next step. Otherwise, it might be reasonable to include the promoter region into the comparison, e.g. because
you want to include Transcriptions factors in ChIP-seq experiments.
Our goal is to compare the 2 region files (the genes file and the peak file from the publication)
to know which peaks are related to which genes. If you only want to know which peaks are located **inside** genes you
Member

Should we put the term "gene body" somewhere?

Contributor Author

ok I've added that in now, see what you think

Our goal is to compare the 2 region files (the genes file and the peak file from the publication)
to know which peaks are related to which genes. If you only want to know which peaks are located **inside** genes you
can skip the next step. Otherwise, it might be reasonable to include the **promoter** region of the genes into the comparison, e.g. because
you want to include transcription factors in ChIP-seq experiments. There is no strict definition for promoter region but 2 kb upstream of the Transcription Start Site (start of region) is commonly used. We'll use the **Get Flanks** tool to get regions from 2 kb upstream of the start of the gene to 10 kb downstream of the start (12 kb in length). To do this we tell the Get Flanks tool we want regions upstream of the start, with an offset of 10 kb, that are 12 kb in length, as shown in the diagram below.
Member

TSS as a shortcut?

Contributor Author

have changed it to TSS
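
The Get Flanks settings discussed in this thread (upstream of the start, 10 kb offset, 12 kb length) can be sketched as plain coordinate arithmetic. This is only an illustration, not the tool's source code: the function name `flank_upstream_of_start` is hypothetical, and the minus-strand branch assumes that the TSS of a minus-strand gene is its end coordinate.

```python
def flank_upstream_of_start(start, end, strand, offset=10_000, length=12_000):
    """Return (flank_start, flank_end) for a window around the gene start.

    The window is `length` bases long and is shifted `offset` bases
    downstream, so with the defaults it spans 2 kb upstream to 10 kb
    downstream of the TSS.
    """
    if strand == "+":
        # TSS is `start`; downstream means increasing coordinates.
        flank_end = start + offset
        flank_start = flank_end - length
    elif strand == "-":
        # Assumption: TSS is `end`; downstream means decreasing coordinates.
        flank_start = end - offset
        flank_end = flank_start + length
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return flank_start, flank_end


# Hypothetical gene at 50,000-60,000 on each strand.
plus_window = flank_upstream_of_start(50_000, 60_000, "+")
minus_window = flank_upstream_of_start(50_000, 60_000, "-")
print(plus_window, minus_window)
```

For the plus-strand gene the window runs from 48,000 (2 kb before the TSS at 50,000) to 60,000 (10 kb after it). The sketch does not handle windows that would extend before position 0.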

bgruening
bgruening previously approved these changes Sep 28, 2018
@mblue9
Contributor Author

mblue9 commented Sep 28, 2018

Just wondering, what do you think about having "Introduction" in the name at the top of the tutorial e.g. "Galaxy Introduction: From peaks to genes" instead of just "From peaks to genes", to emphasise that this is material to help introduce people to Galaxy?

@bgruening
Member

But the entire section/topic is called Introduction to Galaxy Analyses. Is that not redundant?

@mblue9
Contributor Author

mblue9 commented Sep 28, 2018

Maybe. If you look at the screenshot below, although the objectives show it's an intro, to me the name makes it look like its focus is on annotating peak regions (and it's not the most efficient way to do it currently in Galaxy, e.g. in this case you could just use ChIPseeker and not even bother getting data from UCSC). Having it in the name could help emphasise that this is an example to introduce people to Galaxy, but I don't have a strong opinion on this.

[Image: screen shot 2018-09-28 at 8 29 28 pm]

@hexylena
Member

Maybe the topic name could be written above/before the tutorial name?

@mblue9
Contributor Author

mblue9 commented Sep 29, 2018

> Maybe the topic name could be written above/before the tutorial name?

That sounds like a good idea to me. But happy to have that in the future and not wait for it for this PR.

@mblue9 mblue9 changed the title from "Peaks2genes:Try to clarify Get Flanks step" to "Peaks2genes: modifications" Sep 29, 2018
@@ -349,35 +358,27 @@ you want to include Transcriptions factors in ChIP-seq experiments.
> 3. Rename your dataset to reflect your findings
{: .hands_on}

You might have noticed that the UCSC file is in `BED` format and has a database associated to it. That's what we want for our peak file as well
Member

We do this to demonstrate the conversion feature, with respect to "change filetype". As a bonus, the presenter can also demonstrate implicit conversion if desired.

Contributor Author

Ok I've added it back in. But imho if it's not necessary for users to do that conversion here then it's just adding confusion and I'd demonstrate the conversion somewhere else where it is necessary.

Member

I disagree here. The Get Flanks tool is also not needed, as a similar result can be achieved by UCSC directly. It's important that people know the interface and this is the purpose of this tutorial: it's an introduction to Galaxy.
Also please note that there are different intersect tools in Galaxy and not all of them can cope with interval files and hence convert them implicitly. It's therefore good to know that there are implicit and explicit conversions. Maybe this should be made clearer in the text; I stress this a lot during my session.

Thanks for adding it back.

Contributor Author

> The Get Flanks tool is also not needed as a similar result can be achieved by UCSC directly.

Well, while we're at it, UCSC is not needed here either 😄 But yes, the text could be made clearer I think. What do you think about what I've added here now?

You might have noticed that the UCSC file is in `BED` format and has a database associated to it. That's what we want for our peak file as well. The **Intersect** tool we will use can automatically convert interval files to BED format but we'll convert our interval file explicitly here to show how this can be achieved with Galaxy.

Member

I like the text. Thanks.
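
As a rough illustration of what an explicit interval-to-BED conversion involves, the sketch below keeps only the chromosome, start and end columns of a tabular interval line and writes them as a 3-column BED line. It is not the code behind Galaxy's converter; the function name `interval_to_bed3`, the default column positions and the sample peak line are assumptions made for this example.

```python
def interval_to_bed3(line, chrom_col=0, start_col=1, end_col=2):
    """Convert one tab-separated interval line to a 3-column BED line."""
    fields = line.rstrip("\n").split("\t")
    chrom = fields[chrom_col]
    start = int(fields[start_col])
    end = int(fields[end_col])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: {chrom}:{start}-{end}")
    # Extra numeric columns are dropped instead of being relabelled
    # as name, score or strand.
    return f"{chrom}\t{start}\t{end}"


# Hypothetical peak line with two extra numeric columns.
peak = "chr1\t3660676\t3661050\t374\t0.42"
bed_line = interval_to_bed3(peak)
print(bed_line)
```

Keeping only three columns sidesteps the issue raised earlier in the conversation, where a numeric column ends up labelled "Strand" after conversion.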

@mblue9
Contributor Author

mblue9 commented Sep 29, 2018

If anyone could please merge this it would be great to have for the workshop tomorrow or if it needs more changes please let me know.

@bgruening bgruening merged commit 20d2174 into galaxyproject:master Sep 30, 2018
@mblue9
Contributor Author

mblue9 commented Sep 30, 2018

Thanks a lot @bgruening !!

@mblue9 mblue9 deleted the peaks2genes_edits branch October 7, 2018 05:45