-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy path05_chow_hfd_minimal.Rmd
134 lines (103 loc) · 3.03 KB
/
05_chow_hfd_minimal.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
---
output:
html_document:
keep_md: true
---
## Download data
```
wget https://github.com/lcdb/genomics-workshop-2018/raw/master/extras/bedtools-and-deseq.tsv
```
## Load and inspect
```{r}
# if downloaded separately:
# df <- read.table('bedtools-and-deseq.tsv')
# For the purposes of reproductibility:
df <- read.table('https://github.com/lcdb/genomics-workshop-2018/raw/master/extras/bedtools-and-deseq.tsv')
head(df)
nrow(df)
ncol(df)
dim(df)
```
## Inspection
How many up/down/unchanged genes?
```{r}
head(df$direction == 'up')
sum(df$direction == 'up')
sum(df$direction == 'dn')
```
Percent up/down:
```{r}
sum(df$direction == 'up') / nrow(df)
sum(df$direction == 'dn') / nrow(df)
```
How to get number of genes with gained H3K27ac?
## Plotting
```{r}
library(ggplot2)
```
```{r}
ggplot(df) + geom_point(mapping=aes(x=baseMean, y=log2FoldChange))
```
We should put `baseMean` on a log scale:
```{r}
ggplot(df) + geom_point(mapping=aes(x=log10(baseMean), y=log2FoldChange))
```
Let's color by direction:
```{r}
ggplot(df) + geom_point(mapping=aes(x=log10(baseMean), y=log2FoldChange, color=direction))
```
Heres a nicer way of coloring:
```{r}
ggplot(df) + geom_point(mapping=aes(x=log10(baseMean), y=log2FoldChange, color=direction)) +
scale_color_manual(
values=c('un'='#888888', 'up'='firebrick', 'dn'='dodgerblue4')
)
```
Now let's color by H3k27ac instead of direction.
```{r}
ggplot(df) +
geom_point(mapping=aes(x=log10(baseMean), y=log2FoldChange, color=gained_ac))
```
Hmm, it looks like the TRUE ones are being hidden. Here is how we can plot
*layers* in ggplot, to make sure the TRUE gets plotted *after* the FALSE. The
key is in the `data=subset(...)` part for the geom:
```{r}
ggplot(df, mapping=aes(x=log10(baseMean), y=log2FoldChange)) +
geom_point(data=subset(df, !df$gained_ac)) +
geom_point(data=subset(df, df$gained_ac), color='red')
```
It might be nice to combine the previous plots together:
```{r}
ggplot(df, mapping=aes(x=log10(baseMean), y=log2FoldChange)) +
geom_point(mapping=aes(color=direction)) +
scale_color_manual(
values=c('un'='#888888', 'up'='firebrick', 'dn'='dodgerblue4')) +
geom_point(
data=subset(df, df$gained_ac),
color='orange', size=.5)
```
Or facet it:
```{r}
ggplot(df, mapping=aes(x=log10(baseMean), y=log2FoldChange)) +
geom_point(mapping=aes(color=direction)) +
scale_color_manual(
values=c('un'='#888888', 'up'='firebrick', 'dn'='dodgerblue4')) +
facet_wrap(~gained_ac)
```
Or a volcano plot:
```{r}
ggplot(df, mapping=aes(x=log2FoldChange, y=-log10(padj))) +
geom_point(mapping=aes(color=direction)) +
scale_color_manual(
values=c('un'='#888888', 'up'='firebrick', 'dn'='dodgerblue4')) +
geom_point(
data=subset(df, df$gained_ac),
color='orange', size=.5)
```
```{r}
ggplot(df, mapping=aes(x=log2FoldChange, y=-log10(padj))) +
geom_point(mapping=aes(color=direction)) +
scale_color_manual(
values=c('un'='#888888', 'up'='firebrick', 'dn'='dodgerblue4')) +
facet_wrap(~gained_ac)
```