---
title: "Visualization"
output: github_document
---
<https://github.com/maurolepore/open-science-with-r>
## Setup
A chunk labelled `setup` runs before any other chunk; in RStudio it runs automatically the first time you run any chunk interactively.
* Load the tidyverse package.
```{r setup}
# install.packages("tidyverse")
library(tidyverse)
```
Bored?
* Why did I comment out the call to `install.packages()`?
* Hint: Is it polite to force-install software on someone else's computer?
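One polite option is to check for a package and suggest the install instead of running it. Below is a minimal sketch with a hypothetical helper, `has_package()`, which is not part of any package; it only wraps base R's `requireNamespace()`.

```{r}
# Hypothetical helper: report whether a package is installed,
# without installing anything on the reader's computer.
has_package <- function(pkg) {
  ok <- requireNamespace(pkg, quietly = TRUE)
  if (!ok) {
    message('Package "', pkg, '" is missing. Install it with install.packages("', pkg, '").')
  }
  ok
}

has_package("ggplot2")
```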
## Warm up
* Explore https://ggplot2.tidyverse.org/
* Reproduce the plot shown in the Usage section.
```{r}
library(ggplot2)
ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point()
```
## Data
* Read the .csv file at <https://bit.ly/03-data-ca>
```{r}
# National Parks in California
ca <- readr::read_csv("https://bit.ly/03-data-ca")
```
Bored?
* How would you write `ca` to the file "data/03-data-ca.csv"?
* How would you re-read it from "data/03-data-ca.csv"?
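A possible answer to both questions, sketched with ggplot2's built-in `mpg` dataset instead of `ca` and a temporary folder, so running it does not write into your project. The file name `03-data-example.csv` is made up for this example; for your own data you would write `ca` to "data/03-data-ca.csv" instead.

```{r}
library(readr)
library(ggplot2)  # only for the example dataset `mpg`

# Write to a temporary folder. In your project the path would be
# "data/03-data-ca.csv" (create the folder with dir.create("data", showWarnings = FALSE)).
path <- file.path(tempdir(), "03-data-example.csv")
write_csv(mpg, path)

# Re-read the file; `show_col_types = FALSE` (readr 2.0 or later)
# silences the column specification message.
mpg_again <- read_csv(path, show_col_types = FALSE)
dim(mpg_again)
```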
Adapt the code above to read all the other files.
```{r}
# Acadia National Park
acadia <- read_csv("https://bit.ly/03-data-acadia")
# Southeast US National Parks
se <- read_csv("https://bit.ly/03-data-se")
# 2016 Visitation for all Pacific West National Parks
visit_16 <- read_csv("https://bit.ly/03-data-visit-16")
# All Nationally designated sites in Massachusetts
mass <- read_csv("https://bit.ly/03-data-mass")
```
Preview the data with `head()`.
```{r}
head(ca)
```
Bored?
* Also try printing `ca` by itself, `glimpse(ca)`, and `View(ca)`.
* Compare `as_tibble(ca)` with `as.data.frame(ca)`.
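A quick comparison, sketched with R's built-in `mtcars` instead of `ca` so it runs on its own. The main differences show up in `class()` and in how each object prints.

```{r}
library(tibble)

# `mtcars` ships with R as a plain data frame.
tbl <- as_tibble(mtcars)  # row names are dropped by default
df <- as.data.frame(tbl)

class(tbl)  # "tbl_df" "tbl" "data.frame"
class(df)   # "data.frame"

# glimpse() shows one line per column with its type and first values.
glimpse(tbl)
```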
## Build
* What relationship do you expect to see between year and the number of visitors?
* Plot the data `ca`. Put `year` on x and `visitors` on y.
```{r}
# Example from https://ggplot2.tidyverse.org/
# ggplot(data = mpg, aes(x = displ, y = hwy, colour = class)) +
#   geom_point()
ggplot(data = ca, aes(x = year, y = visitors)) +
  geom_point()
```
* Map the values of the variable `park_name` to the `color` aesthetic.
```{r}
ggplot(ca, aes(year, visitors, color = park_name)) +
  geom_point()
```
Bored?
* I omitted `data =`, `x =`, and `y =` but wrote `color =` explicitly. Why?
* When is it a good idea to put `aes()` inside `ggplot()`?
* When is it a good idea to put `aes()` inside `geom_*()`?
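A sketch of the difference using `mpg`: a mapping in `ggplot()` is inherited by every layer, while a mapping in a geom applies only to that layer. Here it decides whether `geom_smooth()` fits one line per `class` or a single line for all cars. The object names are illustrative.

```{r}
library(ggplot2)

# aes() in ggplot(): every layer inherits `color = class`.
p_global <- ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  # one line per class

# aes() in a geom: `color = class` applies to the points only.
p_local <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm", se = FALSE)  # a single line for all cars

p_global
p_local
```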
## Customizing
Customize this plot:
* Add a call to `labs()` to create these labels:
    * title = "California National Park Visitation".
    * y = "Visitation".
    * x = "Year".
* Add a call to `theme_bw()` to change the theme to black and white.
```{r}
ggplot(ca, aes(year, visitors, color = park_name)) +
  geom_point() +
  labs(title = "California National Park Visitation", y = "Visitation", x = "Year") +
  theme_bw()
```
Bored?
* There are specific functions for x, y, and title labels. Find them.
* Find and try other themes.
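A sketch with `mpg` (the label text is illustrative): `xlab()`, `ylab()`, and `ggtitle()` set the same labels as the matching `labs()` arguments, and `theme_minimal()` is one of ggplot2's complete themes.

```{r}
library(ggplot2)

# xlab(), ylab() and ggtitle() are shortcuts for labs(x =, y =, title =).
p_labels <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  xlab("Engine displacement (L)") +
  ylab("Highway miles per gallon") +
  ggtitle("Fuel economy") +
  theme_minimal()  # other complete themes include theme_classic() and theme_light()
p_labels
```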
## Faceting
data = se (Southeast US National Parks)
* Scatter-plot `visitors` (y) by `year` (x) for the `se` dataset.
* Add `facet_wrap(~ state)` to create a panel for each state.
```{r}
ggplot(se, aes(year, visitors)) +
  geom_point() +
  facet_wrap(~ state)
```
Bored?
* What does the `scales` argument do?
* Find another `facet_*()` function.
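A sketch with `mpg`, not the course data: `scales = "free_y"` lets each panel choose its own y range (the default, `"fixed"`, shares one range across panels), and `facet_grid()` arranges panels by a row variable and a column variable.

```{r}
library(ggplot2)

mpg_scatter <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# One panel per class, each with its own y range.
p_wrap <- mpg_scatter + facet_wrap(~ class, scales = "free_y")
p_wrap

# Rows by drive train (drv), columns by number of cylinders (cyl).
p_grid <- mpg_scatter + facet_grid(drv ~ cyl)
p_grid
```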
## Geoms
### Discrete x
data = se
You can assign part of a ggplot to a variable and reuse it later.
* Why is this a poor plot?
```{r}
p <- ggplot(se, aes(park_name, visitors))
p +
  geom_point()
```
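Because a ggplot is an ordinary R object, adding a layer with `+` returns a new plot and leaves the original untouched. A sketch with `mpg`; the names `mpg_base`, `mpg_points`, and `mpg_box` are illustrative and chosen so they do not overwrite `p`.

```{r}
library(ggplot2)

# A reusable base plot with no layers yet.
mpg_base <- ggplot(mpg, aes(class, hwy))

# `+` returns new plot objects; `mpg_base` itself is unchanged.
mpg_points <- mpg_base + geom_point()
mpg_box <- mpg_base + geom_boxplot()

length(mpg_base$layers)    # 0
length(mpg_points$layers)  # 1
```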
* Change the opacity of the points with `alpha = 0.25`.
* Here I also flip the coordinate system with `coord_flip()`.
* Is this a bit better?
```{r}
p +
  geom_point(alpha = 0.25) +
  coord_flip()
```
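* Here I use `geom_jitter()` instead of `geom_point()`.
* Reorder the code so the geom comes before `coord_flip()`. ggplot2 accepts either order, but adding layers before the coordinate system is easier to read.
```{r}
p + geom_jitter() + coord_flip()
```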
* Create a boxplot.
```{r}
p + geom_boxplot() + coord_flip()
```
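Since ggplot2 3.3.0, many geoms, including `geom_boxplot()`, infer their orientation from the data, so putting the discrete variable on y gives horizontal boxes without `coord_flip()`. A sketch with `mpg`:

```{r}
library(ggplot2)

# Discrete variable (class) on y: boxes are drawn horizontally.
horizontal_box <- ggplot(mpg, aes(hwy, class)) +
  geom_boxplot()
horizontal_box
```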
### Time series
data = acadia
* Create a scatter plot
* Add lines
* Add a smoothed mean
```{r}
ggplot(acadia, aes(year, visitors)) +
  geom_point() +
  geom_line() +
  geom_smooth()
```
Bored?
* Move the smoothed mean to become the second layer. What subtle thing happened?
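ggplot2 draws layers in the order you add them, so later layers are painted over earlier ones. A sketch with `mpg` where the smooth is added first and the points end up on top of it:

```{r}
library(ggplot2)

# Layer 1 (drawn first): the smoothed mean and its ribbon.
# Layer 2 (drawn on top): the points.
smooth_first <- ggplot(mpg, aes(displ, hwy)) +
  geom_smooth() +
  geom_point()
smooth_first
```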
### Bar charts
data = visit_16
* Create a column chart in which the height of each column represents `visitors` (y) per `state` (x).
```{r}
ggplot(visit_16, aes(state, visitors)) +
  geom_col()
```
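`geom_col()` uses the y values in your data, while `geom_bar()` counts rows for each x value. A sketch with `mpg` showing that both give the same bar heights when you pre-compute the counts yourself; the `class_counts` table is built here only for the comparison.

```{r}
library(ggplot2)

# geom_bar() counts the rows for each class (stat = "count").
bar_counts <- ggplot(mpg, aes(class)) +
  geom_bar()

# geom_col() draws the y values as given (stat = "identity").
class_counts <- as.data.frame(table(class = mpg$class))  # columns: class, Freq
col_counts <- ggplot(class_counts, aes(class, Freq)) +
  geom_col()
```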
* Map the bar fill to `park_name`.
```{r}
q <- ggplot(visit_16, aes(state, visitors, fill = park_name))
q + geom_col()
```
### Position adjustments
* Now dodge the position of the bars to compare them side by side.
```{r}
my_plot <- q + geom_col(position = "dodge")
my_plot
```
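Besides `"dodge"`, bars can be stacked (the default for `geom_bar()` and `geom_col()`) or rescaled with `"fill"` so every stack reaches 1, which compares proportions instead of counts. A sketch with `mpg`; the object names are illustrative.

```{r}
library(ggplot2)

bars <- ggplot(mpg, aes(class, fill = drv))

stacked <- bars + geom_bar()                    # default position: "stack"
dodged <- bars + geom_bar(position = "dodge")   # side by side
filled <- bars + geom_bar(position = "fill")    # each stack rescaled to 1

filled
```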
## Arranging and exporting plots
data = mass
* Export `my_plot` with `ggsave()`.
```{r}
ggsave(filename = "03-my_plot.pdf", plot = my_plot)
```
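`ggsave()` also takes `width`, `height`, and `units` to control the size of the saved file, and it picks the graphics device from the file extension. A sketch that writes to a temporary folder (the file name is made up) so your project stays clean:

```{r}
library(ggplot2)

example_plot <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# ".pdf" selects the PDF device; the page is 6 x 4 inches.
out <- file.path(tempdir(), "03-example-plot.pdf")
ggsave(filename = out, plot = example_plot, width = 6, height = 4, units = "in")
file.exists(out)
```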
Bored?
* Try to export a PDF from RStudio's Plots pane (Export > Save as PDF).
### ggplotly
* Make `my_plot` interactive with `ggplotly()` from the plotly package.
```{r}
if (interactive()) {
  library(plotly)
  ggplotly(my_plot)
}
```
***
# Takeaways
You can use this code template to make thousands of graphs with **ggplot2**.
```{r eval = FALSE}
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
```
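The template filled in with the `mpg` example from the Warm up, as a sketch: `<DATA>` is `mpg`, `<GEOM_FUNCTION>` is `geom_point`, and `<MAPPINGS>` is `x = displ, y = hwy`.

```{r}
library(ggplot2)

# <DATA> = mpg, <GEOM_FUNCTION> = geom_point, <MAPPINGS> = x = displ, y = hwy
template_example <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
template_example
```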