---
title: "Customization"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    fig_caption: true
    self_contained: yes
fontsize: 11pt
documentclass: article
bibliography: references.bib
csl: reference-style.csl
vignette: >
  %\VignetteIndexEntry{Customization}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
# detect if vignette being built under package check
is_check <- ("CheckExEnv" %in% search()) || any(c("_R_CHECK_TIMINGS_",
             "_R_CHECK_LICENSE_") %in% names(Sys.getenv()))
# set default chunk settings
knitr::opts_chunk$set(fig.align = "center", eval = !is_check)
```
```{r, include = FALSE, eval = TRUE}
# set dummy values so pre-compiled vignette passes checks
spp_name <- "`\"Simulus spp. 670\"`"
spp_id_no <- 632
```
## Introduction
The _aoh R_ package provides a flexible framework for generating Area of Habitat data. By default, it will use elevation data derived from Robinson _et al._ [-@r4] and habitat classification data derived from Lumbierres _et al._ [-@r9]. Its defaults also include using species' elevational limit and habitat preference data from the [International Union for Conservation of Nature (IUCN) Red List of Threatened Species](https://www.iucnredlist.org/). In addition to these defaults, it provides built-in functions to use habitat classification derived from other data sources [e.g., from @r5]. The package can also be used to generate Area of Habitat data using other datasets that have been manually created by the user. For example, it could be used to produce Area of Habitat data using habitat classification data derived from [Copernicus Corine Land Cover](https://land.copernicus.eu/en/products/corine-land-cover) data or species' elevational limit data from the [BirdLife Data Zone](https://datazone.birdlife.org/home).
## Tutorial
Here we will show how Area of Habitat data can be generated using particular datasets (rather than the default datasets). In this tutorial, we will manually import data and use them to generate Area of Habitat data. If you have not previously read through the [Getting started vignette](aoh.html), we strongly recommend doing so first because it provides an introduction to the package. To start off, we will load the package. We will also load the [_rappdirs_](https://CRAN.R-project.org/package=rappdirs) _R_ package to cache data, the [_tibble_](https://CRAN.R-project.org/package=tibble) _R_ package to store tabular data, and the [_terra_](https://CRAN.R-project.org/package=terra) and [_ggplot2_](https://CRAN.R-project.org/package=ggplot2) _R_ packages to visualize results.
```{r, message = FALSE, warning = FALSE}
# load packages
library(aoh)
library(terra)
library(tibble)
library(rappdirs)
library(ggplot2)
```
Now we will import geographic range data for the species. Although users would typically obtain such data from the [IUCN Red List](https://www.iucnredlist.org/), here we will import example data distributed with the package. **Please note that these data were not obtained from the IUCN Red List, and were generated using random simulations.**
```{r, fig.width = 7, fig.height = 6, out.width = "90%"}
# find file path for data
spp_range_path <- system.file(
  "testdata", "SIMULATED_SPECIES.zip", package = "aoh"
)
# import data
spp_range_data <- read_spp_range_data(spp_range_path)
# preview data
## dataset follows the same format as the IUCN Red List spatial data
print(spp_range_data)
# visualize data
## each panel corresponds to a different seasonal distribution of a species
map <-
  ggplot() +
  geom_sf(data = spp_range_data, fill = "darkblue") +
  facet_wrap(~ id_no + seasonal)
print(map)
```
Next, we will import data to describe the species' habitat preferences. Although such data would be automatically obtained from the [IUCN Red List](https://www.iucnredlist.org/) by default (using the `get_spp_habitat_data()` function), here we will import example data distributed with the package. **As before, please note that these data were not obtained from the IUCN Red List, and were randomly generated.** If you wish to use your own data, please ensure that they follow exactly the same conventions (e.g., column names, data types, and `character` values for the `"suitability"` and `"season"` columns).
```{r}
# find file path for species habitat preference data
spp_habitat_path <- system.file(
  "testdata", "sim_spp_habitat_data.csv", package = "aoh"
)
# import species habitat preference data
spp_habitat_data <- read.csv(spp_habitat_path, sep = ",", header = TRUE)
spp_habitat_data <- as_tibble(spp_habitat_data)
# preview data
print(spp_habitat_data, n = Inf)
```
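When preparing your own habitat preference table, it can help to compare its column names and data types against the example table before going further. The following is a minimal sketch in base _R_, not a feature of the _aoh R_ package. The `compare_format()` helper and the toy tables are hypothetical, and the `"Suitable"` and `"Resident"` values are illustrative placeholders rather than a list of accepted values.

```r
# toy reference table standing in for the example data
# (only a few columns are shown, and the values are illustrative)
reference <- data.frame(
  id_no = c(1L, 1L),
  suitability = c("Suitable", "Suitable"),
  season = c("Resident", "Resident"),
  stringsAsFactors = FALSE
)

# toy user table with one problem: id_no is stored as character
candidate <- data.frame(
  id_no = c("1", "2"),
  suitability = c("Suitable", "Suitable"),
  season = c("Resident", "Resident"),
  stringsAsFactors = FALSE
)

# hypothetical helper (not part of aoh): report column-level differences
compare_format <- function(reference, candidate) {
  # record the first class of each column
  ref_types <- vapply(reference, function(x) class(x)[[1]], character(1))
  cand_types <- vapply(candidate, function(x) class(x)[[1]], character(1))
  # columns present in both tables
  shared <- intersect(names(ref_types), names(cand_types))
  list(
    missing_columns = setdiff(names(ref_types), names(cand_types)),
    extra_columns = setdiff(names(cand_types), names(ref_types)),
    type_mismatches = shared[ref_types[shared] != cand_types[shared]]
  )
}

# run the comparison
format_report <- compare_format(reference, candidate)
print(format_report)
```

In practice, `reference` would be the `spp_habitat_data` object imported above and `candidate` would be your own table. A check like this only covers column names and types, so the values in each column still need to be reviewed by hand.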
Next, we will import data to describe the species' elevational limits. Although such data would be automatically obtained from the [IUCN Red List](https://www.iucnredlist.org/) by default (using the `get_spp_summary_data()` function), here we will import example data distributed with the package. **As before, please note that these data were not obtained from the IUCN Red List, and were randomly generated.** Since the dataset contains additional columns that aren't strictly necessary, we will also update it to include only necessary columns. If you wish to use your own data, please ensure that they follow the same conventions (e.g., column names, data types).
```{r}
# find file path for species summary data
spp_summary_path <- system.file(
  "testdata", "sim_spp_summary_data.csv", package = "aoh"
)
# import species summary data
spp_summary_data <- read.csv(spp_summary_path, sep = ",", header = TRUE)
spp_summary_data <- as_tibble(spp_summary_data)
# extract only necessary columns
col_names <- c("id_no", "elevation_lower", "elevation_upper", "category")
spp_summary_data <- spp_summary_data[, col_names, drop = FALSE]
# preview data
print(spp_summary_data, n = Inf)
```
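If you supply your own elevational limit data, a quick consistency check can catch data entry problems early. The sketch below uses base _R_ and a toy table with the same four column names as above. The values are made up for illustration, and the check itself is a suggestion rather than something the package requires.

```r
# toy summary table (values are invented for illustration)
summary_data <- data.frame(
  id_no = c(101L, 102L, 103L),
  elevation_lower = c(0, 1500, NA),
  elevation_upper = c(800, 900, 2000),
  category = c("LC", "VU", "EN"),
  stringsAsFactors = FALSE
)

# flag rows where either limit is missing
has_missing <-
  is.na(summary_data$elevation_lower) | is.na(summary_data$elevation_upper)

# flag rows where the lower limit exceeds the upper limit
is_inverted <-
  !has_missing &
  summary_data$elevation_lower > summary_data$elevation_upper

# identifiers that may need manual review
ids_missing <- summary_data$id_no[has_missing]
ids_inverted <- summary_data$id_no[is_inverted]
print(ids_missing)
print(ids_inverted)
```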
```{r, include = FALSE}
spp_name <- paste0("`\"", spp_range_data$binomial[[1]], "\"`")
spp_id_no <- paste0("`", spp_range_data$id_no[[1]], "`")
```
After importing all the datasets with the species data, it is important to ensure that every species is associated with geographic range, habitat preference, and summary data. Here, the `"id_no"` column values are used to denote different taxa, meaning that each species should have a unique identifier. These identifiers are used when cross-referencing the datasets. For example, the species named `r spp_name` has an identifier (`"id_no"` value) of `r spp_id_no`, and this identifier is used to denote its range in the `spp_range_data` dataset, its habitat preferences in the `spp_habitat_data` dataset, and its elevational limits in the `spp_summary_data` dataset. We can verify that each species has the required information across all three datasets using the following code.
```{r}
# verify all identifiers in range data are present in habitat preference data
## if we see TRUE: then this means both datasets have the same taxa identifiers
## if we see FALSE: then some taxa identifiers are missing from one dataset
setequal(spp_range_data$id_no, spp_habitat_data$id_no)
# verify all identifiers in range data are present in summary data
## if we see TRUE: then this means both datasets have the same taxa identifiers
## if we see FALSE: then some taxa identifiers are missing from one dataset
setequal(spp_range_data$id_no, spp_summary_data$id_no)
```
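If either check returns `FALSE`, the next step is to find out which identifiers are missing and from which dataset. A minimal sketch using `setdiff()` is shown below. The identifier vectors are toy values standing in for the `id_no` columns of two datasets.

```r
# toy identifiers standing in for the id_no columns of two datasets
range_ids <- c(11L, 12L, 13L)
habitat_ids <- c(11L, 11L, 13L, 14L)

# identifiers with range data but no habitat preference data
missing_from_habitat <- setdiff(range_ids, habitat_ids)

# identifiers with habitat preference data but no range data
missing_from_range <- setdiff(habitat_ids, range_ids)

print(missing_from_habitat)
print(missing_from_range)
```

Because `setdiff()` is directional, running it both ways shows which dataset needs to be updated.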
We will now import elevation data. Although such data would be automatically imported by default (using the `get_global_elevation_data()` function), here we will import example data distributed with the package.
```{r, fig.height = 4, fig.width = 5.5, out.width = "90%"}
# find file path for elevation data
elevation_path <- system.file(
  "testdata", "sim_elevation_data.tif", package = "aoh"
)
# import elevation data
elevation_data <- rast(elevation_path)
# preview data
print(elevation_data)
# visualize data
plot(elevation_data, main = "Elevation data")
```
Next, we will import habitat classification data. Although such data would be automatically imported by default (using the `get_lumb_cgls_habitat_data()` function), here we will import example data distributed with the package.
```{r, fig.height = 4, fig.width = 5.5, out.width = "90%"}
# find file path for habitat classification data
habitat_path <- system.file(
  "testdata", "sim_habitat_data.tif", package = "aoh"
)
# import habitat classification data
habitat_data <- rast(habitat_path)
# preview data
print(habitat_data)
# visualize data
plot(habitat_data, main = "Habitat classification data")
```
Critically, the elevation data and habitat classification data must have exactly the same spatial properties. This means they must have the same coordinate reference system, resolution, and spatial extent. If you are using elevation or habitat classification data that you have previously prepared yourself (or manually downloaded from online sources), you may need to resample (or reproject) your data using a geographic information system (GIS) to ensure both datasets have the same spatial properties. For example, data could be resampled using ESRI ArcGIS, QGIS, the [_terra_ _R_ package](https://cran.r-project.org/package=terra), or the [_gdalUtilities_ _R_ package](https://cran.r-project.org/package=gdalUtilities). We can verify that both the elevation and habitat classification datasets have the same spatial properties using the following code.
```{r}
# verify that elevation and habitat classification data have same properties
## if we see TRUE, this means they have the same spatial properties.
## otherwise, if we see an error, then this means that they do not
## have the same spatial properties and require updating
compareGeom(elevation_data, habitat_data)
```
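If the comparison fails for your own data, one option is to align the habitat classification data to the elevation data with the _terra_ _R_ package. The following is a minimal sketch using two small synthetic rasters that share a coordinate reference system and extent but differ in resolution. The raster dimensions and values are invented for illustration. Nearest neighbor resampling (`method = "near"`) is used here because habitat classification values are categorical, so interpolating them would produce class values that do not exist.

```r
library(terra)

# synthetic elevation raster (10 x 10 cells)
elev <- rast(
  nrows = 10, ncols = 10,
  xmin = 0, xmax = 10, ymin = 0, ymax = 10,
  crs = "EPSG:4326"
)
values(elev) <- seq_len(ncell(elev))

# synthetic habitat raster with a finer resolution (20 x 20 cells)
hab <- rast(
  nrows = 20, ncols = 20,
  xmin = 0, xmax = 10, ymin = 0, ymax = 10,
  crs = "EPSG:4326"
)
values(hab) <- rep(c(1, 2), length.out = ncell(hab))

# check geometry without raising an error on mismatch
is_same_before <- compareGeom(elev, hab, stopOnError = FALSE)

# align the habitat raster to the elevation raster
hab_aligned <- resample(hab, elev, method = "near")

# check geometry again
is_same_after <- compareGeom(elev, hab_aligned)
print(c(before = is_same_before, after = is_same_after))
```

If the two rasters also differ in their coordinate reference systems, `terra::project()` with the elevation raster as the template (and `method = "near"`) may be a better starting point than `resample()`.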
We will now import a crosswalk table for the habitat classification data. A crosswalk table specifies which pixel values in the habitat classification data correspond to which [IUCN habitat classes](https://www.iucnredlist.org/resources/habitat-classification-scheme). This table can specify one-to-one relationships (e.g., pixel value 12 corresponds to IUCN class 1.1), one-to-many relationships (e.g., pixel value 12 corresponds to IUCN classes 1.1 and 1.2), and many-to-many relationships (e.g., pixel values 12 and 13 each correspond to IUCN classes 1.1 and 1.2). Although such data would be automatically imported by default (i.e., the `crosswalk_lumb_cgls_data` built-in dataset), here we will import example data distributed with the package. When using your own crosswalk table for your habitat classification data, please ensure that it follows the same format (i.e., same column names and data types).
```{r}
# find file path for crosswalk data
crosswalk_path <- system.file(
  "testdata", "sim_crosswalk.csv", package = "aoh"
)
# import crosswalk data
crosswalk_data <- read.csv(crosswalk_path, sep = ",", header = TRUE)
crosswalk_data <- as_tibble(crosswalk_data)
# print table
## code column contains codes for the IUCN habitat classes
## value column contains values in the habitat classification data
print(crosswalk_data, n = Inf)
```
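To make the different kinds of relationships concrete, the sketch below builds a small crosswalk table by hand with the `code` and `value` columns described above. The specific pairings (including pixel value 20 and class 3.4) are invented for illustration and do not come from any real habitat classification dataset.

```r
# toy crosswalk table
## code column: IUCN habitat class codes, stored as character so that
## codes such as "5.1" and "5.10" remain distinct
## value column: pixel values in the habitat classification data
crosswalk <- data.frame(
  code = c("1.1", "1.2", "1.1", "1.2", "3.4"),
  value = c(12L, 12L, 13L, 13L, 20L),
  stringsAsFactors = FALSE
)

# habitat classes associated with each pixel value
codes_by_value <- split(crosswalk$code, crosswalk$value)
print(codes_by_value)

# pixel values associated with a single habitat class
values_for_class <- crosswalk$value[crosswalk$code == "1.1"]
print(values_for_class)
```

Here, pixel values 12 and 13 each map to classes 1.1 and 1.2 (a many-to-many relationship), whereas pixel value 20 maps only to class 3.4 (a one-to-one relationship).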
After importing all the data, we can clean and collate the information needed to generate Area of Habitat data.
```{r, message = FALSE, results = "hide"}
# create data with information for Area of Habitat data
spp_info_data <- create_spp_info_data(
  spp_range_data,
  spp_summary_data = spp_summary_data,
  spp_habitat_data = spp_habitat_data
)
```
```{r}
# preview data
print(spp_info_data, width = Inf)
```
Next, we can generate the Area of Habitat data.
```{r, message = FALSE, results = "hide"}
# specify folder to save Area of Habitat data
## although we use a temporary directory here to avoid polluting your computer
## with example files, you would normally specify the folder
## on your computer where you want to save data
output_dir <- tempdir()
# generate Area of Habitat data
spp_aoh_data <- create_spp_aoh_data(
  spp_info_data,
  elevation_data = elevation_data,
  habitat_data = habitat_data,
  crosswalk_data = crosswalk_data,
  output_dir = output_dir
)
```
```{r}
# preview results
## resulting dataset is a simple features (sf) object containing
## spatial geometries for cleaned versions of the range data
## (in the geometry column) and the following additional columns:
##
## - id_no : IUCN Red List taxon identifier
## - seasonal : integer identifier for seasonal distributions
## - category : character IUCN Red List threat category
## - full_habitat_code: All IUCN Red List codes for suitable habitat classes
## (multiple codes are delimited using "|" symbols)
## - habitat_code : IUCN Red List codes for suitable habitat classes
## used to create AOH maps
## - elevation_lower : lower limit for the species on IUCN Red List
## - elevation_upper : upper limit for the species on IUCN Red List
## - xmin : minimum x-coordinate for Area of Habitat data
## - xmax : maximum x-coordinate for Area of Habitat data
## - ymin : minimum y-coordinate for Area of Habitat data
## - ymax : maximum y-coordinate for Area of Habitat data
## - path : file path for Area of Habitat data (GeoTIFF format)
print(spp_aoh_data, width = Inf)
```
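Since the habitat code columns store multiple codes in a single string delimited by `"|"` symbols, it can be useful to split them into separate codes for further analysis. A minimal sketch in base _R_ is shown below. The strings are toy values standing in for entries of the `full_habitat_code` column.

```r
# toy values standing in for entries of the full_habitat_code column
full_habitat_code <- c("1.5|1.6|3.5", "4.4")

# split each string into its individual codes
## fixed = TRUE is needed because "|" is a special character
## in regular expressions
code_list <- strsplit(full_habitat_code, "|", fixed = TRUE)

# count the number of habitat classes per row
n_codes <- lengths(code_list)

print(code_list)
print(n_codes)
```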
Finally, let's create some maps to compare the range data with the Area of Habitat data. Although we could create these maps manually (e.g., using the [_ggplot2_](https://CRAN.R-project.org/package=ggplot2) _R_ package), we will use a plotting function distributed with the _aoh R_ package for convenience. The full dataset contains many species so, for brevity, we will only show the first four species' seasonal distributions.
```{r, message = FALSE, warning = FALSE, dpi = 200, fig.width = 5.5, fig.height = 4, out.width = "90%"}
# create maps
## N.B. you might need to install the ggmap package to create the maps
map <-
  plot_spp_aoh_data(
    spp_aoh_data[1:4, ],
    zoom = 6,
    maptype = "stamen_toner_background",
    maxcell = Inf
  ) +
  scale_fill_viridis_d() +
  scale_color_manual(values = c("range" = "red")) +
  scale_size_manual(values = c("range" = 0.5)) +
  theme(
    axis.title = element_blank(),
    axis.text = element_text(size = 6),
    strip.text = element_text(color = "white"),
    strip.background = element_rect(fill = "black", color = "black")
  )
# display maps
print(map)
```
## Conclusion
We hope this vignette has provided a useful overview of how to customize Area of Habitat data. If you have any questions or suggestions for additional elevation, habitat classification, or crosswalk datasets that could be included in the package, please [file an issue at the package's online code repository](https://github.com/prioritizr/aoh/issues).
## References