-
Notifications
You must be signed in to change notification settings - Fork 1
/
index.Rmd
267 lines (200 loc) · 24.1 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
---
author: 'Alexys Herleym Rodríguez Avellaneda'
date: 'Jan 2020'
institution: 'University of Münster'
division: 'Institute for Geoinformatics - IFGI'
advisor: 'Dr. Edzer Pebesma'
#If you have more two advisors, un-silence line 7
altadvisor: 'Dr. Juan C. Reyes\\Dr. Sara Ribero'
#altadvisor: 'Dr. Sara Ribero'
department: 'Faculty of Geosciences'
degree: 'Master of Science in Geospatial Technologies'
title: 'Spatio-temporal analysis of extreme wind velocities for infrastructure design'
knit: "bookdown::render_book"
site: bookdown::bookdown_site
output:
#thesisdown::thesis_pdf: default
# citation_package: natbib
thesisdown::thesis_gitbook: default
#thesisdown::thesis_word: default
#thesisdown::thesis_epub: default
#If you are creating a PDF you'll need to write your preliminary content (e.g., abstract, acknowledgments) here or
#use code similar to line 22-23 for the .RMD files. If you are NOT producing a PDF, you can delete or silence lines 21-32 in this YAML header.
abstract: '`r if(knitr:::is_latex_output()) paste(readLines(here::here("prelims", "00-abstract.Rmd")), collapse = "\n ")`'
#If you'd rather include the preliminary content in files instead of inline
#like below, use a command like that for the abstract above. Note that a tab is
#needed on the line after the `|`.
acknowledgements: |
Special thanks to Prof. Dr. **Edzer Pebesma**, first, for all the contributions to the open source community, considering that main work in this thesis was done using his R packages, especially `sf`, `stars` and `gstat`, and second, for all high level knowledge transmitted through the subjects _Spatial Data Science with R_ and _Analysis of Spatio-Temporal Data_, which were the motivation and basis to carry out the investigation.\par
Special thanks to Prof. Dr. **Juan C Reyes** for his contribution in selecting the research topic, and great contributions in information, methodology and support.\par
Gratitude is extended to Dr. **Christoph Brox**, for being a support in difficult moments as the surgery and COVID-19 crisis, and in the same way to **Karsten Höwelhans**.\par
The author is thankful to:\par
Prof. Dr. **Edzer Pebesma**, Prof. Dr. **Juan C. Reyes**, and Prof. Dr. **Sara Ribero**, for supervising this work and spending their valuable time for discussions and feedback, it was really a huge advantage to have that support always available, and a pleasure to work beside them. Dr. **Adam Pintar**, for sharing its related POT-PP R Code, and for devoting much of his time to reviewing and commenting on my progress. The outstanding help of Dr. **Joaquín Huerta Guijarro**, for being receptive, friendly, and decidedly available to help. **European Union** 'Erasmus Mundus Grant', their funding allows me to fulfill this dream to go further with academic and professionals dreams. Engineer **Juan David Sandoval** for its helpful contributions. **Ligia Avellaneda** and **Nicolle Chaely**, mother and daughter of the author, for your love, prayers, and prized advice. Family members as **Elsa Manrique**, **Barbara Avellaneda**, and **Kevin Martinez**, for their really important source of motivation and accompaniment. To all the beautiful people that shared with the author different activities at **San Antonius Church of Münster**, with special mention of father **Alejandro Serrano Palacios** for his outstanding help and friendship which is permanently appreciated, and **choir friends**.
<!-- \clearpage -->
<!-- \shipout\null -->
<!-- \stepcounter{page} -->
dedication: |
\begin{tabbing}
AIS \hspace{4em} \= Colombian Earthquake Engineering Association \\
ASCE \> American Society of Civil Engineers \\
ASCE7-16 \> ASCE/SEI Design Loads Standard \\
CDF \> Cumulative Distribution Function \\
EDA \> Exploratory Data Analysis \\
ECMWF \> European Centre for Medium-Range Weather Forecasts \\
ERA5 \> ECMWF climate reanalysis dataset \\
EVD \> Extreme Value Distribution (GEVD, GEV) \\
GEVD \> Generalized Extreme Value Distribution (EVD, GEV) \\
GEV \> Generalized Extreme Value Distribution (GEVD, EVD) \\
GPD \> Generalized Pareto Distribution \\
HF \> Hazard Function \\
IDEAM \> Institute of Hydrology, Meteorology and Environmental Studies \\
IDW \> Inverse Distance Weighted \\
ISD \> Integrated Surface Database \\
MRI \> Mean Return Interval or Return Period \\
NSR \> Seismic Resistant Norm \\
NOAA \> National Oceanic and Atmospheric Administration \\
NetCDF \> Network Common Data Form \\
NCEI \> NOAA's National Centers for Environmental Information \\
$P_e$ \> Annual Exceedance Probability \\
PDF \> Probability Distribution Function \\
$P_n$ \> Compound Exceedance Probability \\
POT \> Peaks Over Threshold \\
PPF \> Percent Point Function (Quantile) \\
PP \> Poisson Process \\
Poisson-GPD \> POT: 1D PP (time) and GPD (magnitude) \\
POT-PP \> POT: 2D PP (time and magnitude) \\
RL \> Return Level \\
RMSE \> Root Mean Squared Error \\
SEI \> Structural Engineering Institute \\
SQL \> Structured Query Language \\
WGS84 \> World Geodetic System 1984
\end{tabbing}
<!-- \clearpage -->
<!-- \shipout\null -->
<!-- \stepcounter{page} -->
<!-- Preface: | -->
<!-- \clearpage -->
<!-- \shipout\null -->
<!-- \stepcounter{page} -->
#Specify the location of the bibliography below
bibliography: bib/thesis.bib
citation_package: none
biblio-style: "apalike"
#Download your specific csl file and refer to it in the line below.
csl: csl/apa.csl
link-citations: true
#toc: true
toc-depth: 2
lot: true
lof: true
space_between_paragraphs: true
#If you prefer blank lines between paragraphs, un-silence lines 40-41 (this requires package tikz)
#header-includes:
#- \usepackage{tikz}
---
```{r wrap-hook1, include=FALSE}
#To enable size = "value" in chunk options, where
#value can be one of next (sorted from big to small)
#Huge > huge > LARGE > Large > large > normalsize > small > footnotesize > scriptsize > tiny
library(knitr)
def.chunk.hook <- knitr::knit_hooks$get("chunk")
knitr::knit_hooks$set(chunk = function(x, options) {
x <- def.chunk.hook(x, options)
ifelse(options$size != "normalsize", paste0("\n \\", options$size,"\n\n", x, "\n\n \\normalsize"), x)
})
```
```{r setup_source, include=FALSE}
#To add vertical space before source code vspaceecho='2cm'
hook_source_def = knitr::knit_hooks$get('source')
knitr::knit_hooks$set(source = function(x, options) {
if (!is.null(options$vspaceecho)) {
begin <- paste0("\\vspace{", options$vspaceecho, "}")
stringr::str_c(begin, hook_source_def(x, options))
} else {
hook_source_def(x, options)
}
})
```
```{r include_packages, include = FALSE}
# This chunk ensures that the thesisdown package is
# installed and loaded. This thesisdown package includes
# the template files for the thesis.
if(!require(devtools))
install.packages("devtools", repos = "https://cran.rstudio.com")
if(!require(thesisdown))
devtools::install_github("ismayc/thesisdown")
if(!require(here))
install.packages("here", repos = "https://cran.rstudio.com")
library(thesisdown)
```
<!--
Above is the YAML (YAML Ain't Markup Language) header that includes a lot of metadata used to produce the document. Be careful with spacing in this header!
If you'd prefer to not include a Dedication, for example, simply delete the section entirely, or silence (add #) them.
If you have other LaTeX packages you would like to include, delete the # before header-includes and list the packages after hyphens on new lines.
If you'd like to include a comment that won't be produced in your resulting file enclose it in a block like this.
If you receive a duplicate label error after knitting, make sure to delete the index.Rmd file and then knit again.
-->
<!-- On ordering the chapter files:
There are two options:
1. Name your chapter files in the order in which you want them to appear (e.g., 01-Inro, 02-Data, 03-Conclusions).
2. Otherwise, you can specify the order in which they appear in the _bookdown.yml (for PDF only).
Do not include 00(two-hyphens)prelim.Rmd and 00-abstract.Rmd in the YAML file--they are handled in the YAML above differently for the PDF version.
-->
<!-- The {.unnumbered} option here means that the introduction will be "Chapter 0." You can also use {-} for no numbers
on chapters.
-->
```{r eval=!knitr::is_latex_output(), child=here::here("prelims", "00--prelim.Rmd")}
```
```{r eval=!knitr::is_latex_output(), child=here::here("prelims", "00-abstract.Rmd")}
```
<!--# Introduction {.unnumbered}-->
# Introduction
_Extreme value models_ are used for estimating engineering design forces of _extreme events_ like earthquakes, winds, rainfall, floods, etcetera [@Beirlant2004]. Structures designed with these forces, holding a balance between safety and cost, will survive while being requested by an extreme event from a natural phenomenon [@Castillo2005].
This research presents an application of extreme value analysis to estimate wind velocities for infrastructure design. Consequently, the main interest are probable future extreme events that structures need to be able to resist [@Smith2004].
This research follows the methodological approach defined in chapter 26 of the ASCE7-16 standard [@Asce2017]. ASCE7-16 considers design wind velocities for various mean recurrence intervals MRIs, depending on the risk category of the structure, as follows: MRI=700 years for risk category (RC) I and II, 1700 years for RC III, and 3000 for RC IV. A wind speed linked to a _mean recurrence interval - MRI_ of _N-years_ (N-years return period) is interpreted as the highest probable wind speed along the period of N-years [@Asce2017]. The annual probability of equal or exceed that wind speed is 1/N, this is with a change of being equaled or exceeded only one time in the corresponding MRI period.
The development of this research (focused in non-hurricane data), covers three main areas, _downscaling support_, _temporal analysis_, and _spatial analysis_, and includes an integration process with _existing results of hurricane studies_.
Due to the specific characteristics of the study area where there is lack of historical wind measurements, it became necessary to look for alternative data sources: ISD, and ERA5 forecast data. This resulted in a downscaling issue that was confronted from a graphic comparison of all sources by matching stations, in the search of adequate _downscaling support_ for the use of complementary data. Prior to the comparison process, ISD and IDEAM data sources were standardized to represent 3-second wind gust, 10 meters of anemometer height, and terrain open space roughness.
The _temporal analysis_ method used to calculate the return levels at each station from the historical wind time series, is the Peaks Over Threshold POT using a non-homogeneous bi-dimensional Poisson Process PP recommended by @Asce2017 and developed in @Pintar2015. Main components of POT-PP model are de-clustering, thresholding, intensity function fitting, hazard curve, and return levels calculation. At each station with non-thunderstorm data, this model starts with a process of de-clustering choosing a suitable threshold level to leave for the analysis only the most extreme available values, and then, fit to the data an intensity function using maximum likelihood to find optimal parameters with the best goodness of fit. With the fitted model, and using the hazard curve, it was possible to calculate extreme wind velocities or return levels for required MRIs.
The integration of all these results allow to generate non-hurricane continuous maps of extreme winds velocities (using _Kriging_ as _spatial analysis_ method), which are combined with _existing wind extreme hurricane studies_ to be used as input loads for the design of structures of different risk categories, i.e., less risky/important structures for short MRIs (700 and 1700 years), and highly important structures for the longest MRI of 3000 years.
## Context and Background
To design a specific structure, horizontal forces (wind and earthquake) play a starring role. For Colombia, initially, wind forces are calculated considering a fixed velocity value of 100 km/h, later, a continuous map with a return period of 50 years was included in the official design standard. Afterwards, an additional map with a return period of 700 years was added [@nsr10].
In the context of this study, extreme wind analysis is concerned with statistical methods applied to very high values of wind velocity as random variable in a stochastic process, to allow statistical inference from historical data. Extreme analysis methods assess the probability of wind events that are more extreme than the ones previously registered and included in the input model of the maximum wind velocities ordered sample. @Coles2001 presents a detailed study about classical extreme value theory and threshold models. Asymptotic extreme value models arguments give a convenient representation of the stochastic behavior of maximum values [@Coles2003].
According to @Coles2003, there are four main elements needed for a good analysis of extreme values: (a) appropriate selection of an asymptotic model; (b) use of all pertinent and available information; (c) properly estimation of uncertainty; and (d) considering non-stationary effects.
In general, there are two approaches to deal with extreme value analysis [@Pintar2015], the classical approach, and peaks over threshold POT. In this research, POT is selected over classical approach to be able to use more samples for statistical estimation.
The classical approach or traditional method, as well known as _sample maxima_ or _yearly maxima_ is associated to a generalized extreme value distribution GEV [@Fisher1928; @Gnedenko1943]. GEV is a family of limit probability distributions including Gumbel, Fréchet and Weibull, unified in @Mises1954 and @Jenkinson1955. The GEV family _describes all limiting distributions of the centered and normalized sample maximum_ [@Coles2003].
POT models the values above a chosen high level, and in general, the POT method has two approaches [@Smith2004]: (a) exceedances over threshold associated to a Generalized Pareto Distribution GPD, onwards _POT-Poisson-GPD_; and (b) the exceedances over threshold associated to a non-homogeneous non-stationary bi-dimensional Poisson process POT-PP (a point process approach). POT-PP is considering to be more flexible than generalized Pareto approach [@Coles2001], so for this study, POT-PP method is selected.
Selection of threshold level is relevant in POT. A low threshold (more exceedances) implies less variance and weak asymptotic support, but high bias. A high threshold (fewer exceedances) implies more variance and stronger asymptotic support, but low bias.
POT-Poisson-GPD models wind magnitudes over the threshold as a GPD, and time as a separated Poisson process [@PickandsIII1975]. This method was used for the first time as statistical application in @Davison1990. The generalized Pareto family describes all possible limiting distributions of the distributions scaled above the threshold [@Coles2003].
In POT-PP time and magnitude above the threshold are modeled using a two-dimensional Poisson process [@Pickands1971]. This method was applied for the first time in @Smith1989.
There are many techniques to estimate the parameters of extreme value models, i.e. graphical methods, estimators based on moments, order statistics, and likelihood based [@Coles2003]. @Smith1985 supported the use of likelihood methods due to _asymptotic normality_ guaranteed when shape parameter is greater than 0.5, excluding light tailed distributions with finite end point. In addition, likelihood method is easy to evaluate and solve numerically, and the calculation of standard errors and confidence intervals is possible using asymptotic theory.
## Problem Statement and Motivation
Wind forces are important for infrastructure design [@windeffects]. For a civil engineer main forces to consider for the design of a structure, for instance a bridge or a building, are (a) gravitational forces which can be dead (weight of the structure) or live (usage loads, for instance people living in a building or crossing a bridge), and (b) lateral forces due to earthquake and wind. For Colombia, the structure design standard has defined in great detail all aspects related to seismic and gravitational forces, but lack of detail at wind design speeds. Current wind velocities map is 20 years outdated and is not appropriate for all types of structures, because it only includes two return periods.
It is well known that in recent years there have been accelerated changes in the climate of the planet, including issues related to winds. This aspect is reflected in frequent partial failures of structures due to wind forces [@winddamage], and in some cases including with total losses [@Rezapour2014]. Last five decades the way to assess wind loads in structural design has had remarkable changes [@Roberts2012].
A complete study of extreme wind forces, need to address separately hurricane and non-hurricane data, to include in the product the integration of results from both fronts [@Asce2017]. In the study area, hurricane winds are only present inland in the Caribbean Sea, therefore, only affects directly 'San Andres y Providencia' island - one (1) of thirty-three (33) states. In 1102 of 1103 municipalities (more than 99%), only non-hurricane winds are relevant. However, all municipalities located near to the northern onshore border may be impacted by side effects of hurricanes.
The national infrastructure design standard of Colombia, maintained by the Earthquake Colombian Association of Seismic Engineering AIS, uses km/h as official units for wind velocities. In this research km/h is always used, considering that output results will support the update of chapter B.6 (wind forces) of mentioned standard.
## Knowledge Gap
Nowadays, methodologies to deal with the inference of extreme wind maps are quite mature and advanced, and many of them are already implemented and ready for use. For this reason, the main contribution of this research is not related to the theoretical foundations of the methods themselves, but to application of the method in a particular case where good quality data is not available [@windassessment]. Thereby, the gaps in which this research aims to contribute are related to the use of alternative data sources, and how to meet the downscaling challenge considering the lack of field measurement data coming from weather stations.
## Research Aim and Objectives
The main aim of this research is the estimation of wind extreme velocities to be used as input loads for the design of structures, considering their risk categories, and covering any place in the whole study area.
Specific objectives are:
1. Complement the lack of field measured wind data, with other sources of information, then, analyze and compare different time series, to select and use the best data source (or combination of sources), in case of downscaling support issue.
2. Select and apply a suitable probabilistic method to infer wind maps for infrastructure design.
3. Estimate extreme wind values for the stations in the selected input data source, considering non-hurricane approaches.
4. Allow the comparison of wind extreme values estimations, using different methods to verify and calibrate output results.
5. Generate continuous non-hurricane wind maps, using the most suitable spatial interpolation technique, considering the specific characteristics of the input data source.
6. Combine output maps from non-hurricane analysis with existing hurricane studies to obtain final maps for structural purposes.
## Research Question
Main question of this research is directed to calculate future wind extreme velocities (return levels) for infrastructure design, then the research is:
**What wind extreme velocities need to be used as load design forces for structures of different use category in the study area?**
## Case Study
As mentioned before, case study in this research is Colombia a tropical country located in the northern part of South America. Its capital Bogotá is located in the center of the country at latitude $4.6^\circ N$, and longitude $74.1^\circ W$.
Despite the government Institute of Hydrology, Meteorology and Environmental Studies IDEAM maintains a network of weather stations, of which around 200 have anemometers measuring instant data every minute, it was impossible to obtain quality and complete measured data according to the needs of the present investigation. This motivated to search for alternative data sources.
Nowadays in Colombia there are predefined requirements to design structures depending of its use category. The national standard for infrastructure design [@nsr10], following the American design standard [@Asce2017], covers the design of all types of structures with the mean recurrence intervals MRIs 700, 1700, and 3000 years. In this way this research aims to calculate wind extreme velocities that will be equaled or exceeded with a probability equal to $\frac{1}{MRI}$ in a given year, in other words, the velocities that will be equaled or exceeded only one time in mentioned periods. In terms of exposure time, understood as the time the structure will be in use, when the exposure time will be equal to those MRIs, the wind extreme velocities will have an occurrence compound probability of 67%.
Historically, hurricanes have only affected the Colombian Caribbean coast in a not very significant way, despite the fact that there have been significant events that have made landfall. The most likely areas to be affected by storms are the department of _La Guajira_ and the island of _San Andrés_ [@Royero2011]. In most cases the events that define the wind design loads in Colombia do not require hurricane data.
## Outline
Main sections of thesis document are 1) Introduction, 2) Data, 3) Theoretical Framework, 4) Methodology, 5) Results and Discussion, 5) Conclusions, and 6) Annexes (from A to E).
After introduction, in second section _[Data](#rmd-data)_, main information about data sources IDEAM, ISD, and ERA5 are described, including at the end additional details for ERA5 in Annex \@ref(datadownload). Annex \@ref(dbstoring) explains reasons for using PostgreSQL engine, and the database backup and restoration process.
_[Theoretical Framework](#rmd-thefra)_ section is dedicated to introduce statistical concepts that are basis for the investigation, both in **probability distributions** and in **extreme analysis**. Later, it is described in more detail, topics related to **extreme value analysis** (peaks over threshold with generalized Pareto POT-Poisson-GPD, and peaks over threshold with Poisson process POT-PP), and at the end, a summary report is done about **wind load requirements** for the study.
The _[Methodology](#rmd-method)_ chapter includes the processes needed to meet the objectives and answer the research question. Main components are data standardization, downscaling support, POT-PP, spatial interpolation, and integration with hurricane data.
_[Results and Discussion](#rmd-results)_ section shows, (1) all results for data standardization and comparison to support the downscaling issue, (2) all POT-PP results for one ISD station, (3) all output maps for ISD and ERA5 data sources including discussions of those finals results. These discussions are complemented by the _[Conclusions](#conclusions)_ section.
To finalize the document a series of appendices were created to facilitate the reproducibility of the research. Appendix \@ref(rcode) contains _research R code_. It is necessary to consider that the code provided by _Dr. Adam L. Pintar_ to do the de-clustering and thresholding in POT-PP is not there because its publication and distribution is not authorized. Appendix \@ref(results) contains all _results in digital format_. Appendix \@ref(datadownload) compliments the information needed to _download ERA5_ data. Appendix \@ref(dbstoring) shows the use of PostgreSQL for data storage and provides instructions for backup and restoring. Because the document for the thesis was done using package `thesisdown` [@Ismay2020], which is based on `bookdown` [@Xie2016; @Xie2020], the most important _document R code_ to create the document (mainly graphics) is shown in Appendix \@ref(docrcode). Finally, in Appendix \@ref(manual) a _user manual_ is presented in order to provide instructions to apply the same methodology in a different case study.