```{r include_packages_2, include = FALSE}
# This chunk ensures that the thesisdown package is
# installed and loaded. This thesisdown package includes
# the template files for the thesis and also two functions
# used for labeling and referencing
if(!require(devtools))
install.packages("devtools", repos = "http://cran.rstudio.com")
if(!require(dplyr))
install.packages("dplyr", repos = "http://cran.rstudio.com")
if(!require(ggplot2))
install.packages("ggplot2", repos = "http://cran.rstudio.com")
if(!require(bookdown))
install.packages("bookdown", repos = "http://cran.rstudio.com")
if(!require(openair))
install.packages("openair", repos = "http://cran.rstudio.com")
if(!require(raster))
install.packages("raster", repos = "http://cran.rstudio.com")
if(!require(rgdal))
install.packages("rgdal", repos = "http://cran.rstudio.com")
if(!require(shape))
install.packages("shape", repos = "http://cran.rstudio.com")
if(!require(gridExtra))
install.packages("gridExtra", repos = "http://cran.rstudio.com")
if(!require(grid))
install.packages("grid", repos = "http://cran.rstudio.com")
if(!require(cowplot))
install.packages("cowplot", repos = "http://cran.rstudio.com")
if(!require(thesisdown)){
library(devtools)
devtools::install_github("ismayc/thesisdown")
}
library(devtools)
library(thesisdown)
library(dplyr)
library(ggplot2)
library(bookdown)
library(openair)
library(raster)
library(rgdal)
library(shape)
library(gridExtra)
library(grid)
library(cowplot)
library(knitr)  # provides include_graphics(), used in the figure chunks below
```
# Methodology {#rmd-method}
This research focuses on non-hurricane data and has three main elements: *data*, *temporal analysis* with POT-PP, and *spatial analysis*, used for spatial interpolation and for creating return level (RL) maps for MRIs of 700, 1700, and 3000 years. Core steps (1 and 3 to 7) must be carried out in an iterative process, station by station, as shown in Figure \@ref(fig:mainmethodology).
```{r mainmethodology, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Iterative Process in Methodology", fig.align="center", dpi=208}
include_graphics(path = "figure/main_methodology.png")
```
Figure \@ref(fig:methodology) shows the methodological scheme, where the main elements mentioned above are highlighted with shaded boxes. Steps 1 to 8 are the most representative; step 2 is a data verification process that can be done once (in bulk) for all stations.
```{r methodology, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Methodology", fig.align="center", dpi=300}
include_graphics(path = "figure/methodology.png")
```
Once the iterative cycle ends and the RLs have been calculated at all stations, continuous surfaces will be created: one for 700 years, one for 1700 years, and one for 3000 years. An additional element is the integration with existing hurricane studies to produce the final maps, which will be used as input loads for infrastructure design and will become part of the design standard.
## Data Standardization {#rmd-standardization}
Analysis of extreme wind speeds requires data standardization as an initial step. All input data must be standardized to represent three important conditions: a) an anemometer height of 10 meters, b) open-terrain surface roughness (exposure C), and c) an averaging time corresponding to a 3-second wind gust. @Asce2017 defines exposure C as areas with few obstructions, while exposure D refers to perfectly open space.
Parallel to the standardization activities described below, it is also important to consider, for all stations involved in the analysis:
* _Separating_: As far as possible, classify each record of the time series as thunderstorm (t) or non-thunderstorm (nt)
* _Filtering_: Remove wind speeds above $200 \frac{km}{h}$ and data pertaining to hurricane events, because the hurricane procedure requires a different approach and must be done independently (a minimal filtering sketch follows this list)
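The filtering step reduces to a simple row filter. A minimal sketch with `dplyr` (already loaded above), assuming hypothetical column names `speed_kmh` and `event_type`, which depend on the actual source schema:
```{r filteringsketch, eval=FALSE}
# Hypothetical records; real column names depend on the source schema
winds <- dplyr::tibble(
  speed_kmh  = c(35, 210, 60, 95),
  event_type = c("t", "nt", "hurricane", "nt")
)
winds_filtered <- winds %>%
  dplyr::filter(speed_kmh <= 200,          # remove implausible speeds
                event_type != "hurricane") # hurricanes are analyzed separately
```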
### Anemometer Height (10 m)
According to the protocol for field data collection and location of meteorological stations [@ideam2005], the anemometer (wind sensor) is always installed at a fixed height of 10 meters above the surface, as shown in Figure \@ref(fig:anemometer); therefore, no height correction is needed.
```{r anemometer, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Anemometer height: 10 meters", fig.align="center", fig.width=3, fig.height=2.7}
#include_graphics(path = "figure/anemometer.png")
par(mar=c(0,0,0,0))
par(oma=c(0,0,0,0))
plot(1, type="n", xlab="", ylab="",
xlim=c(0,5), ylim= c(0,12), xaxt ="n", yaxt="n", bty="n", bg = 'transparent')
lines (x=c(0.9,0.9), y=c(0,12))
lines (x=c(1.1,1.1), y=c(0,12))
lines (x=c(0.9, 1.1), y=c(12,12))
lines (x=c(0, 4), y=c(0,0), lwd=2)
rect(0.85, 9.7, 1.15, 10.3, col="white")
lines (x=c(1.15, 2.5), y=c(9.85,9.85))
lines (x=c(1.15, 2.5), y=c(10.15,10.15))
lines (x=c(2.5,2.5), y=c(9.9,10.1))
rect(2.4, 9.8, 2.6, 10.2, col="white")
lines (x=c(2.45,2.45), y=c(10.2,10.9))
lines (x=c(2.55,2.55), y=c(10.2,10.9))
rect(2.4, 10.8, 2.6, 11.2, col="black")
rect(2, 10.9, 2.5, 11.1, col="black")
rect(2.5, 10.9, 3, 11.1, col="black")
points(x=2, y=11, pch=19, cex = 3, col="black")
points(x=3, y=11, pch=19, cex = 3, col="black")
lines (x=c(3.5,3.5), y=c(0,11), lty=5)
arrows(3.5, 0, 3.5, 0.5, length=0.1, code=1)
arrows(3.5, 11, 3.5, 10.5, length=0.1, code=1)
lines (x=c(2,4), y=c(11,11), lty=5)
myx = seq(from=0, to=4, by=0.2)
for (i in myx) lines (x=c(i, i-0.2), y=c(0, -0.2))
text(3.85, 0, labels="Surface", pos=4, cex=0.7)
text(3.5, 5.5, labels="10 meters", pos=4, cex=0.7)
text(3.85, 11, labels="Anemometer", pos=4, cex=0.7)
#box(which = "plot", col="red")
#box(which = "figure", col="blue")
#box(which = "inner", col="cyan")
#box(which = "outer", col="orange")
```
### Surface Roughness at Open Terrain {#rmd-roughness}
Due to the effects that terrain has on wind speed, a correction must be applied when a station is located in an area that cannot be considered open terrain. For open terrain, the roughness length is 0.03 meters. There are alternative methodologies to calculate the roughness: for example, @Masters2010 uses the station data itself, but requires measurements separated by no more than one minute (difficult to obtain), while @Lettau1969 uses the empirical Equation \@ref(eq:zo) (recommended in ASCE7-16, page 743, equation C26.7-1) to calculate the roughness length $z_0$, which was used here,
\begin{equation}
z_0= 0.5\;H_{ob}\;\frac{S_{ob}}{A_{ob}}
(\#eq:zo)
\end{equation}
where $H_{ob}$ is the average height of the obstacles, $S_{ob}$ is the average frontal area of the obstacles perpendicular to the wind, and $A_{ob}$ is the average ground area occupied by each obstruction, with all quantities in meters. The empirical exponent $\alpha$, the gradient height $z_g$, and the exposure coefficient $K_z$ are then used to calculate the correction factor $F_{exposure}$, where $z$ is the measurement height above ground (10 m for these stations) and $z_0$ is in meters.
\begin{equation}
\alpha = 5.65\;z_0^{-0.133}
(\#eq:alpha)
\end{equation}
\begin{equation}
z_g=450\;z_0^{0.125}
(\#eq:zg)
\end{equation}
\begin{equation}
K_z= 2.01\left(\frac{z}{z_g}\right)^{2/\alpha}
(\#eq:kz)
\end{equation}
\begin{equation}
F_{exposure} = \frac{0.951434}{K_z}
(\#eq:fexpo)
\end{equation}
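The chain of Equations \@ref(eq:zo) to \@ref(eq:fexpo) can be sketched as a single R function; the default $z = 10$ m matches the anemometer height, and the obstacle inputs below are hypothetical:
```{r exposuresketch, eval=FALSE}
exposure_factor <- function(H_ob, S_ob, A_ob, z = 10) {
  z0    <- 0.5 * H_ob * S_ob / A_ob     # Lettau roughness, Equation (zo)
  alpha <- 5.65 * z0^(-0.133)           # empirical exponent, Equation (alpha)
  zg    <- 450 * z0^0.125               # gradient height, Equation (zg)
  Kz    <- 2.01 * (z / zg)^(2 / alpha)  # exposure coefficient, Equation (kz)
  0.951434 / Kz                         # correction factor, Equation (fexpo)
}
# Hypothetical obstacles: 5 m tall, 20 m2 frontal area, 400 m2 of terrain each
exposure_factor(H_ob = 5, S_ob = 20, A_ob = 400)
```
With inputs giving $z_0 = 0.03$ m (open terrain) the function returns approximately 1, consistent with the open-terrain reference value 0.951434 in the numerator of Equation \@ref(eq:fexpo).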
According to @nist2012, the roughness calculation needs to be weighted according to the predominance of wind magnitude in eight directions (north, south, east, west, northeast, northwest, southeast, and southwest) around the station location. The calculation in each direction can be done using a detailed aerial photo or satellite image of the station, covering a radius of 800 meters. Figure \@ref(fig:compassrose) shows the wind percentages in the mentioned directions for a generic station. Figure \@ref(fig:lettaustation2) shows the satellite image for the _Vanguardia_ ISD station (USAF:802340), located at the _Villavicencio_ airport, with four 45-degree sectors (south, north, east, and west) highlighted.
```{r compassrose, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Wind Rose with Wind Percentages", fig.align="center", fig.height= 2.3}
#include_graphics(path = "figure/viensanandres.png")
par(mar=c(0,0,0,0))
par(oma=c(0,0,0,0))
data(mydata)
#windRose(mydata)
windRose(mydata, angle = 45, width = 1,
grid.line = 12, ws.int=5, auto.text=F, annotate=F, key.header = NULL, key.footer = "", key=FALSE)
```
```{r lettaustation1, include = FALSE}
#include_graphics(path = "figure/aerial_photo_pintar.png")
circle = readOGR(dsn = "data/circle.shp", layer = "circle")
sectors = readOGR(dsn = "data/sectors.shp", layer = "sectors")
#myraster = raster ("data/802340_img_nopoint_modified.tif")
#myraster.crop = crop(myraster, extent(circle))
#myraster.mask = mask(myraster.crop, circle)
img_stack=stack("data/802340_img_nopoint_modified.tif")
img_stack.crop = crop(img_stack, extent(circle))
img_stack.mask = mask(img_stack.crop, circle)
```
```{r lettaustation2, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Digital Imagery for 'Vanguardia' ISD Station (USAF:802340)", size="footnotesize", fig.align="center", fig.width= 1.6, fig.height= 1.6}
yellowtrans <- rgb(255, 255, 0, 30, maxColorValue=255)
plotRGB(img_stack.mask)
plot(sectors, add=T, col=yellowtrans)
lines(sectors, col="black", lwd=0.5)
lines(circle, col="black", lwd=0.8)
points(x=-73.614, y=4.168, pch=17, cex=1, col="red")
text(x=-73.614, y=4.168, labels="802340", pos=4, cex = 0.8, col="white")
```
Figure \@ref(fig:lettauexamples2) shows extreme conditions for roughness: open space in the left image (ISD station 804070) with a roughness value of 0.03, closed space in the center image (ISD station 803000) with a roughness value of 0.1, and, in the right image, a typical example where the mentioned Lettau procedure is needed because roughness differs by direction. The Lettau Equation \@ref(eq:zo) must be applied to each direction, and the final $z_0$ value is then the weighted average, using the historical wind percentages (a minimal sketch follows). Figure \@ref(fig:lettauvalues) shows the strokes made to calculate the different areas for two Colombian stations: in red, the area occupied by the obstacles, and in blue, the perpendicular (frontal) area [@triana2019]. Information about the wind percentage per direction at each station was obtained from @ideam1999.
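A minimal sketch of the directional weighting; the per-direction roughness values and wind percentages below are hypothetical:
```{r weightedz0, eval=FALSE}
# Hypothetical z0 per direction (m), Equation (zo) applied to each sector
z0_dir   <- c(N = 0.03, NE = 0.08, E = 0.10, SE = 0.05,
              S = 0.03, SW = 0.04, W = 0.07, NW = 0.06)
# Hypothetical historical wind percentages per direction
wind_pct <- c(N = 20, NE = 10, E = 15, SE = 5,
              S = 25, SW = 10, W = 10, NW = 5)
z0_station <- sum(z0_dir * wind_pct) / sum(wind_pct)  # weighted average z0
```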
```{r lettauexamples1, include=FALSE}
#include_graphics(path = "figure/lettauexamples.PNG")
circle804070 = readOGR(dsn = "data/circle_804070.shp", layer = "circle_804070")
sectors804070 = readOGR(dsn = "data/n_sector_804070.shp", layer = "n_sector_804070")
img_stack804070=stack("data/804070_img_modified.tif")
img_stack804070.crop = crop(img_stack804070, extent(circle804070))
img_stack804070.mask = mask(img_stack804070.crop, circle804070)
circle803000 = readOGR(dsn = "data/circle_803000.shp", layer = "circle_803000")
sectors803000 = readOGR(dsn = "data/s_sector_803000.shp", layer = "s_sector_803000")
img_stack803000=stack("data/803000_img_modified.tif")
img_stack803000.crop = crop(img_stack803000, extent(circle803000))
img_stack803000.mask = mask(img_stack803000.crop, circle803000)
circle804380 = readOGR(dsn = "data/circle_804380.shp", layer = "circle_804380")
sectors804380 = readOGR(dsn = "data/e_sector_804380.shp", layer = "e_sector_804380")
img_stack804380=stack("data/804380_img_modified.tif")
img_stack804380.crop = crop(img_stack804380, extent(circle804380))
img_stack804380.mask = mask(img_stack804380.crop, circle804380)
```
```{r lettauexamples2, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Roughness. Open (L), Closed (C), and Lettau (R).", fig.align="center", fig.height=1.5}
par(mfrow=c(1,3))
par(pty="m")
par(mar=c(0,0,0,0))
par(oma=c(0,0,0,0))
par(xpd = NA)
yellowtrans <- rgb(255, 255, 0, 30, maxColorValue=255)
plotRGB(img_stack804070.mask, pty="s")
plot(sectors804070, add=T, col=yellowtrans)
lines(sectors804070, col="black", lwd=0.5)
lines(circle804070, col="black", lwd=0.5)
points(x=-71.728, y=10.558, pch=17, cex=1, col="red")
text(x=-71.728, y=10.558, labels="804070", pos=1, cex = 1.3, col="white")
plotRGB(img_stack803000.mask, pty="s")
plot(sectors803000, add=T, col=yellowtrans)
lines(sectors803000, col="black", lwd=0.5)
lines(circle803000, col="black", lwd=0.5)
points(x=-77.9, y=2.583, pch=17, cex=1, col="red")
text(x=-77.9, y=2.583, labels="803000", pos=3, cex = 1.3, col="white")
plotRGB(img_stack804380.mask, pty="s")
plot(sectors804380, add=T, col=yellowtrans)
lines(sectors804380, col="black", lwd=0.5)
lines(circle804380, col="black", lwd=0.5)
points(x=-71.161, y=8.582, pch=17, cex=1, col="red")
text(x=-71.161, y=8.582, labels="804380", pos=2, cex = 1.3, col="white", halo=TRUE, hc="black", hw=0.2)
#box(which = "plot", col="red")
#box(which = "figure", col="blue")
#box(which = "inner", col="cyan")
#box(which = "outer", col="orange")
```
```{r lettauvalues, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Lettau Calculation", fig.align="center", dpi=150}
include_graphics(path = "figure/lettauvalues.png")
```
### Averaging Time: 3-s Gust {#rmd-gust}
To transform the hourly mean wind velocity $V_{3600}$ into the 3-s gust velocity $V_3$, ASCE7-16 recommends the use of the @Durst1960 curve. The Durst curve shows on its $y$ axis the gust factor $\frac{V_t}{V_{3600}}$, the ratio between a wind gust averaged over $t$ seconds, $V_t$, and the hourly averaged wind speed $V_{3600}$, and on its $x$ axis the gust duration $t$ in seconds. The procedure is to locate $t = 3$ seconds on the $x$ axis and read the corresponding gust factor on the $y$ axis. For instance, using the variable $V_{3600}$ from the IDEAM data source, the gust factor for the 3-s gust is 1.51.
\begin{equation}
V_t = V_{3\,\textrm{seconds}} = \textrm{(gust factor)}\,V_{3600}
(\#eq:vt)
\end{equation}
This conversion is valid only for open terrain conditions. A minimal sketch follows.
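For hourly IDEAM series, the standardization of Equation \@ref(eq:vt) reduces to a multiplication (toy wind speeds assumed):
```{r gustsketch, eval=FALSE}
gust_factor <- 1.51            # Durst gust factor for t = 3 s, open terrain
V3600 <- c(40, 55, 62)         # hourly mean wind speeds in km/h (toy values)
V3    <- gust_factor * V3600   # standardized 3-s gust speeds
```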
## Downscaling Support {#ds}
Where it is necessary to complement the local/regional wind analysis with data from ISD (model output for extreme winds) and the ERA5 reanalysis dataset (large-scale forecast data), it is required to show, by means of _comparisons_ (exploratory data analysis and/or statistical measures), that those sources (modeled and forecast) are similar to the IDEAM field measurements.
The proposed mechanism in the search for downscaling support is: (a) the creation of _common time series graphs_, where the time series from all data sources are expected to be similar, and (b) the elaboration of _scatter plots_, generated by matching two sources in time and sorting in ascending order by wind velocity. By visual inspection it is possible to evaluate data similarity between sources: the data are similar when all the points fall very close to a 45-degree line (a minimal sketch follows the matching strategies listed below). In both cases, the strategy for station matching can be one of the following:
1. _Manual matching_, doing a detailed analysis station by station (only for ISD and IDEAM). While it is true that ISD is based on IDEAM, the names and locations are somewhat different; for this reason, it is necessary to read the information available from each source and decide, station by station, about its correspondence.
2. _Intersection matching_, between ISD and IDEAM point stations and ERA5 cells. All ISD and IDEAM stations falling inside an ERA5 cell will be compared with each other.
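A minimal sketch of the scatter-plot comparison for one matched pair of sources; the `isd` and `ideam` vectors below are simulated stand-ins for real matched series:
```{r scattersketch, eval=FALSE}
set.seed(1)
isd   <- sort(rnorm(100, mean = 30, sd = 8))  # simulated ISD speeds, ascending
ideam <- sort(rnorm(100, mean = 30, sd = 8))  # simulated IDEAM speeds, ascending
plot(isd, ideam, pch = 20, xlab = "ISD (km/h)", ylab = "IDEAM (km/h)")
abline(0, 1, lty = 2)  # 45-degree line: similar sources plot close to it
```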
## Temporal Analysis (POT-PP) {#pot-pp}
Similar to the way data are fitted to a normal distribution to make inferences, in extreme value analysis only part of the data (the extreme values over a high threshold, i.e., POT) is fitted to a PP that considers extreme deviations from the mean. While in the first case (normal distribution) the inferences concern events similar to the samples, with extreme value theory the inferences concern events more extreme than any previously observed or measured.
In summary, POT means working only with extreme values, and PP means fitting the data to a PDF that depends on an intensity function $\lambda(t,y)$, where $t$ is time and $y$ is the extreme wind velocity. As shown in Figure \@ref(fig:plotdomainpp), in a POT-PP approach with domain $D$, the number of observations follows a Poisson distribution with mean $\int_D\lambda(t,y)\,dt\,dy$. The main advantage of POT-PP is that it is designed to consider storm and non-storm events independently (for disjoint sub-domains $D_1$ and $D_2$ inside $D$, the observation counts in $D_1$ and $D_2$ are independent random variables) but, in the end, to use both for the inferences,
\begin{equation}
PDF = f(t,y|\eta) = \frac{\lambda(t,y)}{\int_D\lambda(t,y)\,dt\,dy}
(\#eq:pppdf)
\end{equation}
### De-clustering {#decluster}
To make the assumptions of the PP more justifiable, it is important to keep only one sample per event: the highest one. For instance, if a hypothetical storm started at 11:30 in the morning and ended at 12:30 in the afternoon, and the time series for that event has thirty wind measurements (one every two minutes), it is necessary to keep only the strongest (maximum) value; this process is called de-clustering. In Figure \@ref(fig:declustering), two thunderstorm clusters are shown, and only the red samples are used to fit the PP. POT-PP defines that all adjacent observations separated by six (6) hours or less, in the case of thunderstorm events, or by four (4) days or less, in the case of non-thunderstorm events, belong to the same cluster. A sketch of this rule follows.
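A sketch of the 4-day criterion for non-thunderstorm records, assuming a hypothetical data frame `obs` with timestamps and speeds:
```{r declustersketch, eval=FALSE}
set.seed(1)
obs <- data.frame(                       # hypothetical observations
  time  = as.POSIXct("2020-01-01", tz = "UTC") +
          cumsum(runif(50, 0.5, 6)) * 86400,
  speed = rgamma(50, shape = 8, scale = 5)
)
gap_days    <- as.numeric(diff(obs$time), units = "days")
obs$cluster <- cumsum(c(TRUE, gap_days > 4))  # new cluster after a >4-day gap
peaks <- aggregate(speed ~ cluster, data = obs, FUN = max)  # one peak per cluster
```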
```{r declustering, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="De-clustering in PP", size="footnotesize", fig.width= 2, fig.height= 2, fig.align="center"}
par(bg=NA)
par(xpd = NA)
op <- par(mar = rep(0, 4))
#par(pin = c(5, 1))
plot(1, type="n", xlab="", ylab="",
xlim=c(0,4), ylim= c(0,0.5), xaxt ="n", yaxt="n", bty="n", bg = 'transparent')
axis(1, labels=FALSE, tick=TRUE, col.axis="black", lwd.ticks=0.05, tck=0.04, lwd=0.1)
#axis(2, labels=FALSE, tick=TRUE, col.axis="black", lwd.ticks=0.5)
x=rnorm(8, mean = 1, sd = 0.5)
y=rnorm(8, mean = 0.25, sd = 0.1)
points(x, y, col="cornsilk4", pch=20)
points(0.5, 0.5, col="red", pch=20, cex=1.3)
abline(v=2, lty="dotted")
x=rnorm(8, mean = 3, sd = 0.5)
y=rnorm(8, mean = 0.25, sd = 0.1)
points(x, y, col="cyan3", pch=15, cex=0.7)
points(3.3, 0.4, col="red", pch=15, cex=0.9)
```
### Thresholding {#thresholding}
As the POT model works only with the most extreme values in the time series, it is necessary to select a threshold to filter out small values. Bias is high when a low threshold is selected (many exceedances) because the asymptotic support is weak; the opposite happens for high thresholds, where the variance is potentially high. According to @Davison1990, it is necessary to select a threshold value consistent with the model structure. Figure \@ref(fig:thresholding) represents the thresholding process in a generic dataset, where only the points above the dotted horizontal line (the threshold), shown as red squares, are used for the model.
```{r thresholding, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="POT - Thresholding", size="footnotesize", fig.width= 2, fig.height= 2, fig.align="center"}
par(bg=NA)
par(xpd = NA)
op <- par(mar = rep(0, 4))
#par(pin = c(5, 1))
plot(1, type="n", xlab="", ylab="",
xlim=c(0,4), ylim= c(0,0.8), xaxt ="n", yaxt="n", bty="n", bg = 'transparent')
axis(1, labels=FALSE, tick=TRUE, col.axis="black", lwd.ticks=0.05, tck=0.04, lwd=0.1)
#axis(2, labels=FALSE, tick=TRUE, col.axis="black", lwd.ticks=0.5)
x=rnorm(7, mean = 2, sd = 1)
y=rnorm(7, mean = 0.6, sd = 0.2)
points(x, y, col="red", pch=15, cex = 0.7)
abline(h=0.3, lty="dotted")
x=rnorm(15, mean = 2, sd = 2)
y=rnorm(15, mean = 0.2, sd = 0.03)
points(x, y, col="black", pch=20, cex = 1)
```
The procedure to choose the best threshold pair, one threshold for thunderstorm and another for non-thunderstorm, is based on the W transformation. POT-PP needs the selection of the best threshold pair $b_t$ and $b_{nt}$ (see Figure \@ref(fig:plotdomainpp)) that produces the optimal fit. The quality of this threshold fit is measured through the $W$ statistic. If the wind variable $y$, in a POT-PP approach, has $CDF = U = F(y)$, then $U = F(y)$ is uniformly distributed on (0,1), _uniform(0,1)_, meaning that the transformation $W = -log(1-U)$ is an exponential random variable with mean one (1).
\begin{equation}
CDF = U= F(y) = P(y \leq Y) = \frac{\int_b^Y\lambda(y,t)\,dy}{\int_b^\infty\lambda(y,t)\,dy}
(\#eq:ppcdf)
\end{equation}
The W statistic is computed by comparing the ordered results of applying $W = -log(1-U)$ to the data (the $y$ axis in Figure \@ref(fig:wstatistics)) with the theoretical quantiles of a mean-one exponential variable ($x$ axis in the same figure). The W statistic is the largest vertical distance between the 45-degree line and the points in the graph. The best threshold pair is the one returning the minimum W statistic after testing many threshold-pair combinations in an iterative process; a sketch follows.
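A sketch of the W statistic for one candidate threshold; `cdf_pp` stands for a hypothetical function returning $U = F(y)$ from the fitted PP:
```{r wstatsketch, eval=FALSE}
w_statistic <- function(y, cdf_pp) {
  w <- sort(-log(1 - cdf_pp(y)))   # ordered W transformation of the data
  q <- qexp(ppoints(length(w)))    # theoretical mean-one exponential quantiles
  max(abs(w - q))                  # largest distance from the 45-degree line
}
# The threshold pair minimizing w_statistic() over a grid of candidates is kept
```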
```{r wstatistics, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="POT - Thresholding W Statistic", size="footnotesize", fig.align="center", fig.width=2.5, fig.asp=1}
dat <- readRDS("data/myprint6.rds")
dat
lines(x=c(3.1, 3.1), y=c(3.1,3.8), col="green", lty=5)
points(x=3.1, y=3.8, pch=".")
text(x = 3.1, y = 3.4, labels = c("Minimum W distance"), cex=0.6, pos = 2)
#box(which = "plot", col="red")
#box(which = "figure", col="blue")
#box(which = "inner", col="cyan")
#box(which = "outer", col="orange")
```
### Exclude No-Data Periods
The PP requires removing long periods of time when stations were not recording or were failing. The threshold proposed in @Pintar2015 is 180 days; namely, all gaps larger than six months are removed from the time series. A sketch follows.
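A sketch of the gap detection, assuming a sorted vector `times` of hypothetical observation timestamps for one station:
```{r gapsketch, eval=FALSE}
set.seed(1)
times <- sort(as.POSIXct("2015-01-01", tz = "UTC") +
              runif(200, 0, 5 * 365) * 86400)   # hypothetical timestamps
gap_days <- as.numeric(diff(times), units = "days")
big_gaps <- which(gap_days > 180)               # >6-month no-data periods
# The record length used by the PP excludes the duration of these gaps
effective_days <- as.numeric(diff(range(times)), units = "days") -
                  sum(gap_days[big_gaps])
```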
### Fit Intensity Function
The probability density function (PDF) and the cumulative distribution function (CDF) of the PP depend on the intensity function; they are shown in Equation \@ref(eq:pppdf) and Equation \@ref(eq:ppcdf), respectively.
To facilitate the estimation of the parameters of the PP intensity function, the shape parameter $\zeta_t$ is taken to be zero in Equation \@ref(eq:ppintensityfunction); taking the limit, the resulting intensity function is the same as the GEV type I (Gumbel) distribution,
\begin{equation}
\frac{1}{\psi_t}\exp\left\{\frac{-(y-\omega_t)}{\psi_t}\right\}
(\#eq:ppusedif)
\end{equation}
In this study, the intensity functions used are:
\begin{equation}
\lambda\left(y,t\right)=
\begin{cases}
\begin{aligned}
&\frac{1}{\psi_s}\exp\left(\frac{-(y-\omega_s)}{\psi_s}\right),\;\textrm{for t in thunderstorm period}
\\
&\frac{1}{\psi_{nt}}\exp\left(\frac{-(y-\omega_{nt})}{\psi_{nt}}\right),\;\textrm{for t in non-thunderstorm period}
\end{aligned}
\end{cases}
(\#eq:ppspecificintensityfunction)
\end{equation}
As shown in Figure \@ref(fig:fitif), the fitting process involves finding the best set of parameters for the intensity function, such that the red curve (the PDF of the PP, based on the intensity function) follows the shape of the data histogram as tightly as possible. As described in _[POT-PP](#pot-pp)_, the optimal parameters for the intensity function are calculated using *maximum likelihood* with Equation \@ref(eq:pplikelihood). The R code provided by Dr. Adam Pintar solves the score equations for the limit of the Poisson process likelihood as the shape parameter goes to zero; see the custom functions `FindStartVals` and `FindPsi`, based on `stats::uniroot`.
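The score-equation code itself is not reproduced here; as an illustrative alternative (not the author's method), under the same $\zeta = 0$ assumption the Gumbel parameters can be estimated by direct maximum likelihood with `stats::optim` on toy data:
```{r gumbelfitsketch, eval=FALSE}
set.seed(1)
y <- evd::rgumbel(200, loc = 50, scale = 8)   # toy de-clustered exceedances

negloglik <- function(p) {
  omega <- p[1]; psi <- exp(p[2])             # log-scale keeps psi positive
  z <- (y - omega) / psi
  sum(log(psi) + z + exp(-z))                 # negative log Gumbel likelihood
}
fit <- optim(c(mean(y), log(sd(y))), negloglik)
c(omega = fit$par[1], psi = exp(fit$par[2]))  # estimated location and scale
```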
```{r fitif, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="POT - PP Intensity Function Fitting Process", size="footnotesize", fig.width= 2, fig.height= 2, fig.align="center"}
par(bg=NA)
par(xpd = NA)
op <- par(mar = rep(0, 4))
#par(pin = c(5, 1))
y=evd::rgumbel(10, loc = 2, scale = 1)
hist(y, probability = TRUE, xlab="", ylab="", xaxt ="n", yaxt="n", main="", lwd=0.5)
curve(evd::dgumbel(x, loc=2, scale=1), col = "red", add=TRUE, lwd=2)
```
### Hazard Curve and Return Levels RL
A hazard curve is shown in Figure \@ref(fig:hc), where the $x$ axis represents the annual exceedance probability $P_e = \frac{1}{N}$ and the $y$ axis represents the return level (RL) $Y_N$ for the corresponding N-year return period. The extreme return wind velocity for any given return period can be obtained by going from the $x$ axis to the $y$ axis through the curve.
```{r hc, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="POT - PP Hazard Curve", size="footnotesize", fig.width= 2, fig.height= 2, fig.align="center"}
par(mar=c(2,1,0,0))
par(oma=c(0,0,0,0))
par(bg=NA)
#par(xpd = NA)
#op <- par(mar = rep(0, 4))
plot(1, xlab='', ylab='', type='n', yaxt='n', xaxt='n', tck=0,
xlim=c(0,300), ylim=c(0,0.025), bg = 'transparent')
text(x = 150, y = par("usr")[3] - 0.003,
labels = expression(frac(1, N)),
xpd = NA,
## Rotate the labels by 35 degrees.
srt = 0,
cex = 0.7)
text(x = par("usr")[4] - 30, y = 0.0215,
labels = expression(Y[N]),
xpd = NA,
## Rotate the labels by 35 degrees.
srt = 0,
cex = 0.7)
location = 100
scale = 40
.x <- seq(0, 1500, length.out=1000)
hfG <- function(x) {
(1/scale)*(exp(-(x-location)/scale))/(exp(exp(-(x-location)/scale))-1)
}
curve(hfG, add=T, col="red", lwd=2)
Arrows (x0=150, y0=0, x1=150, y1=(hfG(150)-0.003), arr.type="triangle", arr.width=0.08, lwd=0.1)
Arrows (x0=150, y0=hfG(150), x1=10, y1=hfG(150) , arr.type="triangle", arr.width=0.08, lwd=0.1)
#box(which = "plot", col="red")
#box(which = "figure", col="blue")
#box(which = "inner", col="cyan")
#box(which = "outer", col="orange")
```
In this research, POT-PP includes only time series classified as non-thunderstorm, which implies that the intensity function used (Equation \@ref(eq:ppspecificintensityfunction)) does not differentiate between wind types (thunderstorm and non-thunderstorm); i.e., the intensity function is not a function of time $t$.
The hazard curve can be created by solving for $Y_N$ in Equation \@ref(eq:pprl) for a specific value of N in years (MRI). As a rough estimate, forced by deficiencies in the available information for the case study, the average duration of non-thunderstorm events per year is taken to be one year; i.e., the parameter $A_{nt}$ in Equation \@ref(eq:pprl) is equal to one and $A_t$ is equal to zero (units in years).
Considering that the intensity function is not a function of time, Equation \@ref(eq:pprv) can be used, replacing the parameters of the intensity function (PP) and the return periods (N) directly, to create the hazard curve and obtain the RLs:
\begin{equation}
Y_N=\frac{\psi}{\zeta}\left[-log\left(\frac{N-1}{N}\right)\right]^{-\zeta}-\frac{\psi}{\zeta}+\omega
(\#eq:pprv)
\end{equation}
Since $\zeta = 0$ in the selected intensity function (Equation \@ref(eq:ppspecificintensityfunction)), the return levels $Y_N$ can be calculated with the Gumbel quantile function, using $(1-\frac{1}{N})$ as the probability. This alternative approach is only valid when the POT-PP analysis includes a single type of event (thunderstorm or non-thunderstorm). The connection between the intensity function of the PP and the Gumbel distribution (a variant of the GEV) is described in @Johnson1995, p. 75. A sketch follows.
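A sketch of this shortcut with `evd::qgumbel` and toy parameter values:
```{r rlsketch, eval=FALSE}
omega <- 50; psi <- 8                   # toy fitted Gumbel parameters
N  <- c(700, 1700, 3000)                # MRIs of interest (years)
YN <- evd::qgumbel(1 - 1/N, loc = omega, scale = psi)
# Equivalent closed form: omega - psi * log(-log(1 - 1/N))
```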
## Spatial Interpolation {#si}
Probabilistic (kriging) and deterministic (IDW, local polynomials) techniques are used to create return level maps for the same return period. Interpolation with kriging requires verifying minimum technical requirements to ensure proper use of the method, particularly:
* Structural analysis: data normality checks, for example with the Kolmogorov-Smirnov or Shapiro-Wilk goodness-of-fit tests; data transformation to ensure normality if needed (e.g., Box-Cox); and trend analysis to verify the need for trend modeling in subsequent steps.
* Semivariance analysis: use of the available tools, such as the cloud semivariogram, the experimental semivariogram, directional semivariograms (to verify isotropy or anisotropy), and different theoretical semivariograms, to select the best model of spatial autocorrelation as a preliminary step to interpolation.
* Kriging predictions: use of different types of kriging predictors (simple, ordinary, universal) depending on the results of the structural analysis.
* Cross validation: use of statistics such as the root mean square error, average standard error, mean standardized error, and root mean square standardized error, which measure the quality of the predictions and the magnitude of the errors.
A possible advantage of the deterministic methods is a better assessment of the local variability of spatial autocorrelation. A detailed structural analysis and cross validation can also be carried out with IDW or local polynomials. At the end of the spatial interpolation analysis, all the predictions can be compared to select the most suitable result.
The main references in this research related to this matter using *R software* are @Pebesma2019, @Pebesma2004, and @Graeler2016. For the implementation of spatial statistics using vector or raster formats, see @Pebesma2019a, @Pebesma2019b, and @Pebesma2018. A minimal kriging sketch follows.
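A minimal ordinary-kriging sketch with the `gstat` package; `stations` stands for a hypothetical `SpatialPointsDataFrame` of per-station return levels (column `rl`) and `grid` for a prediction grid over the study area:
```{r krigingsketch, eval=FALSE}
library(gstat)
v      <- variogram(rl ~ 1, stations)    # experimental semivariogram
vm     <- fit.variogram(v, vgm("Sph"))   # fit a spherical theoretical model
rl_map <- krige(rl ~ 1, stations, grid, model = vm)  # ordinary kriging surface
```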
## Integration with Hurricane Data {#integration}
ASCE7-16 proposes equation C26.5-2 for the combination of statistically independent non-hurricane and hurricane wind speed events.
\begin{equation}
P_e(y>Y_N) = 1 - P_{NH}(y<Y_N)P_{H}(y<Y_N)
(\#eq:combination)
\end{equation}
where $P_e(y>Y_N)$ is the annual exceedance probability for the combined wind hazards, $P_{NH}(y<Y_N)$ is the annual non-exceedance probability for non-hurricane winds, and $P_{H}(y<Y_N)$ is the annual non-exceedance probability for hurricane winds.
To understand Equation \@ref(eq:combination), it is important to remember that to calculate the return level $Y_N$ for a given N-year return period, the exceedance probability $\frac{1}{N}$ of $Y_N$ is used; the non-exceedance probability for $Y_N$ is therefore $\left(1-\frac{1}{N}\right)$. The procedure consists in creating a new hazard curve by calculating all $P_e(y>Y_N)$ values for the different $Y_N$ return levels, combining the hazard curves from non-hurricane and hurricane data.
Equation \@ref(eq:combination) can be expressed only in terms of exceedance probabilities, $P_{e} = 1 - (1 -P_{nh})(1 - P_{h})$, where $P_{nh}$ is the annual exceedance probability for non-hurricane winds and $P_{h}$ is the annual exceedance probability for hurricane winds. A graphical explanation of the procedure to calculate the combined $P_e$ for the return level $30\frac{km}{h}$ is shown in Figure \@ref(fig:combinedhc). For each cell in the study area, it is necessary to calculate a new combined hazard curve, that is, all the $P_e$ values corresponding to all the different return levels (Figure \@ref(fig:combinedhc)); a worked numeric example is sketched below.
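The worked example from Figure \@ref(fig:combinedhc), where $P_{nh} = 0.03$ and $P_{h} = 0.02$ for the $30\frac{km}{h}$ return level:
```{r combinedpesketch, eval=FALSE}
p_nh <- 0.03                        # non-hurricane annual exceedance probability
p_h  <- 0.02                        # hurricane annual exceedance probability
p_e  <- 1 - (1 - p_nh) * (1 - p_h)  # combined probability = 0.0494
```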
```{r combinedhc, echo=FALSE, fig.align="center", fig.cap="Integration of Hurricane and Non-Hurricane Data", message=FALSE, warning=FALSE, fig.height=2.5, fig.width=5}
plotit<-function(){
par(mar=c(2,2,0,0))
par(oma=c(0,0,0,0))
par(bg=NA)
plot(1, xlab='', ylab='', type='n', yaxt='n', xaxt='n', tck=0, xlim=c(0,200), ylim=c(0,0.05), bg = 'transparent', bty="n")
arrows(0,0,0,0.05, length=0.04)
arrows(0,0,200,0, length=0.04)
text(x = par("usr")[2] - 5, y = par("usr")[3] - 0.005, labels = expression(frac(1, N)), xpd = NA, srt = 0, cex = 0.7)
text(x = par("usr")[1] - 6, y = 0.05, labels = expression(Y[N]), xpd = NA, srt = 0, cex = 0.7)
text(x = 50.2, y = par("usr")[3] - 0.003, labels = "?", xpd = NA, srt = 0, cex = 0.7)
text(x = 68.5, y = par("usr")[3] - 0.003, labels = "0.02", xpd = NA, srt = 0, cex = 0.7)
text(x = 100, y = par("usr")[3] - 0.003, labels = "0.03", xpd = NA, srt = 0, cex = 0.7)
text(x = par("usr")[1] - 10, y = 0.015, labels = expression(paste("30 ", frac(Km, h))), xpd = NA, srt = 0, cex = 0.7)
text(x = par("usr")[2] - 2, y = 0.048, labels = "Combined", xpd = NA, srt = 0, pos = 2, cex = 0.6)
text(x = par("usr")[2] - 2, y = 0.031, labels = "Hurricanes", xpd = NA, pos = 2, srt = 0, cex = 0.6)
text(x = par("usr")[2] - 2, y = 0.021, labels = "Non-Hurricanes", xpd = NA, pos = 2, srt = 0, cex = 0.6)
myexp = expression(paste(P[e], " = 1 - (1 -", P[nh], ") * (1 - ", P[h], ")"))
text(x = par("usr")[1] + 60, y = 0.049, labels = myexp, xpd = NA, srt = 0, cex = 0.7)
text(x = par("usr")[1] + 60, y = 0.045, labels = "? = 1- (1 – 0.03)(1-0.02)", xpd = NA, srt = 0, cex = 0.6)
location = 65
scale = 20
.x <- seq(0, 1500, length.out=1000)
hfG <- function(x) {
(1/scale)*(exp(-(x-location)/scale))/(exp(exp(-(x-location)/scale))-1)
}
curve(hfG, add=T, col="red", lwd=1, lty=5)
Arrows (x0=50.2, y0=0, x1=50.2, y1=(hfG(50.2)-0.003), arr.type="triangle", arr.width=0.04, lwd=0.1)
location = 80
scale = 30
.x <- seq(0, 1500, length.out=1000)
curve(hfG, add=T, col="red", lwd=1)
Arrows (x0=68.5, y0=0, x1=68.5, y1=(hfG(68.5)-0.003), arr.type="triangle", arr.width=0.04, lwd=0.1)
location = 100
scale = 40
.x <- seq(0, 1500, length.out=1000)
curve(hfG, add=T, col="red", lwd=1)
Arrows (x0=100, y0=0, x1=100, y1=(hfG(100)-0.003), arr.type="triangle", arr.width=0.04, lwd=0.1)
Arrows (x0=100, y0=hfG(100), x1=7, y1=hfG(100) , arr.type="triangle", arr.width=0.04, lwd=0.1)
}
z.plot1<-function(){plotit()}
mydataframe = data.frame(v = c(10, 20, 30, "...", 350, "..."), Pe = c("...", "...", "?", "...", "...", "..."))
names(mydataframe) <- c(expression(Y[N]), expression(P[e]))
tt <- ttheme_default(base_size = 7, colhead=list(fg_params = list(parse=TRUE)))
tbl <- tableGrob(mydataframe, rows=NULL, theme=tt)
plot_grid(z.plot1, tbl, ncol = 2, rel_widths = c(4,1), labels=c("", "Combined Curve"), label_size = 7, hjust=-0.13)
```
The procedure followed in this research to generate the final maps requires that the hurricane and non-hurricane wind maps have already been generated in raster format for the main return periods (10, 20, 50, 100, 250, 500, 700, 1000, 1700, 3000, and 7000 years). As many maps as possible, for different return periods, are required to estimate sufficiently detailed hazard curves from the return values (cell values).
The main elements of the implemented algorithm are: (a) select the cell size for the integrated map as the _maximum_ cell size of the input maps (hurricane and non-hurricane); (b) create an empty final raster with the cell size from the previous step; (c) recreate the input maps using zonal statistics, where the zone is the empty raster, keeping only the _maximum_ value of the input cells that fall within each zone cell; (d) for each available tile in the integrated map, recreate the non-hurricane and hurricane hazard curves (using the result from the previous step); (e) use Equation \@ref(eq:combination) to calculate the integrated return level for each tile of the final map; and (f) create a final raster for each main return period.
An alternative approach for non-hurricane and hurricane data integration is to apply Equation \@ref(eq:combination) with the non-hurricane hazard curves obtained at each station (before spatial interpolation), in combination with hurricane hazard curves from existing studies (obtained at the same locations as the non-hurricane stations), and finally apply a spatial interpolation process. The main disadvantage of this procedure is that no non-hurricane wind maps are generated. The main advantage is that the computation time and the complexity of the integration algorithm are comparatively low, since the integration is done only at the station locations and not in all cells.
For the selection of the method to apply in this section, it is recommended to analyze, among other aspects: (a) differences in the spatial resolution (e.g., cell size) of the two types of studies, because they can impact the quality of the final product; (b) prediction errors resulting from spatial interpolation, which can be amplified at specific cell locations; (c) the construction of the algorithm, i.e., the use of the `st_apply` function of the `stars` package [@Pebesma2019b] (which applies functions over raster dimensions) to avoid cell-by-cell calculations; and (d) computer processing time. A cell-wise combination sketch follows.
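A sketch of the cell-wise combination on two aligned exceedance-probability rasters (hypothetical file names), using the `raster` package loaded above:
```{r combineraster, eval=FALSE}
p_nh_r <- raster("data/pe_nonhurricane.tif")  # hypothetical input raster
p_h_r  <- raster("data/pe_hurricane.tif")     # hypothetical input raster
p_e_r  <- overlay(p_nh_r, p_h_r,
                  fun = function(pnh, ph) 1 - (1 - pnh) * (1 - ph))
```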