---
output:
html_document: default
pdf_document: default
---
# Theoretical Framework {#rmd-thefra}
<!-- Required to number equations in HTML files -->
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
TeX: { equationNumbers: { autoNumber: "AMS" } }
});
</script>
```{r load_pkgs2, echo=FALSE, message=FALSE, warning=FALSE}
# List of packages required for this analysis
pkg <- c("RcmdrMisc")
# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg
new.pkg <- pkg[!(pkg %in% installed.packages())]
# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg))
install.packages(new.pkg, repos = "http://cran.rstudio.com")
# Load packages (thesisdown will load all of the packages as well)
library(RcmdrMisc)
library(knitr)
```
## Probability Concepts
### Probability Density Function PDF
The PDF gives the probability that a continuous variable falls between two points. The probability is the area below the curve (the integral) between the two points, since for continuous probability distributions the probability at any single point is zero. The term _density_ refers to the amount of probability under each part of the curve: the higher the curve, the higher the density and, consequently, the probability.
\begin{equation}
\int_a^b f(x)dx = Pr[a \leq X \leq b]
(\#eq:pdf)
\end{equation}
Equation \@ref(eq:gumbelpdf) is the Gumbel PDF.
\begin{equation}
f(x)=\frac{1}{\beta}
\exp\left\{
-\frac{x-\mu}{\beta}
\right\}
\exp\left\{
-\exp\left\{
-\left(
\frac{x-\mu}{\beta}
\right)
\right\}
\right\},
\quad -\infty < x < \infty
(\#eq:gumbelpdf)
\end{equation}
where $\exp\left\{.\right\}$ denotes $\mathrm{e}^{\left\{.\right\}}$, $\beta$ is the scale parameter, and $\mu$ is the location parameter. The location $\mu$ shifts the PDF along the $x$ axis: a smaller location moves the curve to the left, a larger one to the right. The scale stretches ($\beta > 1$) or compresses ($0 < \beta < 1$) the PDF; as the scale parameter approaches zero, the PDF approaches a spike.
Figure \@ref(fig:plotgumbelpdffunction) shows the PDF with location ($\mu$) = 100 and scale ($\beta$) = 40, using Equation \@ref(eq:gumbelpdf). Figure \@ref(fig:plotgumbelpdf) shows the same PDF using the function `dgumbel` of the package `RcmdrMisc`.
```{r plotgumbelpdffunction, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Gumbel PDF", size="footnotesize", fig.width= 4, fig.height= 3}
par(mar=c(2.5,2.5,2,0))
par(oma=c(0,0,0,0))
par(mgp=c(1.5,0.5,0))
location = 100
scale = 40
.x <- seq(0, 300, length.out=1000)
pdfG <- function(x) {
  # Equation (eq:gumbelpdf): the leading factor is 1/beta, i.e. 1/scale
  1/scale * exp(-(x-location)/scale) * exp(-exp(-(x-location)/scale))
}
.y = pdfG(.x)
plot(.x, .y, col="green", lty=4, xlab="Velocities Km/h", ylab="Density Function - Gumbel Distribution", cex.axis = 0.5, cex.lab= 0.6, cex.main=0.7,
main=paste("Gumbel - Density Function Gumbel Distribution\n", "Location=", round(location,2), " Scale=", round(scale,2)), type="l", cex.sub=0.6)
```
```{r plotgumbelpdf, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Gumbel PDF - dgumbel function", size="footnotesize", fig.width= 4, fig.height= 3}
par(mar=c(2.5,2.5,2,0))
par(oma=c(0,0,0,0))
par(mgp=c(1.5,0.5,0))
location = 100
scale = 40
.x <- seq(0, 300, length.out=1000)
dfG = dgumbel(.x, location=location, scale=scale)
plot(.x, dfG, col="red", lty=4, xlab="Velocities Km/h", ylab="Density Function - Gumbel Distribution", cex.axis = 0.5, cex.lab= 0.6, cex.main=0.7,
main=paste("Gumbel - Density Function Gumbel Distribution\n", "Location=", round(location,2), " Scale=", round(scale,2)), type="l", cex.sub=0.6)
```
### Cumulative Distribution Function CDF
The CDF gives the probability that the variable takes a value less than or equal to $x$. That is
\begin{equation}
F(x) = Pr[X \leq x] = \alpha
(\#eq:cdf1)
\end{equation}
For a continuous variable, the CDF can be expressed as the integral of its PDF.
\begin{equation}
F(x) = \int_{-\infty}^x f(t)\,dt
(\#eq:cdf2)
\end{equation}
Equation \@ref(eq:gumbelcdf) is the Gumbel CDF.
\begin{equation}
F(x) = \exp\left\{-\exp\left[-\left(\frac{x-\mu}{\beta}\right)\right]\right\},
\quad -\infty < x < \infty
(\#eq:gumbelcdf)
\end{equation}
Figure \@ref(fig:plotgumbelcdffunction) shows the Gumbel CDF with location ($\mu$) = 100 and scale ($\beta$) = 40, using Equation \@ref(eq:gumbelcdf). As previously done for the PDF, a similar result can be achieved with the function `pgumbel` of the package `RcmdrMisc`.
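As a minimal sketch of that equivalence, assuming the location and scale values used throughout this chapter, the probability of not exceeding 150 km/h can be computed both from Equation \@ref(eq:gumbelcdf) directly and with `pgumbel`:
```{r gumbelcdfexample, eval=FALSE}
location <- 100; scale <- 40
# Equation (eq:gumbelcdf) evaluated at x = 150 km/h
exp(-exp(-(150 - location) / scale))
# The same probability using pgumbel from RcmdrMisc
pgumbel(150, location = location, scale = scale)
# Both expressions should return about 0.75
```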
```{r plotgumbelcdffunction, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Gumbel CDF", size="footnotesize", fig.width= 4, fig.height= 3}
par(mar=c(2.5,2.5,2,0))
par(oma=c(0,0,0,0))
par(mgp=c(1.5,0.5,0))
location = 100
scale = 40
.x <- seq(0, 300, length.out=1000)
cdfG <- function(x) {
exp(-exp(-(x-location)/scale))
}
.y = cdfG(.x)
plot(.x, .y, col="green", lty=4, xlab="Velocities Km/h", ylab="Probability", cex.axis = 0.5, cex.lab= 0.6, cex.main=0.7, type="l",
main=paste("Gumbel - Cumulative Distribution Function\n", "Location=", round(location,2), " Scale=", round(scale,2)), cex.sub=0.6)
```
### Percent Point Function PPF
The PPF is the inverse of the CDF, also called the _quantile_ function: given a probability, it returns the corresponding value $x$ of the variable.
\begin{equation}
x = G(\alpha) = G(F(x))
(\#eq:ppf)
\end{equation}
Equation \@ref(eq:gumbelppf) is the Gumbel PPF.
\begin{equation}
G(\alpha) = \mu-\beta\ln(-\ln(\alpha)),
\quad 0 < \alpha < 1
(\#eq:gumbelppf)
\end{equation}
Figure \@ref(fig:plotgumbelppffunction) shows the Gumbel PPF, using Equation \@ref(eq:gumbelppf). A similar result can be achieved with the function `qgumbel` of the package `RcmdrMisc`.
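For instance, a sketch of the inverse operation: the wind speed with a non-exceedance probability of 0.98 computed from Equation \@ref(eq:gumbelppf) and with `qgumbel`, using the same illustrative parameters:
```{r gumbelppfexample, eval=FALSE}
location <- 100; scale <- 40
alpha <- 0.98  # non-exceedance probability
# Equation (eq:gumbelppf)
location - scale * log(-log(alpha))
# The same quantile using qgumbel from RcmdrMisc
qgumbel(alpha, location = location, scale = scale)
# Both expressions should return about 256 km/h
```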
```{r plotgumbelppffunction, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Gumbel PPF", size="footnotesize", fig.width= 4, fig.height= 3}
par(mar=c(2.5,2.5,2,0))
par(oma=c(0,0,0,0))
par(mgp=c(1.5,0.5,0))
location = 100
scale = 40
.x <- seq(0, 1, length.out=1000)
ppfG <- function(x) {
location - (scale*log(-log(x)))
}
.y = ppfG(.x)
plot(.x, .y, col="green", lty=4, ylab="Velocities Km/h", xlab="Probability", cex.axis = 0.5, cex.lab= 0.6, cex.main=0.7, cex.sub=0.6,
main=paste("Gumbel - Percent Point Function\n", "Location=", round(location,2), " Scale=", round(scale,2)), type="l")
```
### Hazard Function HF {#hf}
Figure \@ref(fig:plotgumbelhffunction) shows the Gumbel HF, computed with Equation \@ref(eq:gumbelhf), which is derived below.
```{r plotgumbelhffunction, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Gumbel HF", size="footnotesize", fig.width= 4, fig.height= 3}
par(mar=c(2.5,2.5,2,0))
par(oma=c(0,0,0,0))
par(mgp=c(1.5,0.5,0))
location = 100
scale = 40
.x <- seq(0, 1500, length.out=1000)
hfG <- function(x) {
(1/scale)*(exp(-(x-location)/scale))/(exp(exp(-(x-location)/scale))-1)
}
.y = hfG(.x)
plot(.x, .y, col="green", lty=4, xlab="Velocities Km/h", ylab="Hazard", cex.axis = 0.5, cex.lab= 0.6, cex.main=0.7, cex.sub=0.6,
main=paste("Gumbel - Hazard Function\n", "Location=", round(location,2), " Scale=", round(scale,2)), type="l", xlim=c(0,500))
```
The HF is the ratio between the PDF and the SF, where SF denotes the survival function $S(x) = 1 - F(x)$, the probability that the variable takes a value greater than $x$: $S(x) = Pr[X > x] = 1 - F(x)$.
\begin{equation}
h(x) = \frac{f(x)}{S(x)} = \frac{f(x)}{1-F(x)}
(\#eq:hf)
\end{equation}
Equation \@ref(eq:gumbelhf) is the Gumbel HF.
\begin{equation}
h(x)= \frac{1}{\beta}\frac{\exp(-(x-\mu)/\beta)}{\exp(\exp(-(x-\mu)/\beta))-1}
(\#eq:gumbelhf)
\end{equation}
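Since Equation \@ref(eq:gumbelhf) is simply Equation \@ref(eq:hf) applied to the Gumbel model, a minimal sketch can build the hazard from `dgumbel` and `pgumbel` without coding Equation \@ref(eq:gumbelhf) explicitly:
```{r gumbelhfexample, eval=FALSE}
location <- 100; scale <- 40
# Equation (eq:hf): h(x) = f(x) / (1 - F(x))
hazard <- function(x) {
  dgumbel(x, location = location, scale = scale) /
    (1 - pgumbel(x, location = location, scale = scale))
}
hazard(150)  # hazard at 150 km/h
```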
## Statistical Concepts for Extreme Analysis
To approach extreme value analysis, some statistical concepts are needed to understand the theoretical framework behind this knowledge area. This section introduces the concepts of annual exceedance probability, mean recurrence interval MRI, exposure time, and compound probability for any given exposure time and MRI. As a hypothetical example, a simulated database of extreme wind speeds will be used; this database is assumed to contain 10,000 years of wind speeds.
### Annual Exceedance Probability $P_e$
Using the previously described database, consider the probability of exceeding the highest probable damage caused to any structure by the action of the winds in this simulated database. There is only one event greater than or equal (in this case, equal) to the highest probable damage in 10,000 years: the _highest wind_. If the database is sorted by wind magnitude in descending order, i.e. the smallest winds to the right of the list as shown in Figure \@ref(fig:winddatabase), the question is answered by calculating the annual exceedance probability $P_e$ with Equation \@ref(eq:pedatabase).
```{r winddatabase, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Sorted Wind Velocities by Magnitude", size="footnotesize", fig.width= 3, fig.height= 1}
par(oma=c(0,0,0,0))
par(mar=c(1.5,0,0,0))
par(xpd=NA)
#par(pin = c(3, 1))
plot(1, type="n", xlab="Event Index", ylab="Wind Speed",
xlim=c(0,200), ylim= c(0,50), xaxt ="n", yaxt="n", bty="n", asp = 0.8)
x1 = seq(from=10, to=100, by=10)
y1 = c(45, 40, 30, 25, 20, 19, 15, 14, 13, 12)
text(x=x1, par("usr")[3],
labels = x1/10, srt = 0, pos = 1, xpd = TRUE, cex=0.8)
for (i in 1:10) lines(x=c(x1[i],x1[i]), y=c(0,y1[i]))
lines(x=c(125,125), y=c(0,9))
text(x=125, par("usr")[3],
labels = "..." , srt = 0, pos = 1, xpd = TRUE)
x1 = seq(from=150, to=200, by=10)
y1 = c(8, 7, 6, 5, 4, 3)
for (i in seq_along(x1)) lines(x=c(x1[i],x1[i]), y=c(0,y1[i]))
text(x=x1, par("usr")[3],
labels = "." , srt = 0, pos = 1, xpd = TRUE, cex=0.8)
axis(1, labels=FALSE, tick=TRUE, col.axis="black", lwd.ticks=0)
#box(which = "outer", col="black", lwd=0.1)
```
The annual exceedance probability $P_e$ equals the ratio between the _event index after descending sorting_ and the _years of simulation_. The highest wind is the first in the sorted list.
\begin{equation}
P_e = \frac{\textrm{Event index after descending sorting}}{\textrm{Years of simulation}}= \frac{1}{10{,}000}=0.0001=0.01\%
(\#eq:pedatabase)
\end{equation}
The same exercise can be done with all winds to construct the annual exceedance probability curve, which in this case represents the probability of equaling or exceeding different probable damages due to wind.
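A minimal sketch of this construction, replacing the hypothetical 10,000-year database with random Gumbel draws (`rgumbel` from `RcmdrMisc`):
```{r peexample, eval=FALSE}
set.seed(1)
winds <- rgumbel(10000, location = 100, scale = 40)  # stand-in database
sorted <- sort(winds, decreasing = TRUE)             # highest wind first
Pe <- seq_along(sorted) / length(sorted)  # event index / years of simulation
plot(sorted, Pe, log = "y", type = "l",
     xlab = "Wind speed (km/h)", ylab = "Annual exceedance probability")
```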
### Mean Recurrence Interval MRI
Continuing with the previous section, taking the inverse of the exceedance probability gives the return period (in years). The return period or mean recurrence interval MRI is associated with a specific return level (extreme wind velocity). The MRI is the number of years $N$ needed to obtain a 63% chance that the corresponding return level occurs at least once in that period. The return level is expected to be exceeded on average once every $N$ years.
The formula to calculate the annual exceedance probability of the return level depends on the MRI value as shown below:
\begin{equation}
P_e =
\begin{cases}
\begin{aligned}
&1-\exp\left(-\frac{1}{MRI}\right),\;\textrm{for MRI < 10 years}
\\
&\frac{1}{MRI},\;\textrm{for MRI }\geq\textrm{ 10 years}
\end{aligned}
\end{cases}
(\#eq:pe)
\end{equation}
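A direct translation of Equation \@ref(eq:pe) into **R** might look like this sketch:
```{r pemri, eval=FALSE}
# Equation (eq:pe): exact expression for short MRIs, 1/MRI otherwise
annual_exceedance <- function(MRI) {
  ifelse(MRI < 10, 1 - exp(-1 / MRI), 1 / MRI)
}
annual_exceedance(c(5, 10, 700, 3000))
```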
For a specific extreme wind event A, the probability that the event occurs in a period equal to MRI years is 63%. If a stronger extreme wind event B is analyzed over the same period, its occurrence probability will be less than 63%. When designing infrastructure for wind loads, the structure will be more resistant if it is designed with stronger winds, that is, with high MRIs and low annual exceedance probabilities. The common approach for infrastructure design, considering any type of load (earthquake, wind, etc.), is to choose a high MRI according to the importance, use, risk, and type of the structure. For highly important structures, like hospitals or coliseums, where the risk of collapse must be diminished, the design MRI is higher than for common structures (for instance a regular house), whose use and importance imply fewer risks.
### Compound Exceedance Probability $P_n$
```{r compoundprobability, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Compound Probability", size="footnotesize", fig.width= 5, fig.height= 2.5}
par(mar=c(3,3,0,0))
par(oma=c(0,0,0,0))
par(mgp=c(2,1,0))
plot(1, type="n", xlab=expression(paste("Compound Probability ", P[n])), ylab="Exposure Time as a Multiple of MRI",
xlim=c(0,1), ylim= c(0,2500), xaxt ="n", yaxt="n", bty="n", cex.lab=0.7)
y1 = c(0, 500,1000,1500,2000,2500)
text(y=y1, x=par("usr")[1], labels = y1/500, srt = 0, pos = 2, xpd = TRUE, cex=0.8)
y1 = 500*0.69
text(y=y1, x=par("usr")[1], labels = ".69", srt = 0, pos = 2, xpd = TRUE, cex=0.6)
y1 = 2250
text(y=y1, x=par("usr")[1], labels = expression(paste("4",frac(1,2))), srt = 0, pos = 2, xpd = TRUE, cex=0.6)
npn <- function(x) (1-(1/500))^x #Event will not occur
n = seq(from=0, to=2500, by=1)
mynpn = npn(n)
lines(x=mynpn,y=n, col= "blue")
pn <- function(x) 1-(1-(1/500))^x #Event will occur
mypn= pn(n)
lines(x=mypn,y=n, col= "green")
text(x=c(0.01, 0.37,0.63, 0.99), par("usr")[3], labels = c(".01",".37",".63",".99") , srt = 0, pos = 1, xpd = TRUE, cex=0.6)
axis(1, at= seq(from=0, to=1, by= 0.1), labels=seq(from=0, to=1, by= 0.1), tick=TRUE, col.axis="black", cex=0.8)
axis(2, at=c(0, 500,1000,1500,2000,2500),labels=FALSE, tick=TRUE, col.axis="black")
axis(2, at=c(345, 2250),labels=FALSE, tick=TRUE, col.axis="black", tck=-0.015)
axis(1, at=c(0.01,.37,.63,0.99),labels=FALSE, tick=TRUE, col.axis="black", tck=-0.015)
abline(v=c(0.01, 0.37,0.5, 0.63,0.99), lty="dotted")
abline(h=c(345,500, 2250), lty="dotted")
text(x=0.15, y=1800, labels = "chance event\nwill not occur", cex=0.7)
text(x=0.85, y=1800, labels = "chance event\nwill occur", cex=0.7)
```
It is possible to calculate a compound probability $P_n$, where $n$ is the exposure period. The exposure period is the usage time of the structures that have been designed with an extreme wind speed. $P_n$ is the probability that the extreme wind speed will be equaled or exceeded at least one time in $n$ years, and in this sense it is a probability of occurrence:
\begin{equation}
P_n = 1-\left(1-\frac{1}{MRI}\right)^n,\;\textrm{occurrence probability}
(\#eq:pn)
\end{equation}
As a complementary probability, it is possible to calculate the compound non-occurrence probability as $\left(1-\frac{1}{MRI}\right)^n$.
If the exposure time is expressed as a multiple of the return period, Figure \@ref(fig:compoundprobability) shows that:
* When the exposure time is 69% of the return period, both probabilities (occurrence and non-occurrence) equal 50%.
* As stated previously, when the exposure time equals the return period, the probability that the extreme wind speed (return level) occurs is 63%, and the non-occurrence probability is 37%.
* If the exposure time is 4.5 times the return period, there is a 99% chance that the return level will occur.
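These three statements can be verified numerically with Equation \@ref(eq:pn); a minimal sketch, using the MRI of 500 years underlying Figure \@ref(fig:compoundprobability):
```{r pncheck, eval=FALSE}
MRI <- 500
Pn <- function(n) 1 - (1 - 1 / MRI)^n  # Equation (eq:pn)
Pn(0.69 * MRI)  # about 0.50
Pn(1.00 * MRI)  # about 0.63
Pn(4.50 * MRI)  # about 0.99
```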
The example discussed here was presented as an instrument to introduce important concepts; nonetheless, there are specialized approaches to extreme value analysis, which are discussed in _[Extreme Value Analysis Overview](#extremeoverview)_.
## Extreme Value Analysis Overview {#extremeoverview}
Extreme value analysis uses statistical inference to calculate probabilities of extreme events. The main methods to analyze extreme data are _sample maxima_ and _Peaks Over Threshold POT_. The sample maxima method, also known as epochal or block maxima, is the classical approach and uses the most extreme value within a specific frame of time, typically one year. POT is based on the selection of a single threshold value, so the analysis is restricted to values above the threshold. There are different POT approaches depending on how the time and magnitude dimensions are analyzed: the most common one uses a generalized Pareto distribution for wind velocities and a Poisson process for time (POT-Poisson-GPD), and the most flexible one uses a Poisson process simultaneously for both dimensions (POT-PP).
In both methods (epochal and POT), the main step is to fit wind velocities to an appropriate probability distribution model. The epochal method uses the generalized extreme value distribution GEV, a family of distributions that includes extreme value type I - Gumbel, extreme value type II - Fréchet, and Weibull. Gumbel is the most commonly used GEV. POT uses a generalized Pareto distribution (POT-Poisson-GPD) or an intensity function (POT-PP).
Distribution models are fitted by estimating their parameters, commonly called location, scale, and shape, although each model has its own parameter names. There are different methods to estimate parameters, among them: (a) the method of modified moments [@Kubler1994] and L-moments [@Hosking1997], (b) the method of maximum likelihood MLE [@Harris1994], (c) the probability plot correlation coefficient, and (d) elemental percentiles (for GPD and GEV).
Once candidate parameters are available, it is necessary to assess the goodness of fit of the selected model using tests like Kolmogorov-Smirnov (KS) or Anderson-Darling. A visual assessment is also useful, using a probability plot or a kernel density plot with the fitted PDF overlaid.
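For example, a KS check of a Gumbel fit might look like the following sketch, where the sample and the estimated parameters are hypothetical placeholders for the output of a fitting step:
```{r ksexample, eval=FALSE}
# Hypothetical sample of annual maxima
x <- rgumbel(50, location = 100, scale = 40)
# Parameters assumed to come from a previous estimation step
location.hat <- 100; scale.hat <- 40
# KS test against the fitted Gumbel CDF (note: estimating the
# parameters from the same sample makes the test approximate)
ks.test(x, "pgumbel", location = location.hat, scale = scale.hat)
```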
The main use of the fitted model is the estimation of mean recurrence intervals MRI and extreme wind speeds (return levels),
\begin{equation}
MRI=\frac{1}{1-F(y)}
(\#eq:mri)
\end{equation}
with $F(y)$ as the CDF. Since $1-F(y)$ is the annual exceedance probability, the MRI is its inverse; see [@Simiu1996] for more details about the MRI. If $y$ is solved from Equation \@ref(eq:mri) for a given MRI of $N$ years, its value is the wind speed return level $Y_N$. Refer to each specific method below for the corresponding solutions for $Y_N$.
The CRAN Task View "Extreme Value Analysis" <https://cran.r-project.org/web/views/ExtremeValue.html> lists the available **R** packages for block maxima, POT by GPD, and extremal index estimation approaches. The most important to consider are `evd`, `extRemes`, `evir`, `POT`, `extremeStat`, `ismev`, and `Renext`.
### Epochal (Sample Maxima)
For random variables of sample maximum values, the distribution used is the GEV, whose CDF is:
\begin{equation}
H(y) = \exp\left\{-\left[1+\xi\frac{y-\mu}{\psi}\right]_+^{-\frac{1}{\xi}}\right\}
(\#eq:epochalpdf)
\end{equation}
where $\xi\neq0$, $[\,...\,]_+ = \max([\,...\,],0)$, $\mu$ is the location parameter, $\psi > 0$ is the scale parameter, and $\xi$ is the shape parameter. The GEV combines in one family the Gumbel (medium-tailed) distribution (limit $\xi\rightarrow0$), the Fréchet (long-tailed) distribution ($\xi>0$), and the Weibull (short-tailed) distribution ($\xi<0$).
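As a sketch, the three tail types can be compared with the `pgev` function of the `evd` package (one of the packages listed above), using illustrative parameter values:
```{r gevexample, eval=FALSE}
library(evd)
# GEV CDF H(y) at y = 200 for the three tail behaviours
pgev(200, loc = 100, scale = 40, shape =  0.2)  # Fréchet, long-tailed
pgev(200, loc = 100, scale = 40, shape =  0.0)  # Gumbel, medium-tailed
pgev(200, loc = 100, scale = 40, shape = -0.2)  # Weibull, short-tailed
```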
### Peaks Over Threshold using GPD and 1D Poisson Process POT-Poisson-GPD {#pot-poisson-gpd}
In this model, (a) the magnitudes of the observations above the threshold are assumed to be independent random variables with the same generalized Pareto probability distribution, with $\sigma$ as scale and $\xi$ as tail length, and (b) the corresponding times are assumed to follow a one-dimensional homogeneous Poisson process with parameter $\gamma$.
With the condition of exceeding some high threshold $b$, so that $y = (x-b) > 0$, the CDF of $y$ is the generalized Pareto distribution GPD:
\begin{equation}
F(y) = 1 - \left[1+\xi\frac{y}{\sigma}\right]^{-\frac{1}{\xi}}_+,
(\#eq:gpd)
\end{equation}
where $[\,...\,]_+ = \max([\,...\,],0)$ and $b$ is the threshold. In both the GPD (magnitude) and the 1D Poisson process (time), it is not possible to differentiate between thunderstorm and non-thunderstorm wind types. @Pickands1971 found a rigorous connection between the epochal (GEV) and the limiting results of POT-Poisson-GPD, as the shape parameter $\xi$ in Equations \@ref(eq:epochalpdf) and \@ref(eq:gpd) is exactly the same. In the long-tailed case $\xi>0$ the GPD behaves as the usual Pareto distribution, for $\xi = 0$ (taking the limit $\xi\rightarrow0$) it behaves as the exponential distribution, and for $\xi<0$ the distribution has a finite upper endpoint at $-\frac{\sigma}{\xi}$.
The equation to calculate the return level (RL) $Y_N$ corresponding to the $N$-years return period in POT-Poisson-GPD is:
\begin{equation}
Y_N =G\left(y, 1-\frac{1}{\lambda\,N}\right)
(\#eq:yngpd)
\end{equation}
where $G$ is the quantile function (PPF) of the GPD, and the probability passed to $G$ is adjusted with the parameter $\lambda$, the mean number of wind speed events over the threshold per year.
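A minimal sketch of Equation \@ref(eq:yngpd), assuming the `qgpd` function of the `evd` package and illustrative values for the threshold, the GPD parameters, and $\lambda$:
```{r yngpdexample, eval=FALSE}
library(evd)
b <- 120; sigma <- 15; xi <- 0.1  # threshold, scale, shape (illustrative)
lambda <- 8                       # threshold crossings per year
N <- 700                          # return period in years
# Equation (eq:yngpd): quantile of the excesses y = x - b, shifted back by b
b + qgpd(1 - 1 / (lambda * N), loc = 0, scale = sigma, shape = xi)
```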
### Peaks Over Threshold Using a 2D Poisson Process POT-PP {#method-pot-pp}
According to @Pintar2015, a stochastic Poisson process PP is mainly defined by its intensity function. As the intensity function is not uniform over the domain, the PP considered here is non-homogeneous, and because the intensity function depends on both magnitude and time, it is also bi-dimensional. The PP was described for the first time in @Pickands1971 and then extended in @Smith1989.
The generic Equation \@ref(eq:ppgenericintensityfunction) shows the intensity function, which is defined on the domain $D = D_t\,{\cup}\,D_{nt}$ and allows fitting the PP at each station to the observed data $\{t_i, y_i\}_{i=1}^I$, where $t_i$ are the times of the threshold-crossing observations, $y_i$ the corresponding wind speed magnitudes, and $I$ the total number of observations. Thus, only data above the threshold (POT) are used.
\begin{equation}
\lambda\left(y,t\right) =
\begin{cases}
\begin{aligned}
&\lambda_t(y),\;\textrm{for t in thunderstorm period}
\\
&\lambda_{nt}(y),\;\textrm{for t in non-thunderstorm period}
\end{aligned}
\end{cases}
(\#eq:ppgenericintensityfunction)
\end{equation}
The specific intensity function of the PP is defined in [@Smith2004] and is shown in Equation \@ref(eq:ppintensityfunction):
\begin{equation}
\lambda_t(y) = \frac{1}{\psi_t}\left(1+\zeta_t\frac{y-\omega_t}{\psi_t}\right)_+^{-\frac{1}{\zeta_t}-1}
(\#eq:ppintensityfunction)
\end{equation}
where, at a given time $t$, the shape parameter $\zeta_t$ controls the tail length of the intensity function, and the other two parameters, $\omega_t$ and $\psi_t$, define its location and scale.
Figure \@ref(fig:plotdomainpp) represents the domain $D$ of the PP. In time, the domain covers the station service period from the first sample $t_1$ to the last sample $t_4$. $D$ is the union of all thunderstorm periods $D_t$ (from $t_2$ to $t_3$) and all non-thunderstorm periods $D_{nt}$ (periods $t_1$ to $t_2$ and $t_3$ to $t_4$). In magnitude, only thunderstorm data above its threshold $b_t$ and non-thunderstorm data above its threshold $b_{nt}$ are used.
Thunderstorms and non-thunderstorms are modeled independently:
1. Observations in the domain $D$ follow a Poisson distribution with mean $\int_D\lambda(y,t)\,dy\,dt$.
2. For disjoint sub-domains $D_1$ and $D_2$ inside $D$, the numbers of observations in $D_1$ and $D_2$ are independent random variables.
```{r plotdomainpp, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Domain of the Poisson Process - PP", fig.height= 2, fig.width=3.5}
par(mar=c(3,3,0,0))
par(oma=c(0,0,0,0))
par(mgp=c(1.5,0.5,0))
plot(1, type="n", xlab="Time", ylab="Wind Speed",
xlim=c(0,30), ylim= c(0,30), xaxt ="n", yaxt="n", bty="n", cex.lab=0.8, cex.axis=0.8)
xpos=c(10,15,20,30)
xtick<-c(c(expression(t[1]), expression(t[2]), expression(t[3]), expression(t[4])))
text(x=xpos, par("usr")[3],
labels = xtick, srt = 0, pos = 1, xpd = TRUE, cex=0.8)
ypos=c(3,6,10,30)
ytick<-c(expression(b[nt]), expression(b[t]), expression(y[1]), expression(y[2]))
text(y=ypos, x=par("usr")[1],
labels = ytick, srt = 0, pos = 2, xpd = TRUE, cex=0.8)
rect(10,10,30,30, density = 5, col= NA, border="black")
lines(x=c(10,10), y=c(0,10), lty="dashed", col = "gray")
lines(x=c(30,30), y=c(0,10), lty="dashed", col = "gray")
lines(x=c(0,10), y=c(10,10), lty="dashed", col = "gray")
lines(x=c(0,10), y=c(30,30), lty="dashed", col = "gray")
lines(x=c(15,15), y=c(10,30), col = "black")
lines(x=c(15,15), y=c(0,10), lty="dashed", col = "gray")
lines(x=c(20,20), y=c(10,30), col = "black")
lines(x=c(20,20), y=c(0,10), lty="dashed", col = "gray")
lines(x=c(10,15), y=c(3,3), col = "black")
lines(x=c(20,30), y=c(3,3), col = "black")
lines(x=c(0,10), y=c(3,3), lty="dashed", col = "gray")
lines(x=c(15,20), y=c(3,3), lty="dashed", col = "gray")
lines(x=c(15,20), y=c(6,6), col = "black")
lines(x=c(20,30), y=c(6,6), lty="dashed", col = "gray")
lines(x=c(0,15), y=c(6,6), lty="dashed", col = "gray")
rect(15,10,20,30, density = 5, angle=-45, col= NA, border="black")
#x <- "\xD0"
#Encoding(x) <- "latin1"
text(28, 28, "D", cex=1.2)
axis(1, labels=FALSE, tick=TRUE, col.axis="black", lwd.ticks=0)
axis(2, labels=FALSE, tick=TRUE, col.axis="black", lwd.ticks=0)
#box(which = "plot", col="red")
#box(which = "figure", col="blue")
#box(which = "inner", col="cyan")
#box(which = "outer", col="orange")
```
A visual representation of the intensity function of the PP can be seen in Figure \@ref(fig:plotdomain3dpp). On the vertical axis, two surfaces are drawn representing the independent intensity functions for thunderstorms, $\lambda_t(y)$, and non-thunderstorms, $\lambda_{nt}(y)$. The volume under each surface, over its corresponding time periods and peaks-over-threshold velocities, is the mean of the PP.
```{r plotdomain3dpp, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Volume Under Surfaces: Mean of PP", dpi=200, fig.align='center'}
#, dpi=300, size="footnotesize", fig.width= 5, fig.height= 5}
include_graphics(path = "figure/domain3d.PNG")
```
To fit the intensity function to the data, the method of maximum likelihood is used to estimate its parameters. Equation \@ref(eq:pplikelihood) is not the joint probability density function of the $I$ observations, but it can be used as a likelihood because the two are proportional and therefore share exactly the same maximizing values. All possible parameter vectors are represented by $\eta$, and the values selected as the Poisson process parameters ($scale = \psi$, $location = \omega$, and $shape = \zeta$) are $\hat\eta = (\hat\psi, \hat\omega, \hat\zeta)$, the values that maximize $L(\eta)$.
\begin{equation}
L(\eta)=\left(
\prod_{i=1}^I\lambda\left(y_i,t_i\right)
\right)
\exp\left\{
-\int_{{D}}\lambda\left(y,t\right)dy\,dt
\right\}.
(\#eq:pplikelihood)
\end{equation}
The values of $\hat\eta$ must be calculated using a numerical approach, because no analytical solution is available.
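A single-population sketch of that numerical maximization with `optim`, assuming one intensity function of the form of Equation \@ref(eq:ppintensityfunction), hypothetical de-clustered peaks `y` above a threshold `b`, and a record length `Tyears` (the $\zeta\rightarrow0$ limit is not handled):
```{r ppmlesketch, eval=FALSE}
library(evd)
set.seed(1)
b <- 120; Tyears <- 30
y <- b + rgpd(60, loc = 0, scale = 20, shape = 0.1)  # hypothetical peaks
# Negative log of L(eta), Equation (eq:pplikelihood), for one population:
# sum of log-intensities minus the integrated intensity, where the
# integral of Eq. (eq:ppintensityfunction) above b per unit time is
# (1 + zeta * (b - omega) / psi)^(-1/zeta).
nll <- function(eta) {
  omega <- eta[1]; psi <- eta[2]; zeta <- eta[3]
  if (psi <= 0) return(Inf)
  u  <- 1 + zeta * (y - omega) / psi
  ub <- 1 + zeta * (b - omega) / psi
  if (any(u <= 0) || ub <= 0) return(Inf)  # outside the support
  -(sum(-log(psi) + (-1 / zeta - 1) * log(u)) - Tyears * ub^(-1 / zeta))
}
fit <- optim(c(omega = 100, psi = 20, zeta = 0.1), nll)
fit$par  # numerical estimates of (omega, psi, zeta)
```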
Once the PP is fitted to the data, the model provides extreme wind velocities (return levels) for different return periods (mean recurrence intervals).
A $Y_N$ extreme wind velocity, called the return level (RL) belonging to the $N$-years return period, has an expected frequency of being equaled or exceeded (annual exceedance probability) $P_e = \frac{1}{N}$, and a probability that the event does not occur (annual non-exceedance probability) $P_{ne}=1-\frac{1}{N}$. $Y_N$ is the value returned by the $G$ (PPF or quantile) function for a probability equal to $P_{ne}$: $Y_N = G(y, p=P_{ne})$. $Y_N$ can be understood as the extreme wind value expected to be exceeded on average once every $N$ years.
For the PP, $Y_N$ is defined in terms of the intensity function as the solution to the equation
\begin{equation}
\int_{Y_N}^{\infty}\int_0^1\lambda\left( y,t\right)dydt = A_t\int_{Y_N}^{\infty}\lambda_t\left( y\right)\,dy + A_{nt}\int_{Y_N}^{\infty}\lambda_{nt}\left( y\right)\,dy = \frac{1}{N}
(\#eq:pprl)
\end{equation}
where $A_t$ is the product of the average number of thunderstorms per year and the average length of a thunderstorm, taken to be 1 hour as defined in [@Pintar2015], and $A_{nt} = 365 - A_t$. The average length of a non-thunderstorm event is variable and is adjusted for each station to guarantee that $A_{nt} + A_t = 365$. The value 365 is used only if time operations in the dataset are performed in days.
Successive thunderstorm samples are considered to belong to the same thunderstorm event if the time lag between them is smaller than six hours; for non-thunderstorms this time is 4 days. For the PP, all the measurements belonging to the same event (thunderstorm or non-thunderstorm) need to be de-clustered to leave only one maximum value. In other words, the number of thunderstorms in the time series is one plus the number of time lags greater than 6 hours, and similarly with 4 days for non-thunderstorms.
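A sketch of solving Equation \@ref(eq:pprl) for $Y_N$ with `uniroot`, assuming hypothetical already-fitted parameters for the thunderstorm and non-thunderstorm intensity functions:
```{r pprlsketch, eval=FALSE}
# Intensity of Eq. (eq:ppintensityfunction) integrated from yN to infinity
tail_intensity <- function(yN, omega, psi, zeta) {
  u <- 1 + zeta * (yN - omega) / psi
  ifelse(u > 0, u^(-1 / zeta), 0)
}
# Hypothetical fitted parameters for each population
par_t  <- list(omega = 110, psi = 18, zeta =  0.05)  # thunderstorm
par_nt <- list(omega = 100, psi = 15, zeta = -0.05)  # non-thunderstorm
A_t  <- 40                # thunderstorm days per year (illustrative)
A_nt <- 365 - A_t         # non-thunderstorm days per year
N <- 700                  # target MRI in years
lhs <- function(yN)
  A_t  * do.call(tail_intensity, c(list(yN = yN), par_t)) +
  A_nt * do.call(tail_intensity, c(list(yN = yN), par_nt)) - 1 / N
uniroot(lhs, interval = c(100, 1000))$root  # Y_N, the 700-year return level
```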
## Wind Loads Requirements {#windloadsrequirements}
As the output maps of this research will be used as input loads for infrastructure design, the methodology used for their creation needs to be consistent with the official Colombian wind load requirements. The Colombian structural design code, from now on the _design standard_, was created and is maintained by the Colombian Association of Seismic Engineering - AIS.
The design standard is mainly based on the _Minimum Design Loads and Associated Criteria for Buildings and Other Structures ASCE7-16_ standard [@Asce2017]. The ASCE7-16 standard defines the minimum requirements for design wind loads on pages 733 to 747. The wind speed requirements of ASCE7-16 are based on the combination of an independent non-hurricane analysis and hurricane wind speed simulation models. The focus of this research is the analysis of non-hurricane wind data; however, existing results of hurricane studies will be used to present final maps with both components. In ASCE7-16, for non-hurricane wind speeds, the procedure is mainly based on [@Pintar2015].
ASCE7-16 (page 734) requires the calculation of extreme wind return levels for specific return periods according to the risk category of the structure to be designed, as follows: risk category I - 300 years, risk category II - 700 years, risk category III - 1700 years, risk category IV - 3000 years. In addition, extreme wind speeds for those MRIs must correspond to 3-second gust speeds at 33 ft (10 meters) above the ground and exposure category C. Below is a description of the risk categories:
* Risk IV - These are 'indispensable buildings' that involve substantial risk, such as structures that handle toxic or explosive substances.
* Risk III - There is substantial risk because these structures can handle toxic or explosive substances, cause a serious economic impact, or massively interrupt activities if they fail.
* Risk II - The category 'by default', corresponding to structures not classified in the other categories.
* Risk I - These structures represent a low risk to human life.
To standardize wind speeds to gust speeds, ASCE7-16 proposes the Durst curve [@Durst1960], shown in Figure \@ref(fig:durstcurve). The Durst curve is only valid for open terrain conditions. It shows on the $y$ axis the gust factor $\frac{V_t}{V_{3600}}$, the ratio between a wind gust (maximum speed) averaged over $t$ seconds, $V_t$, and the hourly averaged wind speed $V_{3600}$, and on the $x$ axis the duration $t$ of the gust in seconds. Be aware that the curve values in Figure \@ref(fig:durstcurve) are approximate values read visually from the original curve.
```{r durstcurve, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Durst Curve", fig.height=2.5, fig.width=5}
#include_graphics(path = "figure/durst.PNG")
par(mar=c(3,4,0,1))
par(oma=c(0,0,0,0))
par(mgp=c(2,1,0))
#x = c(3800, 1000, 800, 300, 200, 100, 35, 15, 10, 6, 4, 1)
#y = c(1, 1.04, 1.05, 1.10, 1.135, 1.19, 1.3, 1.4, 1.44, 1.475, 1.505, 1.56)
x = c(3800, 2000, 1000, 900, 800, 400, 200, 160, 100, 90, 60, 24, 15, 10, 8, 6, 4.5, 3, 2, 1.7, 1)
y = c(1, 1.02, 1.043, 1.048, 1.055, 1.085, 1.130, 1.15, 1.190, 1.2, 1.245, 1.35, 1.4, 1.435, 1.45, 1.475, 1.495, 1.52, 1.54, 1.55, 1.57)
plot(x=x, y=y, log="x", xlim=c(1,10000), type="n", ylim=c(1,1.7), tck=1, tcl= -0.5, xaxp=c(1,10000,1), xlab="Gust Duration (Seconds)", ylab="", cex.axis=0.7, cex.lab=0.8)
#par(las=1) #Horizontal
corners=par("usr")
text(x=0.2, y=mean(corners[3:4]), labels=expression(frac(V[t], V[3600])), xpd=TRUE, srt=0, cex=0.8)
#mtext(text = expression(frac(V[t], V[3600])), side=2, line = 2, cex=0.8)
lh <- seq(1, 1.7, by=0.05)
abline(h=lh,col='lightgray', lwd=0.5)
# Draw light vertical gridlines at the intermediate positions of each
# decade of the logarithmic x axis
lineasverticaleslogaritmicas <- function(posiciones)
{
contador <- 0
xactual <- 0
for(.x in axTicks(1)) {
xanterior <- xactual
xactual <- .x
if (contador > 0)
{
paso <- (xactual-xanterior)/9
a1 <- seq(xanterior, xactual, by=paso)
abline(v=a1[posiciones], col='lightgray', lwd=0.5)
#rug (a1, col='red', ticksize=0.03)
#text(a1, par('usr')[3]-0.01, srt=90, adj=1, labels=a1, xpd=TRUE, col='gray50', cex=0.6)
}
contador <- contador + 1
}
}
lineasverticaleslogaritmicas(1:10)
lh <- seq(1.1, 1.7, by=0.1)
abline(h=lh,col='black', lwd=0.8)
lv <- c(1,10,100, 1000, 10000)
abline(v=lv,col='black', lwd=0.8)
smoothingSpline = smooth.spline(x, y, spar=0.35)
lines(smoothingSpline, col="red", lwd=2)
legend('topright', c('Synoptic (Durst, 1960)'), lty=c('solid'), lwd=c('2'), col=c('red'), inset = .04, cex=0.7, ncol=1, merge=F, box.col = "black",bg = "white")
#box(which = "plot", col="red")
#box(which = "figure", col="blue")
#box(which = "inner", col="cyan")
#box(which = "outer", col="orange")
```