---
output: html_document
title: "The soil site"
editor_options: 
  chunk_output_type: console
---

```{r}
library(dplyr)
library(tibble)
library(gt)
library(gtExtras)
```

## Defining the site {#sec-pr-site}

Soil profiles are described at sites. The 'site' concept is necessarily somewhat flexible to account for the range of sampling purposes and methods. Generally speaking, it is considered to include the immediate profile surrounds, up to a radius of \~10 m. As such, it will generally nest within a landform unless the site selection method was prescriptive and landform-blind.

Larger plots and long transects that definitely cover multiple landforms, or even landscapes, should be considered as being built from multiple related sites.

The soil profile description starts with an observation of the site's surface and works down. The chief tasks are assessing the surface, delineating soil layers (horizons), and describing their morphological character. Before that, some additional reference data should be recorded.

## Reference data {#sec-ref}

Reference data is metadata that clearly identifies a soil profile description. This enables linking the description to other profiles, samples, analytical results and photographs. Reference data includes:

-   an identifier
-   who completed the description
-   when and where the profile was described
-   why the description was needed
-   how the location was chosen

These are discussed in turn below.

### Profile description identifiers {#sec-ref-id}

Assign each profile description a job or project identifier followed by a sequential number, e.g. '[ABC_001]{.ceg}'.
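
The identifier pattern above can be generated rather than typed by hand. The following is a minimal sketch in base R; the function name `make_profile_id` and the three-digit padding are assumptions taken from the example, not a prescribed format.

```r
# Hypothetical helper: build identifiers such as "ABC_001" from a project
# code and a running number. The padding width is an assumption based on
# the example identifier in the text.
make_profile_id <- function(project, n, width = 3) {
  stopifnot(is.character(project), length(project) == 1, all(n >= 1))
  paste0(project, "_", formatC(as.integer(n), width = width, flag = "0"))
}

ids <- make_profile_id("ABC", 1:3)
ids
```

Zero-padding keeps identifiers sorting in the order they were assigned, provided the width is large enough for the whole project.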

::: {#nte-uuid .callout-note collapse="true" appearance="minimal" title="Unique identifiers"}
There are many ways to build profile identifiers. The recommendation above is a compromise between robustness and simplicity that can at least guarantee unique identifiers within an active working group, e.g. a small business.

Well-designed databases use truly unique identifiers for every piece of data they hold. Under contemporary best practice, these are usually 'UUIDs': 128-bit values, conventionally written as 32 hexadecimal digits (see @davis2024). While technically robust, UUIDs are difficult to work with day-to-day, and most workers will prefer to interact with simpler identifiers. In a soil profile database, these simpler identifiers should be implemented as aliases for UUIDs. End users should primarily interact with the aliases, and the UUIDs should remain hidden.
:::

::: {#nte-teamsites .callout-note collapse="true" appearance="minimal" title="Working in teams"}
Semi-unique identifiers require active management when working with multiple field teams operating in parallel. This avoids the risk of two sites being given the same ID on the same day. Each team should be assigned a 'block' of identifiers to work through, large enough to cover the intended number of profile descriptions.
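
As a rough illustration, blocks could be allocated as in the sketch below. The team names, block size and the helper `allocate_id_blocks` are hypothetical.

```r
# Hypothetical helper: give each field team a non-overlapping block of
# sequence numbers. Team names and block size are illustrative only.
allocate_id_blocks <- function(teams, block_size, start = 1) {
  first <- start + (seq_along(teams) - 1) * block_size
  data.frame(team  = teams,
             first = first,
             last  = first + block_size - 1)
}

blocks <- allocate_id_blocks(c("Team 1", "Team 2", "Team 3"), block_size = 50)
blocks
```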
:::

::: {#nte-sitetags .callout-note collapse="true" appearance="minimal" title="Other profile groupings"}
It may be desirable to group profile descriptions in more complex ways. Examples include:

-   descriptions belonging to a transect or plot
-   descriptions representing a chrono-, climo- or topo-sequence

A tagging system can be useful in these cases. Otherwise, consider whether additional custom metadata fields should be implemented.
:::

### Authors {#sec-ref-auth}

Each profile description must have a single responsible author. Record their name. Additional participants may also be noted.

Use a short unique identifier for each author, for example [SMITJ]{.ceg} for John Smith. These may be linked to a more complex user management system.

::: {#cau-priv .callout-caution collapse="true" appearance="minimal" title="Data privacy"}
New Zealand data privacy laws are strict. Personally identifying information must be stored securely and not distributed without a good reason. Personal details such as names may not be publishable without explicit consent. Data storage solutions must respect these requirements.
:::

### Location {#sec-ref-loc}

Record location coordinates as eastings and northings in New Zealand Transverse Mercator 2000 (NZTM2000), e.g. [1586090E, 5405720N]{.ceg}.

This is the bare minimum requirement for site location; see @sec-loc for a more detailed discussion.
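
Coordinates written in the style shown above are easier to store and check as two numeric fields. The sketch below assumes the 'digits-E, digits-N' layout of the example; `parse_nztm` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: split a coordinate string written like the example
# above into numeric easting and northing (metres).
parse_nztm <- function(x) {
  pattern <- "^[[:space:]]*([0-9]+)E,[[:space:]]*([0-9]+)N[[:space:]]*$"
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) != 3) stop("Expected a string like '1586090E, 5405720N'")
  c(easting = as.numeric(m[2]), northing = as.numeric(m[3]))
}

xy <- parse_nztm("1586090E, 5405720N")
xy
```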

### Date {#sec-ref-dt}

Record the date of observation using the ISO 8601 standard, [YYYY-MM-DD]{.ceg} [@internationalorganizationforstandardization]. Time is not essential but may be useful for correlating site descriptions with photographs.
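
In R, dates can be written and checked in this format with base functions alone. The sketch below uses an illustrative date; `is_iso_date` is a hypothetical helper.

```r
# Format an observation date as ISO 8601 (YYYY-MM-DD).
obs_date <- as.Date("2024-03-07")  # illustrative date only
obs_date_iso <- format(obs_date, "%Y-%m-%d")

# Hypothetical helper: TRUE where a string has the YYYY-MM-DD layout and
# is also a real calendar date.
is_iso_date <- function(x) {
  grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &
    !is.na(as.Date(x, format = "%Y-%m-%d"))
}

is_iso_date(c("2024-03-07", "07/03/2024", "2024-02-30"))
```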

### Site purpose {#sec-ref-why}

The original task for which a site was observed can affect what data is recorded and is relevant to data re-use decisions.

Record the reason for sampling using the schema in @tbl-ss-reason. This parameter does not need to be recorded in the field.

```{r}
#| label: tbl-ss-reason
#| tbl-cap: "Site purpose"

dat_ss_reason <-
  tribble(~Code, ~Name,
          'E', 'Mapping exploration',
          'V', 'Mapping validation',
          'R', 'Research',
          'M', 'Monitoring',
          'T', 'Reference',
          'D', 'Development')

tbl_ss_reason <- gt(dat_ss_reason) |>
   tab_options(
    column_labels.font.weight = 'bold', 
    heading.title.font.weight = 'bold',
    table.align = 'center',
    table.width = '85%'
  ) |>
  tab_style(style = list(cell_text(color  = 'red', 
                                   weight = 'bold',
                                   align  = 'center')),
            locations = cells_body(columns = 1)) |>
  cols_width(Code ~pct(10))

tbl_ss_reason
```

::: {.content-visible when-format="docx"}
delete me
:::

::: {#nte-ss-reason .callout-note collapse="true" title="About site purpose types" appearance="minimal"}
[D]{.ceg} development sites are often described as part of planning for a resource consent. The soil they describe may be removed or sealed after observation, limiting data re-use potential. [M]{.ceg} monitoring sites imply repeated, regular re-sampling and perhaps an expectation of minimal major disturbance between visits.

[V]{.ceg} validation sites may be low-detail, as they are intended to confirm or deny a prediction. [T]{.ceg} reference sites (T for ‘type’) may be expected to have high morphological detail and a lot of accompanying analytical data, as they represent ‘ideal’ profiles for particular soil classes.

[E]{.ceg} mapping exploration sites may have biased, opportunistic sample locations and moderate location precision, while [R]{.ceg} research sites are more likely to have been predetermined by a model and located with high precision.
:::
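
Because site purpose is usually entered after fieldwork, it is worth checking entries against the schema in @tbl-ss-reason. The sketch below shows one way to do this in base R; the `sites` records are invented for illustration, and the same pattern would apply to the other coded parameters in this chapter.

```r
# Allowed site purpose codes, copied from the site purpose table.
purpose_codes <- c("E", "V", "R", "M", "T", "D")

# Hypothetical records; 'X' is deliberately invalid.
sites <- data.frame(id      = c("ABC_001", "ABC_002", "ABC_003"),
                    purpose = c("E", "X", "M"))

# Keep only the records whose purpose code is not in the schema.
invalid <- sites[!sites$purpose %in% purpose_codes, ]
invalid
```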

### Site selection method {#sec-ref-sampstrat}

Spatial sampling strategies vary in their statistical properties. Soil data is stored for long periods and is often re-used across multiple projects, but some re-uses may not be appropriate for data collected under a particular strategy.

Record the spatial sampling strategy using the schema in @tbl-ss-sampstrat. This parameter does not usually need to be recorded in the field.

```{r}
#| label: tbl-ss-sampstrat
#| tbl-cap: "Spatial sampling strategies"

dat_ss_sampstrat <-
  tribble(~Code, ~Name,
          'A', 'Arbitrary/opportunistic',
          'D', 'Data-driven',
          'M', 'Model-based',
          'R', 'Random',
          'T', 'Transect',
          'G', 'Grid')

tbl_ss_sampstrat <- gt(dat_ss_sampstrat) |>
   tab_options(
    column_labels.font.weight = 'bold', 
    heading.title.font.weight = 'bold',
    table.align = 'center',
    table.width = '85%'
  ) |>
  tab_style(style = list(cell_text(color  = 'red',
                                   weight = 'bold',
                                   align  = 'center')),
            locations = cells_body(columns = 1)) |>
  cols_width(Code ~pct(10))

tbl_ss_sampstrat
```

::: {.content-visible when-format="docx"}
delete me
:::

::: {#nte-ss-sampstrat .callout-note appearance="minimal" collapse="true" title="About sampling strategies"}
The difference between data-driven [D]{.ceg} and model-based [M]{.ceg} strategies centres on replicability. Data-driven sites are nominated using a loose set of rules or principles (e.g. “visit the major landforms on this farm but don’t go too far off the tracks”) and may be subject to minor changes once the worker is on-ground. Model-based site choices formalise this process with reference to specific input datasets, and the worker is committed to sampling at the exact point chosen. Model-based methods may include ‘backup points’ for when an intended location cannot be physically accessed, but these are also pre-determined. Importantly, model-based choices can be regenerated from the same inputs.

Arbitrary [A]{.ceg} points are not pre-determined at all; rather, they are chosen after the worker is on site, in response to direct observation of on-ground conditions. These are often practical and time-efficient but are also likely to over-sample the more easily accessible and/or interesting parts of the landscape, to the detriment of overall representativeness.

Random, transect and grid sampling [R]{.ceg}, [T]{.ceg}, [G]{.ceg} effectively ignore environmental parameters. This is useful when the spatial distribution of a parameter of interest is unknown or poorly understood.

Some of these methods can be combined, for example data-driven transect choices with randomised locations along each transect. In such cases, record the 'last choice' made, which in this example is [R]{.ceg}. The details of the full sampling strategy should be discussed in project metadata.
:::

### Exposure type {#sec-pr-exptype}

The method by which the soil was exposed for observation affects which parameters can be recorded, or at least recorded reliably. Record the exposure method using the codes in @tbl-pr-gear.

```{r}
#| label: tbl-pr-gear
#| tbl-cap: "Method of profile exposure"

dat_pr_gear <-
  tribble(~Code, ~Name, ~Description,
          'P1', 'Quick pit', 'Pit up to 0.5 m wide and generally no deeper than 1 m',
          'P2', 'Full pit', 'Pit or trench > 1 m wide, exposing the complete profile to regolith or bedrock',
          'E', 'Exposure', 'Road cutting or similar, chipped back and cleaned',
          'C1', 'Small core', 'Relatively undisturbed soil core, < 50 mm diameter',
          'C2', 'Large core', 'Relatively undisturbed soil core, ≥ 50 mm diameter',
          'A', 'Auger', 'Hand or mechanical auger, including e.g. post hole diggers')

tbl_pr_gear <- gt(dat_pr_gear) |>
   tab_options(
    column_labels.font.weight = 'bold', 
    heading.title.font.weight = 'bold',
    table.align = 'center',
    table.width = '85%'
  ) |>
  tab_style(style = list(cell_text(color  = 'red',
                                   weight = 'bold',
                                   align  = 'center')),
            locations = cells_body(columns = 1)) |>
  cols_width(Code ~pct(10), Name ~pct(20))

tbl_pr_gear
```

-   [A]{.ceg} samples generally cannot be used to determine soil structure due to the degree of disturbance.
-   Combined sampling is common in field mapping, e.g. small pit excavation to \~0.50-0.75 m and augering to a required depth in relatively well-understood landscapes. Record these as [P1]{.ceg}.
-   Gouge-style augers used in soft marine sediments may be included in [C2]{.ceg} even though they commonly narrow to \< 50 mm at their maximum depth.
-   [C1]{.ceg} style samplers are common in paddock-scale monitoring and research, where they are used for rapid surface measurement. These are not generally suitable for mapping as they cannot sample below the topsoil.

### Depth of observation {#sec-pr-doa}

Record the maximum depth of observation from the soil surface in whole centimetres (e.g. '[100 cm]{.ceg}'). This does not necessarily need to be recorded in the field as it can be determined from the lower depth of the last horizon in the profile description.
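
Deriving the depth of observation from the horizon data can be sketched as follows. The horizon names, depths and column names are hypothetical and do not reflect a prescribed data structure.

```r
# Hypothetical horizon table for one profile; depths in cm from the surface.
horizons <- data.frame(
  horizon  = c("A", "Bw", "C"),
  upper_cm = c(0, 22, 61),
  lower_cm = c(22, 61, 100)
)

# The depth of observation is the lower depth of the deepest horizon.
depth_of_observation <- max(horizons$lower_cm)
depth_of_observation
```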

A full profile description ideally extends to bedrock, but this may not always be accessible with the equipment and time available. The target depth of observation for most applications in New Zealand is 100 cm. Classification systems like the New Zealand Soil Classification (NZSC) and Land Use Capability (LUC) rely on ruling out certain soil features within that depth range, as well as positively identifying others.

See @sec-digging for information on appropriate excavation equipment.

### Stopping early {#sec-pr-rsa}

When the target depth cannot be achieved, note the reason for stopping early using the codes in @tbl-ss-stoppage.

```{r}
#| label: tbl-ss-stoppage
#| tbl-cap: "Stoppage reason"

dat_ss_stoppage <-
  tribble(~Code, ~Name, ~Description,
          'B', 'Bedrock', 'Underlying fresh rock mass encountered, no more soil or weathering substrate to describe',
          'R', 'Resistance', 'Materials too difficult to extract with available equipment',
          'C', 'Collapse', 'Excavation unstable or unsafe due to e.g. wetness or loose particle packing',
          'I', 'Identified', 'The depth required to confirm a target parameter of interest was reached',
          'O', 'Other', 'Circumstances unrelated to the soil itself caused early stoppage')

tbl_ss_stoppage <- gt(dat_ss_stoppage) |>
   tab_options(
    column_labels.font.weight = 'bold', 
    heading.title.font.weight = 'bold',
    table.align = 'center',
    table.width = '85%'
  ) |>
  tab_style(style = list(cell_text(color  = 'red',
                                   weight = 'bold',
                                   align  = 'center')),
            locations = cells_body(columns = 1)) |>
  cols_width(Code ~pct(10))

tbl_ss_stoppage
```
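
A simple completeness check can flag profiles that stopped short of the target depth without a recorded reason. The sketch below assumes the 100 cm target discussed under depth of observation; `check_stoppage` and its return messages are hypothetical.

```r
# Assumed target depth (cm) and the stoppage codes from the table above.
target_depth_cm <- 100
stoppage_codes  <- c("B", "R", "C", "I", "O")

# Hypothetical check: a stoppage reason is only required when the profile
# stops short of the target depth.
check_stoppage <- function(depth_cm, reason = NA_character_,
                           target = target_depth_cm) {
  if (depth_cm >= target) return("ok")
  if (is.na(reason)) return("stoppage reason missing")
  if (!reason %in% stoppage_codes) return("unknown stoppage code")
  "ok"
}

check_stoppage(65)        # shallow profile with no reason recorded
check_stoppage(65, "R")   # shallow profile stopped by resistance
```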

## Recent site disturbance {#sec-pr-dist}

Site disturbances have the potential to influence the soil as well as its cover. Noting these events can provide context for the parameters recorded, and some are especially useful when interpreting laboratory results.

Only record recent events directly affecting the site. 'Recent' is deliberately flexible: the aim is to flag events that may explain why some observed parameters are locally unusual. Events that recur regularly (e.g. tillage) only need to be noted if very recent (e.g. within the previous week).

Record the type of event using the codes in @tbl-ss-eventp. If the time since the event is known with high certainty, add the number of days since the event (e.g. [FL 10]{.ceg} for a well-documented recent flood). If it is known only with low certainty, record the general timescale using @tbl-ss-eventt.

```{r}
#| label: tbl-ss-eventp
#| tbl-cap: "Recent site-affecting events"

dat_ss_eventp <-
  tribble(~Code, ~Name,
          'MC', 'Mechanical disturbance (e.g. tillage, deep ripping, subsoiling)',
          'AN', 'Animal disturbance (e.g. pugging, rooting, tunnelling, excretion)',
          'CL', 'Complete vegetation clearance (e.g. clearfelling)',
          'CO', 'Selective vegetation clearance (e.g. tree removal or weed clearance)', 
          'WI', 'Wind-induced treefall',
          'SL', 'Erosion event - material lost from site',
          'SD', 'Deposition event - material added to site',
          'FC', 'Fire - controlled burn',
          'FW', 'Fire - uncontrolled burn',
          'FL', 'Flooding - freshwater',
          'FS', 'Flooding - saline water (e.g. storm surge event)',
          'PO', 'Pollution event (e.g. petrol spill)',
          'NO', 'No recent events',
          'UK', 'Unknown')

tbl_ss_eventp <- gt(dat_ss_eventp) |>
   tab_options(
    column_labels.font.weight = 'bold', 
    heading.title.font.weight = 'bold',
    table.align = 'center',
    table.width = '85%'
  ) |>
  tab_style(style = list(cell_text(color  = 'red', 
                                   weight = 'bold',
                                   align  = 'center')),
            locations = cells_body(columns = 1)) |>
  cols_width(Code ~pct(10))

tbl_ss_eventp
```

```{r}
#| label: tbl-ss-eventt
#| tbl-cap: "Generalised time since recent site-affecting events"

dat_ss_eventt <-
  tribble(~Code, ~Name,
          'D', 'Within the past day',
          'W', 'Within the past week',
          'M', 'Within the past month',
          'S', 'Within the past 6 months',
          'Y', 'Within the past year')

tbl_ss_eventt <- gt(dat_ss_eventt) |>
   tab_options(
    column_labels.font.weight = 'bold', 
    heading.title.font.weight = 'bold',
    table.align = 'center',
    table.width = '85%'
  ) |>
  tab_style(style = list(cell_text(color  = 'red', 
                                   weight = 'bold',
                                   align  = 'center')),
            locations = cells_body(columns = 1)) |>
  cols_width(Code ~pct(10))

tbl_ss_eventt
```
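
The two ways of recording event timing can be combined in one small encoder, sketched below. The 'event code, space, days' form follows the [FL 10]{.ceg} example; writing a low-certainty event as the event code followed by a timescale code is an assumption made for illustration, and `encode_event` is a hypothetical helper.

```r
# Codes copied from the two tables above.
event_codes     <- c("MC", "AN", "CL", "CO", "WI", "SL", "SD",
                     "FC", "FW", "FL", "FS", "PO", "NO", "UK")
timescale_codes <- c("D", "W", "M", "S", "Y")

# Hypothetical encoder. Days are used when the timing is well known;
# otherwise a generalised timescale code is appended (assumed format).
encode_event <- function(event, days = NULL, timescale = NULL) {
  stopifnot(event %in% event_codes)
  if (!is.null(days)) {
    return(paste(event, as.integer(days)))
  }
  if (!is.null(timescale)) {
    stopifnot(timescale %in% timescale_codes)
    return(paste(event, timescale))
  }
  event
}

encode_event("FL", days = 10)        # well-documented flood
encode_event("FL", timescale = "M")  # flood some time in the past month
```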