-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
155 lines (114 loc) · 5.61 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# spress
<!-- badges: start -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![CRAN status](https://www.r-pkg.org/badges/version/spress)](https://CRAN.R-project.org/package=spress)
<!-- badges: end -->
**spress** provides utilities for encoding and compressing geospatial objects, such as `sf` objects.
## Installation
This package requires **C++11** for compilation. Once you have that installed,
you can install the development version from [GitHub](https://github.com/)
with `devtools` or `pak`:
``` r
# pak
pak::pkg_install("UFOKN/spress")
# devtools
devtools::install_github("UFOKN/spress")
```
## Usage
**spress** offers *both* encoding and compressing utilities. For clarity, in the context of **spress**,
*encoding* refers to projecting 2-dimensional coordinates to a 1-dimensional system
(particularly, geographic coordinates to a Hilbert Curve index); whereas, *compressing* refers to
the process for reducing the object and file size of the encoded object.
### Encoding
At a high level, **spress** performs the following algorithm for encoding:
a. Given a set of points *P*
b. Retrieve *P*'s extent/bounding box
c. Create a *2^n* x *2^n* grid with centroids aligned to the extent's corner points
d. Index the grid to a Hilbert Curve
e. Assign each point in *P* to the closest index on the Hilbert Curve
A visualization of this algorithm is shown below:
![](man/figures/process.png)
After encoding, the data necessary to be stored is reduced down to:
1. The set of points *P*'s **extent**;
2. The ***n*** value used;
3. and, the list of **indices**.
With this set of data, the decoding process is exactly the reverse of the encoding algorithm.
In the visual example above, an *n* of **4** was used (since *2^4 = 16*, and we have a 16x16 grid),
however, note that the maximum possible error *E* converges toward 0 as the *extent* becomes smaller,
and as *n* grows. Thus, we can achieve a much more accurate encoding scheme by modifying these when possible.
> Realistically, this algorithm is computationally expensive if the **extent** and ***n***
> are large. However, this is mitigated via bitwise operations and taking advantage of efficient C++ structures.
> For the specifics of this, see `libspate`'s source code in `inst/include/spate.hpp`.
#### Encoding Example
```{r}
# Random set of points
P <- data.frame(
X = c(-77.8015434499876,
-77.8707250008617,
-77.9212227055911,
-77.9385055000025,
-77.8452515318622,
-77.8715886279598),
Y = c(34.2961615000021,
34.1818796104794,
34.1447587911948,
34.2165775000314,
34.2218266693184,
34.2632856737062)
)
# n = 8 --> 2^8 x 2^8 grid --> 256x256 grid
n <- 8L
# Encode using vectors
spress::encode(P$X, P$Y, n = n)
# We can also use the data.frame directly and attach to it
encoded_P <- spress::encode(P, coords = c("X", "Y"), attach = FALSE)
encoded_df <- spress::encode(P, coords = c("X", "Y"), attach = TRUE)
encoded_df
```
#### Decoding Example
We will use `encoded_P` and `encoded_df` from the encoding example
If we decode an object that used `spress::encode` to encode, there's
no need to pass `extent` or `n` to the decoding function. These values
are stored in the attributes `spress_extent` and `spress_n`, respectively.
```{r}
spress::decode(encoded_P)
# Columns `x` and `y` are the new decoded coordinate columns
# The argument `index` refers to the column index of the encoded index values
spress::decode(encoded_df, index = 3)[, c("x", "y")]
```
### Compression
> Compression is **not** implement yet as of 11/30/21.
For compression, **spress** performs the following algorithm:
1. Given a set of points *P*
2. Encode *P* based on the encoding algorithm
3. Create an ordering index for order preservation upon decompression
4. Sort indices by the encoding index, such that each point in *P* follows along corresponding Hilbert Curve
5. Keep the first encoding index and get the difference between every successive point
This algorithm returns a set of indices that (generally) should be small integers.
The returned table can then be written to any file format, preferably a columnar format
(such as [Apache Parquet](https://parquet.apache.org/)),
and additionally compressed using another compression library
(such as [xz](https://tukaani.org/xz/) or [snappy](https://github.com/google/snappy)).
## C++ Library
Included in this package is the header-only C++ library `libspress`, which contains the foundational code used for encoding/decoding,
and was developed specifically for this package.
This can be found at `inst/include/spress.hpp` from the root of this repository.
Unit testing for this library is done with [Catch](https://github.com/catchorg/Catch2) via `testthat`.
See [`testthat::use_catch`](https://testthat.r-lib.org/reference/use_catch.html) for details on how this works.
> Note that the C++ code for this package contains purely the algorithm for encoding and decoding. The current roadmap for this package
> includes making the C++ library fully vendorable.
## Attribution
This package was inspired by the paper:
Liu, Dongge, Tao Wang, Xiaojuan Li, Yeqing Ni, Yanping Li, and Zhao Jin. 2020. "A Multiresolution Vector Data Compression Algorithm Based on Space Division" ISPRS International Journal of Geo-Information 9, no. 12: 721. https://doi.org/10.3390/ijgi9120721