Skip to content

Commit dcac444

Browse files
authored
Geopoly in SQLite
1 parent 76fe028 commit dcac444

File tree

1 file changed

+142
-0
lines changed

1 file changed

+142
-0
lines changed

sqlite/geopoly.md

Lines changed: 142 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,142 @@
1+
# Geopoly in SQLite
2+
3+
I noticed this morning that one of my Datasette installations had the [Geopoly](https://www.sqlite.org/geopoly.html) SQLite extension enabled. I don't know how it got there - it has to be compiled specifically - but since it was there I decided to try it out.
4+
5+
The clue that it was enabled was seeing `ENABLE_GEOPOLY` listed in compile options on the `/-/versions` page.
6+
7+
## Importing some raw GeoJSON data
8+
9+
Geopoly supports "a small subset of GeoJSON". It can only handle polygons, and they have to be "simple" polygons for which the boundary does not intersect itself.
10+
11+
It took me a bit of work to populate a database table. I started by exporting GeoJSON for North and South America from this page:
12+
13+
https://geojson-maps.ash.ms/
14+
15+
Here's [the data I exported as a Gist](https://gist.github.com/simonw/683bf1ff7403796ae49b70be8e202e7e).
16+
17+
To load that into SQLite I started by installing the [datasette-write](https://datasette.io/plugins/datasette-write) plugin and starting Datasette as root:
18+
19+
```
20+
datasette install datasette-write
21+
datasette --create data.db --root -o
22+
```
23+
Then I visited `http://localhost:8001/-/write` and created a `raw_data` table and inserted the GeoJSON:
24+
```
25+
create table raw_data (text);
26+
```
27+
I had to replace any single `'` characters with `''` in the JSON to escape them, then I pasted in the following:
28+
```
29+
insert into raw_data (text) values ('{
30+
"type": "FeatureCollection",
31+
"features": [
32+
{
33+
"type": "Feature",...');
34+
```
35+
This gave me a populated `raw_data` table with a blob of JSON I could start playing with.
36+
37+
## Converting that to Geopoly shapes
38+
39+
This next bit was really tricky.
40+
41+
I started by creating a new table for my countries:
42+
43+
```sql
44+
create virtual table countries using geopoly(name, properties);
45+
```
46+
47+
My GeoJSON data contained two types of object: polygons and multipolygons.
48+
49+
Neither of these can be inserted directly into a Geopoly table.
50+
51+
GeoJSON Polygons exist as a list, where the first item in the list is the shape and subsequent items are "holes" within that shape (lakes and suchlike).
52+
53+
Multipolygons are a list of polygons, for example the USA has separate polygons for Alaska and Hawaii and more.
54+
55+
Geopoly can only handle a simple polygon - so I needed to pick a single polygon for each of my countries.
56+
57+
After some experimenting, I came up with the following write query to populate the table with valid polygons:
58+
59+
```sql
60+
insert into countries (name, properties, _shape)
61+
select
62+
json_extract(value, '$.properties.name') as name,
63+
json_extract(value, '$.properties') as properties,
64+
json_extract(value, '$.geometry.coordinates[0]') as _shape
65+
from json_each(text, '$.features'), raw_data
66+
where geopoly_blob(_shape) is not null
67+
```
68+
There are a few tricks going on here.
69+
70+
- `from json_each(text, '$.features'), raw_data` loops through every feature in the `$.features` list inside the `text` field in that `raw_data` table I created earlier. The join there is a little unintuitive but it works for processing JSON in a table with a single row. Each item in that array is available as `value` in the rest of the query.
71+
- `json_extract(value, '$.properties.name')` extracts the `name` property from the `properties` object inside the GeoJSON feature - this is the name of the country.
72+
- `json_extract(value, '$.properties')` extracts the full `properties` object as a JSON string.
73+
- `json_extract(value, '$.geometry.coordinates[0]')` extracts the first item from the `coordinates` array inside the GeoJSON geometry object. As discussed earlier, I decided to ignore the "holes" in each polygon and just store the outer-most shape.
74+
- `where geopoly_blob(_shape) is not null` is a trick I found to filter out invalid polygons - without this the entire query failed with an unhelpful SQL logic error.
75+
76+
This query turned out to do the right thing for all of the countries with a single polygon - but it skipped countries like the USA and Canada which used a multipolygon.
77+
78+
I eventually figured out I could import those countries as well with a second query:
79+
80+
```sql
81+
insert into countries (name, properties, _shape)
82+
select
83+
json_extract(value, '$.properties.name') as name,
84+
json_extract(value, '$.properties') as properties,
85+
json_extract(value, '$.geometry.coordinates[0][0]') as _shape
86+
from json_each(text, '$.features'), raw_data
87+
where json_extract(value, '$.geometry.type') = 'MultiPolygon';
88+
```
89+
Here I'm using `$.geometry.coordinates[0][0]` to extract that outer shape from the first polygon in the multipolygon.
90+
91+
This worked! Having run the above two queries I had a fully populated `countries` table with 31 rows.
92+
93+
## What country is this point in?
94+
95+
I can now use the `geopoly_contains_point()` function to find which country a point is in:
96+
97+
```sql
98+
select
99+
name, properties
100+
from
101+
countries
102+
where
103+
geopoly_contains_point(_shape, -84.1067362, 9.9314029)
104+
```
105+
The latitude and longitude of San José, Costa Rica is 9.9314029, -84.1067362. This query returns Costa Rica.
106+
107+
## Plotting everything on a map
108+
109+
I installed the [datasette-geojson-map](https://datasette.io/plugins/datasette-geojson-map) plugin by Chris Amico and used it to plot my countries on a map:
110+
111+
```
112+
datasette install datasette-geojson-map
113+
```
114+
Then in the SQL interface:
115+
```sql
116+
select
117+
name,
118+
'{"coordinates": ['
119+
|| geopoly_json(_shape) ||
120+
'], "type": "Polygon"}' as geometry
121+
from
122+
countries
123+
```
124+
Here I'm using the `geopoly_json(_shape)` function to turn the binary representation of the shape in the database back into a GeoJSON polygon. Then I'm concatenating that with a little bit of wrapping JSON to create a full GeoJSON feature that looks like this (coordinates truncated):
125+
126+
```json
127+
{"coordinates": [[[-82.5462,9.56613],[-82.9329,9.47681]]], "type": "Polygon"}
128+
```
129+
130+
The plugin renders any GeoJSON in a column called `geometry`. The output looked like this:
131+
132+
<img width="765" alt="Screenshot showing that SQL query, with a map displayed below it with the outlines of all 31 countries. Argentina is notably missing." src="https://user-images.githubusercontent.com/9599/210618239-4c14bf8a-6c83-4220-b86a-047b5ea01c1d.png">
133+
134+
I just spotted that Argentina is missing from that map, presumably because it's a multipolygon and the first polygon in that sequence isn't the largest polygon covering that country.
135+
136+
## Could this handle more complex polygons?
137+
138+
The obvious problem with this approach is that I've over-simplified the polygons: the USA is missing two whole states and a bunch of territories, and other countries have been simplified in similar ways.
139+
140+
I think there's a reasonable way to handle this, with a bit more work. The trick would be to represent each country as multiple rows in the database, each row corresponding to one of the polygons in the multipolygon.
141+
142+
This could even be extended to handle holes in regular polygons, by running queries that can consider multiple rows and identify if a point falls within a hole and hence should not be considered as part of a country.

0 commit comments

Comments
 (0)