|
| 1 | +# Geopoly in SQLite |
| 2 | + |
| 3 | +I noticed this morning that one of my Datasette installations had the [Geopoly](https://www.sqlite.org/geopoly.html) SQLite extension enabled. I don't know how it got there - it has to be compiled specifically - but since it was there I decided to try it out. |
| 4 | + |
| 5 | +The clue that it was enabled was seeing `ENABLE_GEOPOLY` listed in compile options on the `/-/versions` page. |
| 6 | + |
| 7 | +## Importing some raw GeoJSON data |
| 8 | + |
| 9 | +Geopoly supports "a small subset of GeoJSON". It can only handle polygons, and they have to be "simple" polygons for which the boundary does not intersect itself. |
| 10 | + |
| 11 | +It took me a bit of work to populate a database table. I started by exporting GeoJSON for North and South America from this page: |
| 12 | + |
| 13 | +https://geojson-maps.ash.ms/ |
| 14 | + |
| 15 | +Here's [the data I exported as a Gist](https://gist.github.com/simonw/683bf1ff7403796ae49b70be8e202e7e). |
| 16 | + |
| 17 | +To load that into SQLite I started by installing the [datasette-write](https://datasette.io/plugins/datasette-write) plugin and starting Datasette as root: |
| 18 | + |
| 19 | +``` |
| 20 | +datasette install datasette-write |
| 21 | +datasette --create data.db --root -o |
| 22 | +``` |
| 23 | +Then I visited `http://localhost:8001/-/write` and created a `raw_data` table and inserted the GeoJSON: |
| 24 | +``` |
| 25 | +create table raw_data (text); |
| 26 | +``` |
| 27 | +I had to replace any single `'` characters with `''` in the JSON to escape them, then I pasted in the following: |
| 28 | +``` |
| 29 | +insert into raw_data (text) values ('{ |
| 30 | + "type": "FeatureCollection", |
| 31 | + "features": [ |
| 32 | + { |
| 33 | + "type": "Feature",...'); |
| 34 | +``` |
| 35 | +This gave me a populated `raw_data` table with a blob of JSON I could start playing with. |
| 36 | + |
| 37 | +## Converting that to Geopoly shapes |
| 38 | + |
| 39 | +This next bit was really tricky. |
| 40 | + |
| 41 | +I started by creating a new table for my countries: |
| 42 | + |
| 43 | +```sql |
| 44 | +create virtual table countries using geopoly(name, properties); |
| 45 | +``` |
| 46 | + |
| 47 | +My GeoJSON data contained two types of object: polygons and multipolygons. |
| 48 | + |
| 49 | +Neither of these can be inserted directly into a Geopoly table. |
| 50 | + |
| 51 | +GeoJSON Polygons exist as a list, where the first item in the list is the shape and subsequent items are "holes" within that shape (lakes and suchlike). |
| 52 | + |
| 53 | +Multipolygons are a list of polygons, for example the USA has separate polygons for Alaska and Hawaii and more. |
| 54 | + |
| 55 | +Geopoly can only handle a simple polygon - so I needed to pick a single polygon for each of my countries. |
| 56 | + |
| 57 | +After some experimenting, I came up with the following write query to populate the table with valid polygons: |
| 58 | + |
| 59 | +```sql |
| 60 | +insert into countries (name, properties, _shape) |
| 61 | +select |
| 62 | + json_extract(value, '$.properties.name') as name, |
| 63 | + json_extract(value, '$.properties') as properties, |
| 64 | + json_extract(value, '$.geometry.coordinates[0]') as _shape |
| 65 | +from json_each(text, '$.features'), raw_data |
| 66 | +where geopoly_blob(_shape) is not null |
| 67 | +``` |
| 68 | +There are a few tricks going on here. |
| 69 | + |
| 70 | +- `from json_each(text, '$.features'), raw_data` loops through every feature in the `$.features` list inside the `text` field in that `raw_data` table I created earlier. The join there is a little unintuitive but it works for processing JSON in a table with a single row. Each item in that array is available as `value` in the rest of the query. |
| 71 | +- `json_extract(value, '$.properties.name')` extracts the `name` property from the `properties` object inside the GeoJSON feature - this is the name of the country. |
| 72 | +- `json_extract(value, '$.properties')` extracts the full `properties` object as a JSON string. |
| 73 | +- `json_extract(value, '$.geometry.coordinates[0]')` extracts the first item from the `coordinates` array inside the GeoJSON geometry object. As discussed earlier, I decided to ignore the "holes" in each polygon and just store the outer-most shape. |
| 74 | +- `where geopoly_blob(_shape) is not null` is a trick I found to filter out invalid polygons - without this the entire query failed with an unhelpful SQL logic error. |
| 75 | + |
| 76 | +This query turned out to do the right thing for all of the countries with a single polygon - but it skipped countries like the USA and Canada which used a multipolygon. |
| 77 | + |
| 78 | +I eventually figured out I could import those countries as well with a second query: |
| 79 | + |
| 80 | +```sql |
| 81 | +insert into countries (name, properties, _shape) |
| 82 | +select |
| 83 | + json_extract(value, '$.properties.name') as name, |
| 84 | + json_extract(value, '$.properties') as properties, |
| 85 | + json_extract(value, '$.geometry.coordinates[0][0]') as _shape |
| 86 | +from json_each(text, '$.features'), raw_data |
| 87 | + where json_extract(value, '$.geometry.type') = 'MultiPolygon'; |
| 88 | +``` |
| 89 | +Here I'm using `$.geometry.coordinates[0][0]` to extract that outer shape from the first polygon in the multipolygon. |
| 90 | + |
| 91 | +This worked! Having run the above two queries I had a fully populated `countries` table with 31 rows. |
| 92 | + |
| 93 | +## What country is this point in? |
| 94 | + |
| 95 | +I can now use the `geopoly_contains_point()` function to find which country a point is in: |
| 96 | + |
| 97 | +```sql |
| 98 | +select |
| 99 | + name, properties |
| 100 | +from |
| 101 | + countries |
| 102 | +where |
| 103 | + geopoly_contains_point(_shape, -84.1067362, 9.9314029) |
| 104 | +``` |
| 105 | +The latitude and longitude of San José, Costa Rica is 9.9314029, -84.1067362. This query returns Costa Rica. |
| 106 | + |
| 107 | +## Plotting everything on a map |
| 108 | + |
| 109 | +I installed the [datasette-geojson-map](https://datasette.io/plugins/datasette-geojson-map) plugin by Chris Amico and used it to plot my countries on a map: |
| 110 | + |
| 111 | +``` |
| 112 | +datasette install datasette-geojson-map |
| 113 | +``` |
| 114 | +Then in the SQL interface: |
| 115 | +```sql |
| 116 | +select |
| 117 | + name, |
| 118 | + '{"coordinates": [' |
| 119 | + || geopoly_json(_shape) || |
| 120 | + '], "type": "Polygon"}' as geometry |
| 121 | +from |
| 122 | + countries |
| 123 | +``` |
| 124 | +Here I'm using the `geopoly_json(_shape)` function to turn the binary representation of the shape in the database back into a GeoJSON polygon. Then I'm concatenating that with a little bit of wrapping JSON to create a full GeoJSON feature that looks like this (coordinates truncated): |
| 125 | + |
| 126 | +```json |
| 127 | +{"coordinates": [[[-82.5462,9.56613],[-82.9329,9.47681]]], "type": "Polygon"} |
| 128 | +``` |
| 129 | + |
| 130 | +The plugin renders any GeoJSON in a column called `geometry`. The output looked like this: |
| 131 | + |
| 132 | +<img width="765" alt="Screenshot showing that SQL query, with a map displayed below it with the outlines of all 31 countries. Argentina is notably missing." src="https://user-images.githubusercontent.com/9599/210618239-4c14bf8a-6c83-4220-b86a-047b5ea01c1d.png"> |
| 133 | + |
| 134 | +I just spotted that Argentina is missing from that map, presumably because it's a multipolygon and the first polygon in that sequence isn't the largest polygon covering that country. |
| 135 | + |
| 136 | +## Could this handle more complex polygons? |
| 137 | + |
| 138 | +The obvious problem with this approach is that I've over-simplified the polygons: the USA is missing two whole states and a bunch of territories, and other countries have been simplified in similar ways. |
| 139 | + |
| 140 | +I think there's a reasonable way to handle this, with a bit more work. The trick would be to represent each country as multiple rows in the database, each row corresponding to one of the polygons in the multipolygon. |
| 141 | + |
| 142 | +This could even be extended to handle holes in regular polygons, by running queries that can consider multiple rows and identify if a point falls within a hole and hence should not be considered as part of a country. |
0 commit comments