Point in Polygon, etc... #462
Replies: 1 comment 1 reply
-
Hi Ned!
Using
Yeah, for filtering purposes you could use That said, if we add a
One thing I've realized during my time working on DuckDB is that DuckDB really shines by being able to execute simpler operations on a large set of data in a hot-loop (the whole "vectorized execution" thing). But if you're going to call into an external library on a row-by-row basis that in turn will dominate the runtime with a lot of computation heavy code (as is usually the case in routing-calculations), you're not really gaining a lot by integrating into DuckDB, except for the convenience in being able to interact with it through SQL. Im not completely against adding new dependencies, but because we want to statically link everything and ship binaries for 10+ platforms, larger dependencies need to be worth their build and maintenance burden. Additionally, spatial is already pretty large and will ideally stick to C++11. Perhaps a Valhalla integration would fit better as a separate extension? Now, to your questions:
I guess it depends. We generally try to follow along the tradition of PostGIS/Spatialite and the whole SQL/MM OGC world of geospatial SQL functionality, but I also don't want to be too stuck in the past. We're happy to add extra stuff that people find useful as well, but ideally it is implemented natively or makes use of the existing dependencies. If you have think you have a really unusual idea for a new function that no other database has, It's a good idea to first open a discussion so we can see how it would interact with the existing concepts and functionality in the spatial extension.
Cool! Could you elaborate on how this index works differently than e.g. a R-Tree? Can it be serialized/deserialized/"paged" efficiently to/from disk? Do you have some code or a paper you could share? Indexing in DuckDB is still very immature and limited, and Indexes in extensions doubly so. There is as of now no API for creating custom indexes, managing storage or having them be used by the query planner, and hacking around it like I've done in spatial requires significant knowledge of DuckDB (unstable) internals. Thankfully using indexes in DuckDB is not as much of a "requirement" as it often is in other databases, and it can often produce worse query times depending on the size and access patterns of your workloads, see e.g. #444 or the section on indexes in the docs. So in general I would not recommend attempting to implement a custom DuckDB index type yet.
I myself is very much from the spatial-SQL world, so I'd be very happy to hear more of your thoughts on spatial and what issues/problems/features you think are important to solve! |
Beta Was this translation helpful? Give feedback.
-
I am trying to evaluate the usefulness for DuckDb spatial for business purposes. For reference I am the original designer/author of Alteryx (not affiliated any more) so I have my own way of thinking about spatial that differs from how DBs have done it.
A really common thing I want to do is Point in Polygon. Using DuckDB I can do Contains or Intersects, but they both have issues with points that are on the boundary. When there is a continuous layer (like ZIP codes) I want each point to be in 1 and only 1 polygon (deterministically). With Contains, a point on the edge is in 0 polygons and with Intersects a point on a boundary will be in 2 polygons. I am not seeing an option to be in 1 and only 1.
There are other missing functions like creating a radius (or buffer) with an actual earth based distance instead of having to change projections. Also I'd love to integrate things like Valhalla for drivetime polygon creation so that a join can be done on a drivetime natively.
So presuming I am seeing this correctly:
Is there interest in adding other spatial functions that would be on the more unique side of things? Or should I create a plugin just for my use cases.
I have a design for a spatial index for a point only layer that outperforms more generic 2D indexes by an order of magnitude - is there any interest in integrating that? If so I would probably need some help in figuring out the DuckDB way of doing things.
Beta Was this translation helpful? Give feedback.
All reactions