Pyarrow parquet provider #1722

barbuz · 2024-07-15T01:37:52Z

Overview

I have been using pygeoapi for my work and ended up developing a Features provider based on pyarrow and geopandas to manage parquet and geoparquet data. I am happily using it in my local instance, and I thought it could be useful to other people and projects as well, so here is my proposed contribution.

Additional information

This is my first PR on pygeoapi so please let me know if I have missed anything. I am also happy to address any comments or suggestions for this code.

Other options I have tried to manage geoparquet data are dask-geopandas and plain geopandas, but pyarrow turned out to have the most efficient and complete implementation (in fact it is used internally by both other libraries).

Dependency policy (RFC2)

I have ensured that this PR meets RFC2 requirements

Updates to public demo

I have ensured that breaking changes to the pygeoapi master demo server have been addressed
- https://github.com/geopython/demo.pygeoapi.io/blob/master/services/pygeoapi_master/local.config.yml

Contributions and licensing

(as per https://github.com/geopython/pygeoapi/blob/master/CONTRIBUTING.md#contributions-and-licensing)

I'd like to contribute this parquet provider to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution
I have already previously agreed to the pygeoapi Contributions and Licensing Guidelines

justb4

@barbuz this is a very welcome addition! I am not (yet) very well versed in (Geo)Parquet, so some general comments/suggestions first:

documentation: add, in particular to Feature Providers: https://docs.pygeoapi.io/en/latest/data-publishing/ogcapi-features.html#providers
CRS support: see https://docs.pygeoapi.io/en/latest/crs.html . So the Parquet provider may itself support or let pygeoapi do (@crs_transform on the Provider functions query() and get() )
Q: will Provider support both Parquet and GeoParquet ? pygeoapi also supports geometry-less data (ok, I see self.has_geometry test)
Q: so source may be a local file or remote URL (with HTTP range support)?
see CRS support above: looks like result is always in WGS84.

That's it for now, don't hesitate if you have questions!

justb4 · 2024-07-19T14:57:22Z

Today we also discussed this PR at the pygeoapi PSC meeting. It was mentioned that the OGR provider also may support Parquet, but like with other providers like PostgreSQL a native/direct implementation can be more efficient, so yes, we like to go ahead.

One question was if the GeoPandas dependency is required. Background is that GeoPandas itself may have quite some dependencies. We try to avoid too many dependencies for now (builds, Docker image size).

Like said, I am not an expert on Parquet, but the question arose: is it possible to only have pyarrow as dependency, and possibly extend with geoarrow-pyarrow i.s.o. GeoPandas. For CRS transform pygeoapi already includes proj.

webb-ben · 2024-07-20T13:33:47Z

We are implementing a geopandas generic provider in cgs-earth/pygeoapi-plugins for the exact same reason you mention @justb4. OGR provider technically means you can serve CSV with all of the OAF queryables, but it can be a barrier to use for someone new to OGR. Much easier to have the same configuration as the CSV provider with a slightly more capable backend implemented.

We understood this to be out of scope in pygeoapi, hence hosting it in our public repo of pygeoapi plugins.

This seems like a hybrid between the generic geopandas provider that I mentioned above and a geoparquet specific provider. @barbuz which one is this PR supposed to be?

barbuz · 2024-07-22T06:20:29Z

Thanks for your comments! I'll try to answer to everything:

* documentation: add, in particular to Feature Providers: https://docs.pygeoapi.io/en/latest/data-publishing/ogcapi-features.html#providers

Added.

* CRS support: see https://docs.pygeoapi.io/en/latest/crs.html . So the Parquet provider may itself support or let pygeoapi do (`@crs_transform` on the Provider functions `query()` and `get()` )

geoParquet encodes the CRS of the data together with the data, so I found it more convenient to rely on that than to specify the CRS of each dataset in the pygeoapi configuration. Looking at it again I probably did it in a way that does not work well with pygeoapi's management of bbox and crs, so I have now switched to using @crs_transform and letting pigeoapi do everything

* Q: will Provider support both Parquet and GeoParquet ? pygeoapi also supports geometry-less data (ok, I see `self.has_geometry` test)

Yes, I wrote this mainly to work with GeoParquet, but also added support for geometry-less data (plain Parquet)

* Q: so `source` may be a local file or remote URL (with HTTP range support)?

Currently a local file or AWS S3 URL. That was what I specifically needed, but it should be possible to add support for HTTP (or other protocols) if there is desire for that.

* see CRS support above: looks like result is always in WGS84.

Result is now in the data's native CRS, pygeoapi takes care of transformations.

[...]is it possible to only have pyarrow as dependency, and possibly extend with geoarrow-pyarrow i.s.o. GeoPandas

I am not sure about this, it would surely need some research as I am not too familiar with geoarrow. The current state of the code basically only relies on geopandas to build the "geo_interface" to the data (i.e. the dictionary pygeoapi expects to be returned by a query or get call), so I think it should not be too hard to replace this with something else. Could you just help me understand exactly what structure should this dictionary have? I found out that this was working but I am not sure if it follows some particular standard.

This seems like a hybrid between the generic geopandas provider that I mentioned above and a geoparquet specific provider. @barbuz which one is this PR supposed to be?

This is intended to be a GeoParquet provider. It also supports plain Parquet as an edge case (a GeoParquet file without a geometry column is basically just a Parquet file), but it was never meant to be any more generic than that, surely not a generic geopandas provider.

justb4

@barbuz thanks for these answers and additions!

To answer on the query() return structure: this is basically an in-memory GeoJSON FeatureCollection, with some (optional) additions. You may look into some other providers how this is populated. For example mongo.py

        feature_collection = {
            'type': 'FeatureCollection',
            'features': featurelist, // array of Feature dict, see https://geojson.org/
            'numberMatched': matchcount, // OPTIONAL
            'numberReturned': len(featurelist) // OPTIONAL
        }

To add:

Dockerfile : add deps. As Debian packages are preferred, these would be python3-arrow and python3-geopandas

Still, the main concern is the growth in disk space of the deps: on MacOS: pyarrow about 100MB and geopandas around 105MB. This is added to the current about 306MB for all current pygeoapi deps. geoarrow adds only a couple of MB. But like you indicated more expertise is needed, basically to convert .parquet files to our internal GeoJSON structure (see above). I looked a bit around but could not find a concrete example. I ping here @bertt and @cholmes . Maybe they can be of help for GeoArrow..
Also the Docker Image (now over 1GB) will grow.

This is a generic problem that we, devteam, want to address: how to distribute pygeoapi with only required/minimal deps as needed for a user's use case...

cholmes · 2024-07-22T14:28:51Z

I ping here @bertt and @cholmes . Maybe they can be of help for GeoArrow..

I'll just help by pinging the true geoarrow experts - @paleolimbot @jorisvandenbossche @kylebarron Joris also works directly on geopandas.

They can sound in if there's some lightweight way to do things, but unfortunately parquet is often a pretty heavy.

But like you indicated more expertise is needed, basically to convert .parquet files to our internal GeoJSON structure (see above).

It might be interesting to explore using geoarrow as the internal structure instead of GeoJSON. Obviously that'd be a big change, but something to think about, as there's potentially some really interesting benefits - you'd align with the memory structure of other geospatial tools for very efficient interoperability between libraries.

kylebarron · 2024-07-22T15:31:21Z

It sounds like you need two things: to read GeoParquet and convert to GeoJSON.

pyarrow isn't enough on its own, because pyarrow has no knowledge of spatial types, and you need something to understand the geospatial nature of GeoParquet and convert to GeoJSON. You could implement something by hand using pyarrow and shapely, but that wouldn't be trivial.

pyarrow and geopandas are indeed very large dependencies. This is a large part of the reason I've been working on https://github.com/geoarrow/geoarrow-rs. Its Python bindings are 10MB total, including both the GeoParquet reader/writer and other general geospatial functionality, like GeoJSON conversion, and it has no dependencies of its own. But it's also not fully production ready just yet.

Indeed, you will lose most of the performance benefits of GeoParquet by converting the data in memory GeoJSON, but it will at least work. Using GeoArrow internally would be a big performance improvement, but would also be a big change.

tomkralidis · 2024-07-22T15:45:21Z

Parquet/GeoParquet support would definitively be valuable! I think the use case here is to have Parquet/GeoParquet as a "backend" provider to be able to emit GeoJSON. Using Parquet/GeoParquet would be an interesting experiment. Given the way pygeoapi data providers work, GeoJSON in memory is a much smaller footprint given OGC API workflow (subsetting, no bulk data transfer per se) and GeoJSON are used for query results (and are simply Python dictionaries).

Dependency wise, it's clear the GeoParquet dependencies would be optional and not affecting the core. Which is great/by design given our architecture.

barbuz · 2024-07-23T06:02:19Z

Dependency wise, it's clear the GeoParquet dependencies would be optional and not affecting the core. Which is great/by design given our architecture.

Do you mean that it could be fine to leave geopandas as a dependency for this provider?

Replacing geopandas with geoarrow is something I could try to work on, as is adding support for HTTP access to data, properly managing the CRS internally, and a number of other things... I'm just trying to understand what would be the best use of my (sadly limited) time: what is actually needed to get this PR approved?

justb4 · 2024-07-23T11:29:08Z

Thanks @cholmes and @kylebarron! If @tomkralidis agrees, I suggest the way forward for now:

go ahead with the current implementation and GeoPandas dependency, also as this is an already working (production) implementation.

Indeed through requirements-provider.txt a user can choose to install deps to the core. My main concern was the growing size of the Docker Image. Maybe we leave out the deps there. Also as the Dockerfile provides an ARG ADD_DEB_PACKAGES through which you can define a custom Image as build args without defining a new Dockerfile.

And at same time open a new issue to investigate the use of GeoArrow i.s.o. GeoPandas, plus for possibly other extensions like internal CRS transforms, HTTP Urls, CQL support etc. I would love to access Overture data via OGC API Features, makes compelling use case e.g. in QGIS!

tomkralidis · 2024-07-23T11:59:43Z

Thanks @justb4; yes, we could add to requirements-provider.txt and not in Dockerfile per se.

Having said this, on a clean virtualenv, for example, pip3 install geoarrow-pyarrow adds 222MB to a given install. pip3 install geopandas adds 271MB.

@kylebarron do ping us when geoarrow-rs is close to production ready.

webb-ben · 2024-07-23T12:31:33Z

We might need to pin geopandas or numpy if we introduce geopandas as a dependency. Geopandas <2.2.2 is only compatible with numpy<2.

Just had some issues in our geopandas provider last week managing these deps.

justb4 · 2024-07-23T13:47:59Z

Having said this, on a clean virtualenv, for example, pip3 install geoarrow-pyarrow adds 222MB to a given install. pip3 install geopandas adds 271MB.

Yes, about my figures. The "geoarrow-pyarrow adds 222MB " is mostly IMO pyarrow. As I found and @kylebarron states python-geoarrow is a few, maybe 10MB.

@barbuz think we are good to go. As last possibly can you look in maybe pinning, as @webb-ben states (meaning < Pandas < 2.2.2 I guess)? In my venv Python 3.10.12, I have numpy==2.0.0, geopandas==1.0.1, pyarrow==17.0.0, pyarrow-hotfix==0.6, pandas==2.2.2 but have not tested the PR-branch. numpy is already an existing dep for other providers, so would rather not pin numpy but I am not daily using these libs. If the PR branch works with the default versions that is ok.

webb-ben · 2024-07-23T14:23:34Z

Yes you are correct I meant pandas 2.2.2.

I am able to run on my local environment. Challenge has been getting a Docker image to build. See this workflow for reference.

Running numpy==2.0.1, geopandas==1.0.1, pandas==2.2.2 gives me this warning

START /entrypoint.sh
Trying to generate openapi.yml

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.[12](https://github.com/cgs-earth/pygeoapi-plugins/actions/runs/10060666575/job/27808835602#step:7:13)'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

and yields this error:

Traceback (most recent call last):
  File "/usr/local/bin/pygeoapi", line 33, in <module>
    sys.exit(load_entry_point('pygeoapi', 'console_scripts', 'pygeoapi')())
  File "/usr/local/bin/pygeoapi", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 2[41](https://github.com/cgs-earth/pygeoapi-plugins/actions/runs/10060666575/job/27808835602#step:7:42), in _call_with_frames_removed
  File "/pygeoapi/pygeoapi/__init__.py", line 41, in <module>
    from pygeoapi.config import config
  File "/pygeoapi/pygeoapi/config.py", line 39, in <module>
    from pygeoapi.util import to_json, yaml_load, THISDIR
  File "/pygeoapi/pygeoapi/util.py", line 54, in <module>
    from shapely import ops
  File "/usr/lib/python3/dist-packages/shapely/__init__.py", line 1, in <module>
    from shapely.lib import GEOSException  # NOQA
ImportError: numpy.core.multiarray failed to import
ERROR: openapi.yml could not be generated ERROR

kylebarron · 2024-07-23T14:52:17Z

As I found and @kylebarron states python-geoarrow is a few, maybe 10MB.

To be clear there are multiple core geoarrow libraries for Python.

geoarrow-pyarrow is developed by @paleolimbot and depends on pyarrow and his own geoarrow-c (a C-based implementation of GeoArrow). I'm working on geoarrow-rust (e.g. geoarrow-rust-core on pypi), which is entirely separate from geoarrow-c (but can interoperate between C, Rust, and Python without data copies). It's my own geoarrow-rust Python library that has a total size of 10MB (with no external dependencies), but it's not yet production ready.

justb4 · 2024-07-24T19:37:58Z

Yes you are correct I meant pandas 2.2.2.

I am able to run on my local environment. Challenge has been getting a Docker image to build. See this workflow for reference.

Running numpy==2.0.1, geopandas==1.0.1, pandas==2.2.2 gives me this warning

START /entrypoint.sh
Trying to generate openapi.yml

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.[12](https://github.com/cgs-earth/pygeoapi-plugins/actions/runs/10060666575/job/27808835602#step:7:13)'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

and yields this error:

Traceback (most recent call last):
  File "/usr/local/bin/pygeoapi", line 33, in <module>
    sys.exit(load_entry_point('pygeoapi', 'console_scripts', 'pygeoapi')())
  File "/usr/local/bin/pygeoapi", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 2[41](https://github.com/cgs-earth/pygeoapi-plugins/actions/runs/10060666575/job/27808835602#step:7:42), in _call_with_frames_removed
  File "/pygeoapi/pygeoapi/__init__.py", line 41, in <module>
    from pygeoapi.config import config
  File "/pygeoapi/pygeoapi/config.py", line 39, in <module>
    from pygeoapi.util import to_json, yaml_load, THISDIR
  File "/pygeoapi/pygeoapi/util.py", line 54, in <module>
    from shapely import ops
  File "/usr/lib/python3/dist-packages/shapely/__init__.py", line 1, in <module>
    from shapely.lib import GEOSException  # NOQA
ImportError: numpy.core.multiarray failed to import
ERROR: openapi.yml could not be generated ERROR

The above workflow run errors refer to CKAN, not sure if related.

The Docker Image can have a very different version and dep tree compared to local custom pypi installs. This has to do with the Ubuntu (Jammy) base image, Python versions, the use of UbuntuGIS (unstable), the use/preference for python-* 'deb' packages. We will try to align, but this is quite hard with all the 'native' compiled deps like GDAL and more. So what I for example see in the latest pygeoapi Docker Container (on demo server) when bashing in:

pip3 freeze | grep numpy
numpy==1.21.5

For this PR we will hold off adding to Docker, also as a new issue is underway to streamline different Docker Images for varying plugin support. This will take some deep thinking and hopefully help from the community.

As to progress this PR, if @barbuz indicates that the current PR version (unit tests) works with the current deps and requirements-provider.txt as included, we'll be happy to merge the current PR-version. We'll fix from there if needed, plus Docker inclusion.

barbuz · 2024-07-25T00:19:51Z

I can confirm that the unit tests for this provider can be passed with the current requirements-provider.txt.

However, I must say that there are tests for other providers that I could not get to pass neither in the main branch nor in this branch, likely because I am missing some dependencies (e.g. elasticsearch, mongo, oracle, postgresql). Should I try to get these to work before merging the PR or is this acceptable?

tomkralidis · 2024-07-25T00:28:22Z

I can confirm that the unit tests for this provider can be passed with the current requirements-provider.txt.

However, I must say that there are tests for other providers that I could not get to pass neither in the main branch nor in this branch, likely because I am missing some dependencies (e.g. elasticsearch, mongo, oracle, postgresql). Should I try to get these to work before merging the PR or is this acceptable?

I would worry only about the parquet tests. Our Once this PR is ready for review, we will enable CI which will run all tests.

justb4

Ok, we're good, thanks for patience @barbuz ! @tomkralidis will also do final review.

tomkralidis

Various comments/change requests -- fantastic work here!

Overall, we need to decide how to document the plugin (Parquet, GeoParquet?).

As well, see comments in the actual plugin on opening/closing dataset files (may need attention, or not).

docs/source/data-publishing/ogcapi-features.rst

pygeoapi/plugin.py

pygeoapi/provider/parquet.py

barbuz · 2024-07-26T01:56:40Z

Thank you for all your comments!

This provider can support both Parquet and GeoParquet. I am not sure what the best way to call it is; any GeoParquet file is also valid Parquet (so calling it a Parquet provider is correct), but as a geospatial API maybe we care more about the "Geo" part (so calling it GeoParquet could be better)? We could also call it a Pyarrow provider, as it mainly leverages that library to work with Parquet data. What do you think?

Unfortunately I likely won't have much time to work on this in the next two weeks because I will be travelling, if it is not a problem I will pick it up on my return.

tomkralidis · 2024-07-26T10:02:42Z

Thank you for all your comments!

This provider can support both Parquet and GeoParquet. I am not sure what the best way to call it is; any GeoParquet file is also valid Parquet (so calling it a Parquet provider is correct), but as a geospatial API maybe we care more about the "Geo" part (so calling it GeoParquet could be better)? We could also call it a Pyarrow provider, as it mainly leverages that library to work with Parquet data. What do you think?

OK great! Let's go with calling it a Parquet plugin, then. And in the plugin documentation mention that GeoParquet is supported also.

Unfortunately I likely won't have much time to work on this in the next two weeks because I will be travelling, if it is not a problem I will pick it up on my return.

No problem, will look forward to your updates in a couple of weeks.

francbartoli · 2024-07-28T11:40:01Z

Thank you for all your comments!
This provider can support both Parquet and GeoParquet. I am not sure what the best way to call it is; any GeoParquet file is also valid Parquet (so calling it a Parquet provider is correct), but as a geospatial API maybe we care more about the "Geo" part (so calling it GeoParquet could be better)? We could also call it a Pyarrow provider, as it mainly leverages that library to work with Parquet data. What do you think?

OK great! Let's go with calling it a Parquet plugin, then. And in the plugin documentation mention that GeoParquet is supported also.

Unfortunately I likely won't have much time to work on this in the next two weeks because I will be travelling, if it is not a problem I will pick it up on my return.

No problem, will look forward to your updates in a couple of weeks.

@tomkralidis wouldn't be more clear calling it the other way around so GeoParquet?

tomkralidis · 2024-07-28T15:59:32Z

@tomkralidis wouldn't be more clear calling it the other way around so GeoParquet?

Thinking more, for me, GeoParquet would mean that "geo" is required, much like our GeoJSON (JSON) plugin (which requires geometry, but can be null). We've named the PostgreSQL in this manner as well (in addition to elasticsearch, xarray, etc.).

I don't have strong opinions on this one, though. If folks feel strongly enough, let's name to GeoParquet (ensuring that the documentation mentioned that plain Parquet is also supported).

francbartoli · 2024-07-28T16:39:49Z

@tomkralidis wouldn't be more clear calling it the other way around so GeoParquet?

Thinking more, for me, GeoParquet would mean that "geo" is required, much like our GeoJSON (JSON) plugin (which requires geometry, but can be null). We've named the PostgreSQL in this manner as well (in addition to elasticsearch, xarray, etc.).

I don't have strong opinions on this one, though. If folks feel strongly enough, let's name to GeoParquet (ensuring that the documentation mentioned that plain Parquet is also supported).

I've checked GDAL and they use Parquet as the short name and (Geo)Parquetas the long one. So then, what about to keep the provider name Parquet in the code as you suggested and rename it (Geo)Parquet in the documentation page?

tomkralidis · 2024-07-28T16:41:22Z

@tomkralidis wouldn't be more clear calling it the other way around so GeoParquet?

Thinking more, for me, GeoParquet would mean that "geo" is required, much like our GeoJSON (JSON) plugin (which requires geometry, but can be null). We've named the PostgreSQL in this manner as well (in addition to elasticsearch, xarray, etc.).

I don't have strong opinions on this one, though. If folks feel strongly enough, let's name to GeoParquet (ensuring that the documentation mentioned that plain Parquet is also supported).

I've checked GDAL and they use Parquet as the short name and (Geo)Parquetas the long one. So then, what about to keep the provider name Parquet in the code as you suggested and rename it (Geo)Parquet in the documentation page?

+1

cholmes · 2024-07-29T01:53:11Z

Parquet as a name is more future proof, as our hope is that eventually geoparquet goes away and geo is just part of the core parquet spec.

…

On Sun, Jul 28, 2024 at 11:41 AM Tom Kralidis ***@***.***> wrote: @tomkralidis <https://github.com/tomkralidis> wouldn't be more clear calling it the other way around so GeoParquet? Thinking more, for me, GeoParquet would mean that "geo" is *required*, much like our GeoJSON (JSON) plugin (which requires geometry, but can be null). We've named the PostgreSQL in this manner as well (in addition to elasticsearch, xarray, etc.). I don't have strong opinions on this one, though. If folks feel strongly enough, let's name to GeoParquet (ensuring that the documentation mentioned that plain Parquet is also supported). I've checked GDAL and they use Parquet as the short name and (Geo)Parquetas the long one. So then, what about to keep the provider name Parquet in the code as you suggested and rename it (Geo)Parquet in the documentation page? +1 — Reply to this email directly, view it on GitHub <#1722 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AADDL2M2YZ5FPANQQFPCWLDZOUNMPAVCNFSM6AAAAABK3UWURSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENJUGU3TMMZVGY> . You are receiving this because you were mentioned.Message ID: ***@***.***>

justb4 · 2024-07-29T10:36:08Z

Thanks @cholmes ! Think this nails the name to parquet. Also makes sense referring to the core technology, like we have postgresql and sqlite.

We are really almost there for this PR. Remains: flake8 errors, basically reformatting:

./pygeoapi/provider/parquet.py:130:25: E127 continuation line over-indented for visual indent
./pygeoapi/provider/parquet.py:155:44: E127 continuation line over-indented for visual indent
./pygeoapi/provider/parquet.py:363:39: E127 continuation line over-indented for visual indent
./pygeoapi/provider/parquet.py:450:68: W291 trailing whitespace

Tip: there is a pre-commit hook: https://github.com/geopython/pygeoapi/blob/master/.pre-commit-config.yaml or when requirements-dev.txt is installed, running flake8 command in the project root.

justb4

Maybe we overlooked but the Provider class name in plugin.py reads: 'pygeoapi.provider.parquet.parquet.ParquetProvider' . Think one .parquet too many (?).

pygeoapi/plugin.py

barbuz · 2024-08-19T01:19:14Z

Sorry for the delay, I've just pushed some changes to address your reviews

Pyarrow parquet provider

58b0a1a

tomkralidis added the enhancement New feature or request label Jul 15, 2024

tomkralidis self-requested a review July 15, 2024 02:39

justb4 assigned barbuz Jul 19, 2024

justb4 self-requested a review July 19, 2024 10:08

justb4 reviewed Jul 19, 2024

View reviewed changes

barbuz added 3 commits July 22, 2024 15:21

Defer crs management to pygeoapi

a12051c

Add parquet provider docs

b2fc685

Fix flake8 errors

3a3b52b

barbuz requested a review from justb4 July 22, 2024 06:20

justb4 reviewed Jul 22, 2024

View reviewed changes

justb4 approved these changes Jul 25, 2024

View reviewed changes

tomkralidis requested changes Jul 25, 2024

View reviewed changes

justb4 requested changes Aug 16, 2024

View reviewed changes

pygeoapi/plugin.py Outdated Show resolved Hide resolved

barbuz added 2 commits August 19, 2024 10:41

Remove extra .parquet

df1dac3

Address reviews

9c7641f

barbuz requested review from tomkralidis and justb4 August 19, 2024 01:18

tomkralidis added this to the 0.18.0 milestone Aug 19, 2024

tomkralidis merged commit 54b9be4 into geopython:master Aug 19, 2024
4 of 5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pyarrow parquet provider #1722

Pyarrow parquet provider #1722

barbuz commented Jul 15, 2024

justb4 left a comment

justb4 commented Jul 19, 2024

webb-ben commented Jul 20, 2024

barbuz commented Jul 22, 2024

justb4 left a comment

cholmes commented Jul 22, 2024

kylebarron commented Jul 22, 2024

tomkralidis commented Jul 22, 2024

barbuz commented Jul 23, 2024

justb4 commented Jul 23, 2024

tomkralidis commented Jul 23, 2024

webb-ben commented Jul 23, 2024

justb4 commented Jul 23, 2024

webb-ben commented Jul 23, 2024

kylebarron commented Jul 23, 2024

justb4 commented Jul 24, 2024

barbuz commented Jul 25, 2024

tomkralidis commented Jul 25, 2024

justb4 left a comment

tomkralidis left a comment

barbuz commented Jul 26, 2024

tomkralidis commented Jul 26, 2024

francbartoli commented Jul 28, 2024

tomkralidis commented Jul 28, 2024

francbartoli commented Jul 28, 2024 •

edited

Loading

tomkralidis commented Jul 28, 2024

cholmes commented Jul 29, 2024 via email

justb4 commented Jul 29, 2024

justb4 left a comment

barbuz commented Aug 19, 2024

Pyarrow parquet provider #1722

Pyarrow parquet provider #1722

Conversation

barbuz commented Jul 15, 2024

Overview

Additional information

Dependency policy (RFC2)

Updates to public demo

Contributions and licensing

justb4 left a comment

Choose a reason for hiding this comment

justb4 commented Jul 19, 2024

webb-ben commented Jul 20, 2024

barbuz commented Jul 22, 2024

justb4 left a comment

Choose a reason for hiding this comment

cholmes commented Jul 22, 2024

kylebarron commented Jul 22, 2024

tomkralidis commented Jul 22, 2024

barbuz commented Jul 23, 2024

justb4 commented Jul 23, 2024

tomkralidis commented Jul 23, 2024

webb-ben commented Jul 23, 2024

justb4 commented Jul 23, 2024

webb-ben commented Jul 23, 2024

kylebarron commented Jul 23, 2024

justb4 commented Jul 24, 2024

barbuz commented Jul 25, 2024

tomkralidis commented Jul 25, 2024

justb4 left a comment

Choose a reason for hiding this comment

tomkralidis left a comment

Choose a reason for hiding this comment

barbuz commented Jul 26, 2024

tomkralidis commented Jul 26, 2024

francbartoli commented Jul 28, 2024

tomkralidis commented Jul 28, 2024

francbartoli commented Jul 28, 2024 • edited Loading

tomkralidis commented Jul 28, 2024

cholmes commented Jul 29, 2024 via email

justb4 commented Jul 29, 2024

justb4 left a comment

Choose a reason for hiding this comment

barbuz commented Aug 19, 2024

francbartoli commented Jul 28, 2024 •

edited

Loading