Add endpoint / landingpage per dataset #116

Open
AntonHardock opened this issue Jul 1, 2022 · 8 comments
Labels
design Design suggestion

Comments

@AntonHardock

AntonHardock commented Jul 1, 2022

Dear CrunchyData Team,

Are there any plans to change the API structure such that each dataset gets its own endpoint and landingpage?

A similar issue exists (#32). However, I'm not sure whether my request fully aligns with it.

Background
The OGC API Features Spec allows or even requires the grouping of collections by datasets. References:

Proposal
Currently, pg_featureserv exposes all spatial tables across all accessed schemas at the collections level:
/collections/schemaA.table1
/collections/schemaB.table1

...

With "datasets" grouping, each dataset would have its own JSON+HTML representation (potentially filled with some metadata). Likewise, each dataset's collections page would list the collections available in that dataset, with the collection ids nested under the dataset:
/datasetA/collections/collection1
/datasetB/collections/collection1

...

One solution is to equate each schema with one dataset.
pg_featureserv could then adjust the API structure to:
/schemaA/collections/table1
/schemaB/collections/table1

...
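A minimal sketch of the schema-as-dataset mapping, assuming collection ids have the `schema.table` form shown in the flat listing above and that schema names contain no dots. The function name `group_by_schema` is hypothetical and not part of pg_featureserv.

```python
from collections import defaultdict


def group_by_schema(collection_ids):
    """Group flat 'schema.table' collection ids into per-schema dataset paths."""
    datasets = defaultdict(list)
    for cid in collection_ids:
        # Split on the first dot only: "schemaA.table1" -> ("schemaA", "table1")
        schema, sep, table = cid.partition(".")
        if not sep or not table:
            raise ValueError(f"expected 'schema.table', got {cid!r}")
        datasets[schema].append(f"/{schema}/collections/{table}")
    return dict(datasets)


print(group_by_schema(["schemaA.table1", "schemaB.table1", "schemaA.table2"]))
# {'schemaA': ['/schemaA/collections/table1', '/schemaA/collections/table2'],
#  'schemaB': ['/schemaB/collections/table1']}
```

The keys of the result could populate a dataset listing at the entrypoint, and each value is that dataset's collection list.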

Motivation and Use Case
I work at the Agency for Geoinformation and Surveying in Hamburg, Germany.
Currently, we are evaluating software to expand our OGC API repertoire.
Our Urban Data Platform offers OAF Parts 1 and 2. It is implemented in a monolithic OGC suite (alongside WFS, WMS and so on).

As we gradually move the platform to a cloud environment, pg_featureserv becomes a very exciting alternative. However, we need to link the OAF endpoints of individual datasets to their corresponding entries in a metadata catalogue. The catalogue serves as the "dataset homepage", providing links to all APIs through which the dataset is accessible. Linking to multiple collections per dataset instead would be impractical: since we offer 300+ datasets with 1000+ collections, navigating that would overwhelm end users.

I'm looking forward to hearing your thoughts on this.
Best Regards,

Anton

AntonHardock changed the title from "Add Endpoint / Landingpage per dataset" to "Add endpoint / landingpage per dataset" on Jul 1, 2022
dr-jts added the design (Design suggestion) label on Jul 11, 2022
@dr-jts
Collaborator

dr-jts commented Jul 11, 2022

This is an interesting direction to think about. As you noticed in #32, we have had some tentative thoughts about introducing an organization level between "service" and "collection". It sounds like you have exactly this need.

Can you provide more detail of your data model/schema? Is it the case that your "datasets" with multiple collections map to database schemas and the tables within them?

The Features Core standard states:

Additional capabilities that address more advanced needs will be specified in additional parts. Examples include support for creating and modifying ... multiple datasets and collection hierarchies.

So it looks like the standard has not yet been extended to handle "multiple datasets", correct? It would be much preferable to follow the OGC lead on this, since it is a very small design space, and there is a large risk of choosing the wrong direction, winding up out of alignment with the standard, and then having to change the design with an impact on current users.

Would an alternative (in the short or long term) be to implement a thin front-end that maps your desired URL structure onto requests supported by pg_featureserv? It could also deal with providing dataset-level metadata.
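As a rough illustration of such a thin front-end, the path rewrite could look like the sketch below. It assumes each dataset segment is a schema name, that the front-end proxies the rewritten path to pg_featureserv's `/collections/{schema.table}` routes, and that query strings are passed through unchanged by the proxy (not shown). The route pattern and `to_featureserv_path` are hypothetical.

```python
import re

# Hypothetical front-end route: /{dataset}/collections/{collection}[/...]
_ROUTE = re.compile(
    r"^/(?P<dataset>[^/]+)/collections/(?P<collection>[^/]+)(?P<rest>/.*)?$"
)


def to_featureserv_path(path):
    """Rewrite a dataset-scoped path to the flat /collections/{schema.table} form.

    Returns None for paths the front-end must answer itself,
    such as dataset landing pages or per-dataset collection lists.
    """
    m = _ROUTE.match(path)
    if m is None:
        return None
    rest = m.group("rest") or ""  # e.g. "/items", kept as-is
    return f"/collections/{m.group('dataset')}.{m.group('collection')}{rest}"


print(to_featureserv_path("/datasetA/collections/collection1/items"))
# /collections/datasetA.collection1/items
```

Requests that return None are exactly where the front-end would add value of its own, for example by serving dataset-level metadata.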

@AntonHardock
Author

AntonHardock commented Jul 14, 2022

Our data model follows this pattern: for each dataset we receive from our customers, a new Postgres schema is created. Often, the dataset is split into multiple tables/views. Each dataset is then published as an individual OAF endpoint, with its tables/views mapped to the collections of that endpoint. The same structure is reused for other API types (such as WFS and WMS).

In our implementation, we follow the example of ldproxy.
The software is an OGC reference implementation for Features Core.
Here's an example:
https://demo.ldproxy.net --> lists available datasets
https://demo.ldproxy.net/daraa --> example dataset from the OGC Testbed-15
https://demo.ldproxy.net/daraa/collections --> list of available collections of dataset "daraa"
https://demo.ldproxy.net/daraa/collections/IndustrySrf --> example collection: industry surfaces in Daraa

Nonetheless, as you point out correctly: Features Core explicitly does not cover how to handle multiple datasets.
Standardization begins at the collections level of one dataset. At the same time, an extension is not discouraged:

Other parts of this standard may define API extensions that support multiple datasets. The statement that the features are from "a dataset" is not meant to preclude such extensions. It just reflects that this document does not specify how the API publishes features or other spatial data from multiple datasets.

The current outline for potential OAF Parts to follow does not seem to include such an extension.
Looking at ldproxy, one could argue that there is no need for that.
Listing available datasets at the entrypoint and providing a landing page for each seems to be a straightforward solution.
I think that this is in line with your proposal of a "thin front-end".

Would it be possible to add such a front end through an optional "multi-dataset mode"?
If users want to publish multiple datasets through one instance of pg_featureserv, the software could

  • treat each schema as separate dataset, as proposed initially
  • use an extra config file mapping the desired URL structure to collections requests
  • pull that extra mapping from another postgres table, along with metadata (like dataset descriptions)

@dr-jts
Collaborator

dr-jts commented Jul 18, 2022

Would it be possible to add such a front end through an optional "multi-dataset mode"?

Yes, this is certainly possible. It makes sense to make this an option controlled by a config parameter.
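A sketch of how such a config parameter might switch between the two URL layouts. `multi_dataset` is a hypothetical flag name used only for illustration; it is not an existing pg_featureserv setting.

```python
def collection_path(schema, table, multi_dataset=False):
    """Return the public path for a table under either URL layout.

    multi_dataset is a hypothetical config flag, used here for illustration only.
    """
    if multi_dataset:
        # Multi-dataset mode: the schema becomes the dataset path segment
        return f"/{schema}/collections/{table}"
    # Current layout: schema-qualified collection id directly under /collections
    return f"/collections/{schema}.{table}"
```

Defaulting the flag to off keeps existing URLs stable, which addresses the earlier concern about design changes affecting current users.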

The request structure you suggest (schema/collections/collname) makes good sense. I've updated #32 to reflect this.

@dr-jts
Collaborator

dr-jts commented Jul 18, 2022

the software could use an extra config file mapping the desired URL structure to collections requests

I'm not clear on what this means. Why is an extra config file needed? Isn't the URL structure mentioned above sufficient?

  • pull that extra mapping from another postgres table, along with metadata (like dataset descriptions)

I can see it might be useful to have more metadata in the database for use in the UI. But up to now we have avoided adding metadata tables in the database. Could the metadata UI be provided by an external service, with pg_featureserv just providing the UI as it exists currently (for each dataset/schema)?

@AntonHardock
Author

AntonHardock commented Jul 21, 2022

The request structure you suggest (schema/collections/collname) makes good sense. I've updated #32 to reflect this.

Thank you very much! As to your questions: I realize the following bit was misleading:

If users want to publish multiple datasets through one instance of pg_featureserv, the software could

  • treat each schema as separate dataset, as proposed initially
  • use an extra config file mapping the desired URL structure to collections requests
  • pull that extra mapping from another postgres table, along with metadata (like dataset descriptions)

Each bullet point is supposed to represent an option towards the same goal. The first suggestion (schema/collections/collname) is perfectly sufficient. In fact, I'd consider it the best option as it suits the (almost) zero-configuration nature of pg_featureserv. However, given the use case, extending that basic idea might still be necessary.

First, let me outline that use case further:

  • one large Postgres-DB (or Cluster)
  • hundreds, potentially thousands of publicly available datasets (1 dataset = 1 schema)
  • one or more identical instances of pg_featureserv, all connected to that database
  • searching for any kind of geodata from Hamburg, users would typically land on a metadata catalogue, similar to https://data.gov/
  • when browsing to a particular dataset, they should find all linked resources, including OAF
  • clicking on the OAF endpoint (e.g. oaf_baseurl/all_schemas/schema1/), users should land on some OAF representation of that dataset (ideally html and json)

The question then becomes: What should be presented on that OAF endpoint?

A: Just a plain list of available collection links
That would certainly do. After all, the metadata-catalogue already serves as a "dataset homepage". It thereby fulfills the role of an external metadata UI.

B: list collection links + optional links and metadata
Following linked data principles, linking back to the metadata catalogue seems ideal.
Rendering that link in pg_featureserv, alongside the available collections, seems straightforward. Instead of adding config files or cluttering the global config, one could map schemas to links in an extra config table, and render extra metadata from it at the same time. Though redundant, every bit of extra context helps users navigate our complex infrastructure; at least, that has been our experience so far. The fields of that config table might be: schema_name | metadata_url | dataset_fulltitle | dataset_description
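To make option B concrete, here is a minimal sketch that assembles a dataset landing page from one row of the proposed config table. It assumes the schema/collections/collname layout agreed above, a row passed as a plain dict, and that the catalogue entry is an HTML page; the function name and the `collections` key are hypothetical, and a real OGC API landing page would also need links such as the conformance declaration, which are omitted here. The back-link uses the `describedby` link relation.

```python
def dataset_landing_page(row, tables, base_url):
    """Assemble a JSON-style landing page for one dataset (= one schema).

    row holds the proposed config-table fields:
    schema_name, metadata_url, dataset_fulltitle, dataset_description.
    """
    schema = row["schema_name"]
    root = f"{base_url}/{schema}"
    links = [
        {"rel": "self", "type": "application/json", "href": root},
        {"rel": "data", "type": "application/json", "href": f"{root}/collections"},
    ]
    if row.get("metadata_url"):
        # Link back to the dataset's entry in the metadata catalogue
        # (assumed to be an HTML page)
        links.append(
            {"rel": "describedby", "type": "text/html", "href": row["metadata_url"]}
        )
    return {
        # Fall back to the schema name when no full title is configured
        "title": row.get("dataset_fulltitle") or schema,
        "description": row.get("dataset_description") or "",
        "links": links,
        # Inline collection list as option B proposes; not a structure the standard defines
        "collections": [
            {"id": t, "href": f"{root}/collections/{t}"} for t in tables
        ],
    }
```

Because every field except `schema_name` is optional, an unconfigured schema still gets a usable page, which keeps the zero-configuration default intact.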


While I prefer B, I might just be stuck in old thought patterns. You suggested an external service to tie all that information together. Could you elaborate on this, outlining the implementation and potential advantages?

@AntonHardock
Author

AntonHardock commented Jul 21, 2022

Another line of thought: suppose the 1 schema : 1 dataset pattern can't be enforced, for whatever reason. The most flexible approach then would be to let administrators define what constitutes a single dataset. Doing so through config tables might be a reasonable solution. Using the optional "multi-dataset mode", admins could start filling the following table:
dataset_shorttitle | metadata_url | dataset_fulltitle | dataset_description

An overview of available datasets is then provided by: oaf_baseurl/datasets/
For increased readability, this page could list full dataset titles (if present, else shorttitles).
One dataset might be named schools (shorttitle). The corresponding OAF landing page becomes: oaf_baseurl/datasets/schools/

Collections belonging to a dataset would be rendered from a table like this:
dataset_shorttitle | collection_shorttitle | collection_fulltitle | data_source (schema.table)
Here's the example "schools" with collections stored across multiple schemas:
schools | middle_schools | Middle Schools | middleschools.tableA
schools | high_schools | High Schools | highschools.tableA
E.g. the URL leading to collection "High Schools" becomes:
oaf_baseurl/datasets/schools/high_schools

Added benefit: Schema names remain hidden.
(No concern in our setup, but it might be important in other contexts)
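The lookup described above can be sketched as follows, assuming the collection table's rows are already loaded into memory (in practice they would be read from Postgres) and the URL layout is oaf_baseurl/datasets/{dataset}/{collection}. `build_index` and `resolve` are hypothetical names.

```python
# Rows of the collection table sketched above:
# (dataset_shorttitle, collection_shorttitle, collection_fulltitle, data_source)
COLLECTION_ROWS = [
    ("schools", "middle_schools", "Middle Schools", "middleschools.tableA"),
    ("schools", "high_schools", "High Schools", "highschools.tableA"),
]


def build_index(rows):
    """Index rows by (dataset, collection) short titles -> backing 'schema.table'."""
    return {(dataset, coll): source for dataset, coll, _title, source in rows}


def resolve(path, index):
    """Resolve /datasets/{dataset}/{collection} to its data source, or None if unknown."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        return None
    return index.get((parts[1], parts[2]))


index = build_index(COLLECTION_ROWS)
print(resolve("/datasets/schools/high_schools", index))  # highschools.tableA
```

Since every request is resolved through the table, the backing schema name never has to appear in a public URL, and one dataset can freely draw collections from several schemas.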

@dr-jts
Collaborator

dr-jts commented Jul 22, 2022

Each bullet point is supposed to represent an option towards the same goal.

Got it, that makes sense now.

The first suggestion (schema/collections/collname) is perfectly sufficient. In fact, I'd consider it the best option as it suits the (almost) zero-configuration nature of pg_featureserv.

Excellent, and agreed that zero-configuration is what we are aiming for.

You suggested an external service to tie all that information together. Could you elaborate on this, outlining the implementation and potential advantages?

It's possible to implement another service which provides the front-end to pg_featureserv queries. If there is a separate metadata repository it could be populated with links to the queries. Or, since pg_featureserv is simply serving the database catalog, the external service could access the same catalog.

Another option which is supported is to customize the pg_featureserv HTML templates. This would allow adding a link back to the metadata service (as you propose). With some web scripting it should be possible to inject any desired additional information and UI into the web pages served from the templates.

@AntonHardock
Author

AntonHardock commented Aug 29, 2022

In the past weeks, my colleagues and I further discussed our deployment strategy. We also had a chance to speak to Clemens Portele. He confirmed that, as of now, the OGC API family centers on individual datasets. This also means that all resources related to a given dataset (features, tiles, styles, and even metadata records) shall be available from one and the same dataset endpoint. When those resources are implemented as isolated microservices, a decoupled front-end service appears to be the most adequate solution.

For "traditional" deployment, though, I believe an optional "multi-dataset mode" would be of great benefit. The same goes for metadata links, as recommended by the spec (Rec 9, "describedBy"). That recommendation refers to the collections level, but it can easily be extended to the dataset level. At least for the federal geodata providers in Germany, both requirements are very important. Having them out of the box could greatly facilitate the adoption of pg_featureserv (and OAF in general).

Anyway, thank you so much for the thorough discussion, that was incredibly helpful! One more thing: Are there any current plans for a new pg_featureserv version? Maybe an outline of open issues / requests that are considered for the next release?

Projects
None yet
Development

No branches or pull requests

2 participants