-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor STAC extensions and validation outside of populator logic #38
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the clear structure of the extensions.
I suspect there are are few bits of the new stuff that are not exercised by the test suite. In particular, I'll be curious to see the summaries in action. That could be another PR, just flagging it so we don't forget it.
The cmip6 extension effort is impressive and does have value in its own way. The new features regarding summaries can be retained. However, this PR makes rather substantial changes to the methodology for creating STAC items and in that regard it moves far from the original framework that Mathieu and I had come up with where our goal was to focus on making the system friendly and approachable to end-users and specifically avoid the traps that other similar projects have found them into where usability takes a back-seat to the desires of developers to do fancy stuff. I continue to believe in the need to approach stac-populator from a usability perspective and in the need to keep things simple. The complexity of the CMIP6 extension violates those beliefs; sure it makes I have long maintained my position that I would like to use this implementation, which is for UofT CMIP6 data, as an example for how other users can create representations for any type of data without needing to create extensions or using fancy pystac facilities, all of which increase the level of understanding one has to have before creating STAC representations and could constitute a barrier. Francis knows this very well, so submission of this PR without prior discussions on changes of this sort seems like a heavy-handed effort to push through changes that he is personally in favour. I also think it was implicit, following Mathieu's departure, that UofT is taking the lead on this project going forward (especially with CRIM winding down), so I find Francis's efforts to steer the project in a different direction without sufficient consultation and without deference to the plans and visions of those who started the project, to be rather bold to put it mildly. I don't think this is the way to do things. |
Indeed. @dchandan
The PR does not change at all how the populator procedure behaves. It only refactors the STAC CMIP6 extension interface to align it with all other existing STAC extensions. This will allow to eventually port it to the official https://github.com/stac-utils/pystac repo after further validation. This will greatly simplify its use by external users and makes it more approachable. I don't see how this could be made easier for users... You can now do: for extension in [ CMIP6Extension, THREDDSExtension, DatacubeExtension, <etc any extension> ]:
extension.apply(item) Instead of going through all the loops for
I think you had a narrowed view based on your use case alone.
There is no need for them to create anything if the extension implementations are actually made available in official
This is the point of this PR... to have a discussion around it.
You need to understand that DACCS project is ending on our side because we exhausted our budget. That does not mean we are disappearing from the map regarding geospatial projects. We have many other projects actively using STAC and related technologies. I am already forwarding the use of the populator to ESA and other companies. You have to think about the wider scope than DACCS / Marble only. |
I am not opposed to this CMIP6 extension for what it is. Because CMIP6 is widely used, incorporating this extension in
You're right, it doesn't strictly change how the populator behaves. But it does affect the vision of providing a tool that is easy to use and which provides a low barrier for users to adapt and extend.
You are completely missing the point here. I am not talking about someone who needs to create another CMIP6 implementation, I am talking about laying out an example or examples for future node operators to use when they want to put some other data on their node and catalog it in their STAC catalogs. I wanted to lay out a method for them to be able to create STAC objects for their data without having to explicitly create extensions. An extension only becomes useful when it is used for very commonly used data like CMIP6 (so, like I said above, in that regard your extension makes perfect sense), but there are thousands upon thousands of datasets out there, big and small, frequently used and infrequently used, and I am reluctant to suggest to the community, or set an example, that for any of those datasets "go ahead and write an extension; that is the only way to create a STAC representation". You have gone through all this effort to create CMIP6 implementation, but its usefulness is limited to those who will need to put CMIP6 data on their own machines. How many other Marble nodes will do that? Why would they even do that when UofT node already has the data? So, in this regard your effort does not help the Marble community, I can only see it being of use to non-Marble users. The more generic first-principles based approach that I was using would have been useful to other Marble users. |
There were no "loops" for converting from item = STACItem(.....) # a pydantic object. One line call to create a full pydantic object
# Convert pydantic STAC item to a PySTAC Item
item = pystac.Item(**json.loads(item.model_dump_json(by_alias=True))) Done. Now You have taken all the code from |
You should have opened an issue saying you are thinking of making this big shift, and inquiring if there enough support to do that. |
I agree that some refactoring was needed, some logic ended up too coupled. I had a plan to decouple that and this de-coupling could be easily done without the complexity you have introduced with the two new extensions. I also should say here that the
I don't know yet, because the PR doesn't work right now, so I'll have to wait until the issues are fixed to assess this statement. |
This is exactly the issue that has plagued |
I don't see the plaguing issue that you observe, and am very disappointed you feel this way about a many-year collaborative effort. I'm not sure how you plan on promoting Marble to new users and node hosts, or get future financing once DACCS budget is exhausted on your side, if the server cannot even be interoperable with others that are based on the same
Like previously mentioned, I am proposing an alignment with the The "preferred design" seems to be only your own. The community that worked on |
If extensions or core STAC properties are not used, then there is no point in publishing that data using STAC.
Wrong. With this, you lose all references to the STAC extensions you supposedly applied just before, and it gives you the false impression that your item data is valid, while it isn't. Point in hand, the results from the test from the current implementation did not generate valid STAC definitions, which I had to patch the missing Using All capabilities, methods, and data-handling from By not reusing You also seem to forget that the purpose of
It is not the first time that using
I disagree.
All this has been disregarded each time, and the simple hooks that were asked to achieve my goal are still not added to this date, although I provided explicit ways on how to implement them. My understanding is that unless I add them myself, this would never be done.
Not sure what you mean? All tests are passing since this morning. |
There seems to a misunderstanding on your side that I am proposing to move away from |
Your second comment above contains several errors and frequently falls into familiar misunderstandings. I won't go into them all point-by-point as it is simply not worth it, so I will focus on just a few issues:
That is a pretty thin argument and entirely flawed. Furthermore, it is irrelevant, because I am using core STAC properties and I am using extensions, more on this below.
I think this is another misunderstanding that keeps cropping up. I am not opposed to using STAC extensions, by which I mean the extensions defined in JSON Schemas. I know they are important, in fact in item.stac_extensions.append(
"https://raw.githubusercontent.com/TomAugspurger/cmip6/main/json-schema/schema.json"
) We are also using Datacube extension. I am also not opposed to the
Wrong. No references to STAC extensions are lost in that step because no extensions were ever defined prior to that step. All extensions are added afterwards, in
I have actually tested the user-side interaction with
|
And yet, those were both the wrong URI, and not actually applied to the output result unless going through all the steps of
You validated the fields by providing the URI explicitly. An external tool could not automatically figure out the resolution between your Note that I am not against using The main difference of the implementation provided by this PR is that
This is exactly what I was referring to when mentioning the back-and-forth between The process of converting the NCML attributes to STAC CMIP6 is one step (
You think the following is complicated? class CMIP6Extension(
PropertiesExtension,
ExtensionManagementMixin[pystac.Item],
):
@property
def name(self) -> Literal["cmip6"]:
return "cmip6"
def apply(
self,
properties: Union[CMIP6Properties, dict[str, Any]],
) -> None:
if isinstance(properties, dict):
properties = CMIP6Properties(**properties)
data_json = json.loads(properties.model_dump_json(by_alias=True))
for prop, val in data_json.items():
self._set_property(prop, val)
@classmethod
def get_schema_uri(cls) -> str:
return SCHEMA_URI
@classmethod
def ext(cls, obj: T, add_if_missing: bool = False) -> "CMIP6Extension":
if isinstance(obj, pystac.Item):
cls.ensure_has_extension(obj, add_if_missing)
return ItemCMIP6Extension(obj)
raise NotImplementedError()
class ItemCMIP6Extension(pystac.Item):
def __init__(self, item: pystac.Item):
self.item = item
self.properties = item.properties That is all that is needed to provide native support of Note again that the implementation I provide by this PR is bigger because I add additional support for |
I'll get back to this thread tomorrow. Busy with other things atm. |
@dchandan |
I think the way to proceed is to keep the CMIP6 pystac extension you have written for the CMIP6 data. The implementations that I write for other data products, I will continue to use a non-pystac extension approach (except for those where it may make sense to create an extension), and the two approaches will simply exist side-by-side. To do that I'll need to put back some of the code that was removed in this PR, but I will do that when I write those implementations, so nothing needs to be done about that here. |
@dchandan |
…on (main github branch)
@fmigneault |
Changes
pydantic
.Use all original
pystac
objects instead to let them deal with their own validation strategy.They can instead combine as many STAC extension objects as they want/need, and dispatch the resolution/validation to respective extensions.
pystac
approach.CMIP6Helper
andTHREDDSHelper
(similar toDatacubeHelper
) that are used to perform NetCDF/NCML conversion as necessary.stac_extensions
for CMIP6.Rebased onto #33
Could use #35 as well to fix test that patches
access_urls
(copied overServiceType
in the meantime).Closes #32