add more processing:expression examples for python, docker and generic URI #33

fmigneault · 2024-04-04T18:08:34Z

fixes #31

…c URI (fixes stac-extensions#31)

fmigneault · 2024-04-04T18:09:25Z

@emmanuelmathot @m-mohr FYI

m-mohr

URI and Docker should be provided as links with the corresponding relation type.

How would the Python type be resolved?

fmigneault · 2024-04-05T02:13:44Z

For python, text/x-python seems like a common occurrence to represent the media-type, although it is not really an official type.

I could see some combination like the following to be valid:

{
  "links": [
    {
      "href": "<uri-to-some-python-script>",
      "type": "text/x-python",
      "rel": "processing-expression",
      "processing:expression": {
        "format": "python",
        "expresion": "module:func"
      }
    }
  ]
}

I think it is valid to have either or both the links/field combinations.

One big issue about links though is that it is not useful for multiple assets.
Imagine for example that multiple assets are generated with different expressions. How do we map them properly to their respective links? It is preferable to have the expression available under the Item properties if it applies to all assets, or directly under respective assets for distinct combinations of expressions.

I don't see why there should be a distinction between gdal-calc vs python, docker, cwl or any other approach. In a similar fashion, an example corresponding to rel: processing-execution (OGC or openEO) might be relevant as well, considering that openeo is already listed as one possible processing:expression combination.

m-mohr · 2024-04-05T08:42:33Z

~~Actually, processing: field are only defined for / allowed in Item Properties and Collection Providers per the Readme:~~

~~Item Properties and Collection Provider Fields~~

~~The schema on the other hand, doesn't really reflect that. That's why links made sense to me due to the 1:1 mapping. So that's something to be discussed.~~

Edit: Updated in #32

Sorry for being ambiguos with regards to the "Python type", I meant how would I know to handle the module:func without having a reference to a script, especially as processing fields are not allowed in Links yet and in this case there's also no ambiguity in the schema.

The difference between the property and the link relation is just whether you can embed it into the JSON or not. openEO is JSON, can be embdedded. gdal-calc/rio-calc is a string, can be embdedded. CWL YAML need to be linked to (unless there's a JSON variant?). I guess you could embed code as a string, but that feels less useful than just linking to a folder or script. The format in the Expression Object is pretty much the equivivalent to the type in the links. So they are pretty much equivalent.

Conclusion: It feels like the main issue here is not updating the list of formats, but rather whether the processing:expression is allowed in more places compared to what the Readme tells us...

fmigneault · 2024-04-05T15:02:59Z

I meant how would I know to handle the module:func without having a reference to a script

I was using a similar consideration to gdal-calc that assumes the package is preinstalled in the environment. Ideally, it should be a package that can be resolved the same way as pip install [the-package]. There could be a processing:software entry to add more explicit URI to some package registry/version if that is necessary.

CWL [...] (unless there's a JSON variant?)

Yes. YAML and JSON can be used interchangeably for CWL. Examples use YAML for readability and comments insertion, but JSON are part of its spec. They have (approved IANA) media-types as well: application/cwl, application/cwl+json.

It feels like the main issue here is not updating the list of formats, but rather whether the processing:expression is allowed in more places compared to what the Readme tells us...

I agree. I think supporting processing:expression in other places could be very useful because it allows us to provide the specific context at the right place where the processing occurred rather than leaving it up to interpretation.

emmanuelmathot

I agree with putting more examples as provided as long as it stays in the processing extension at item level.
@fmigneault I think you want to use processing for a larger use case that it was designed for (lineage tracing). I reckon the need for a better handling of processing in general but I am not sure that the processing extension is the right place for that.

fmigneault · 2024-04-29T16:50:53Z

I don't understand why processing should be limited to Item level?
If distinct Assets are provided for respective bands, and that those bands are produced using a different combination of parameters with some processing function, then the lineage of the band data would be best described by the specific processing:expression under that Asset.

emmanuelmathot · 2024-04-29T19:51:45Z

yes agree for the asset but the request was at link level.

m-mohr · 2024-05-08T10:42:49Z

What's the status of this @fmigneault? I didn't quite get the last comments.

My general thinking is still that for provenance purposes you can just link to Python scripts and Docker images using the relation type processing-expression (admittedly, the name is a bit weird).
So maybe this doesn't need a spec changes and only either a bit of description and/or examples?

The processing:expression field was for embeddable strings/objects, but you seem to point to external bits so links seem to be the right choice.

fmigneault · 2024-05-08T14:33:50Z

A link with processing-expression is not clear enough when the processing applies only to a specific asset.

There is also the case of docker that, technically, is not a URL (no protocol). Therefore, it cannot be used in links. The server and schema will sometime "accept" it, thinking it is a relative path (form org/image:tag), but an invalid https://<api-base.url>/org/image:tag will be resolved by the API, which will make it invalid for docker pull (see stac-extensions/ml-model#9).

The specification does not need any change. Users are still freely allowed to do what I am proposing. So, my recommendation would be to include the examples to provide guidance, inspiration, and hopefully align methodologies.

m-mohr · 2024-05-08T16:25:17Z

I see the point about assets, but on the other hand it would also encourage using these examples in the Item properties although they should be in the links. So I'm not convinved, but also don't have a direct alternative for assets (other than adding a links arrays). I have similar issues in the CEOS ARD extension where the connection between assets and links gets somewhat lost. So maybe this is a more generic question for STAC spec: How to assign additional links to assets?

There are three examples in the PR:

URI - this can clearly be expressed in a link
Docker - I still think this should potentially also be a link. This ways it can be resolved and for docker pull you indeed need to parse and extract the protocol. But isn't this good anyway? If it's http it directly tells that it needs to handle it as insecure registry. I wouldn't bind this directly to the docker pull implementation.
Python - How's that meant to be resolved? my_package.my_module:MyClass.my_method Are clients meant to extract my_package from the string, run pip install my_package and then run my_package.my_module:MyClass.my_method somehow without parameters?

I lean towards releasing 1.1 for now and then merge this later once we are more clear on the path forward.

fmigneault · 2024-05-09T15:32:42Z

I find links to be relevant for the full Item for navigation or cross-reference purpose with other STAC documents, but describing specific behaviors within the Item or Asset with them always raises some confusing resolution via some intermediate property, such as matching the Asset key with some other property provided in the link. It is just clearer and easier to interpret the source of the processing operation on an Asset when it is directly embedded in the Asset instead of links, since it comes along with the full definition of the Asset.

For example, if a URI to an external Python script was provided in links, the user retrieving the Asset details with pystac-client or any other utility does not immediately get an indication about that script. They have to make sure to look at the other links source, and resolve (somehow) if a given link applies for their specific Asset. It is unintuitive.

For Docker, the issue is that the HTTP reference does not map to the specific URI reference of the registry/image:tag. A registry could have a certain endpoint for the HTTP access (with some HTML rendering), but the URI only tells docker the necessary information to perform relevant API requests with a compliant registry. How to resolve the endpoints is not trivial, and just adds an unnecessary burden on the developer on how to resolve it correctly and handle edge cases. Using an HTTP link could also cause insecure requests to be sent by web scrapers, while the URI is just a plain string that does not resolve to any specific endpoint directly.

For Python, the proposed format is the same used by many libraries to reference a function, such as pytest does or any WSGI/ASGI pointing at an app. This does not mean that the function is "runnable" by itself, just like a docker is not necessarily runnable by itself without parameters either.

Whether the processing expression is executable (or how to do so) is not a concern for its STAC representation. Note that examples already provided in the README have exactly the same concern. An openeo process does not indicate which/how inputs are mapped or which outputs produced which Asset. A gdal-calc or rio-calc assume that those utilities are installed, and that bands are "somehhow" mapped to their A/B/etc. or b1/b4 assets. The processing:expression remains an indication without any promise it can be executed directly, but should at least use an expression representation that looks familiar by users of the proposed tool, so they know easily what to do with it.

m-mohr · 2024-05-17T11:09:47Z

@fmigneault This is probably of interest for you: radiantearth/stac-spec#1284

fmigneault · 2024-05-17T14:16:26Z

Links per asset is a nice new feature, but I believe the current examples are still relevant and better described using an embedded processing:expression within an asset.

fmigneault added 2 commits April 4, 2024 14:05

add more processing:expression examples for python, docker and generi…

01e2b31

…c URI (fixes stac-extensions#31)

update changes

4c5a515

fmigneault mentioned this pull request Apr 4, 2024

Roadmap for V2 of the ML Model Extension crim-ca/dlm-extension#7

Closed

20 tasks

m-mohr requested changes Apr 4, 2024

View reviewed changes

This was referenced Apr 23, 2024

Roadmap for V2 of the ML Model Extension crim-ca/mlm-extension#4

Closed

Add scheme to Docker URIs stac-extensions/ml-model#9

Open

Proposal for version 2 stac-extensions/storage#24

Merged

emmanuelmathot reviewed Apr 29, 2024

View reviewed changes

m-mohr added this to the 1.1.0 milestone May 8, 2024

m-mohr removed this from the 1.1.0 milestone May 8, 2024

m-mohr mentioned this pull request May 17, 2024

Links per asset radiantearth/stac-spec#1284

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add more processing:expression examples for python, docker and generic URI #33

add more processing:expression examples for python, docker and generic URI #33

fmigneault commented Apr 4, 2024

fmigneault commented Apr 4, 2024

m-mohr left a comment •

edited

Loading

fmigneault commented Apr 5, 2024

m-mohr commented Apr 5, 2024 •

edited

Loading

fmigneault commented Apr 5, 2024

emmanuelmathot left a comment

fmigneault commented Apr 29, 2024

emmanuelmathot commented Apr 29, 2024

m-mohr commented May 8, 2024 •

edited

Loading

fmigneault commented May 8, 2024

m-mohr commented May 8, 2024 •

edited

Loading

fmigneault commented May 9, 2024

m-mohr commented May 17, 2024

fmigneault commented May 17, 2024

add more processing:expression examples for python, docker and generic URI #33

Are you sure you want to change the base?

add more processing:expression examples for python, docker and generic URI #33

Conversation

fmigneault commented Apr 4, 2024

fmigneault commented Apr 4, 2024

m-mohr left a comment • edited Loading

Choose a reason for hiding this comment

fmigneault commented Apr 5, 2024

m-mohr commented Apr 5, 2024 • edited Loading

fmigneault commented Apr 5, 2024

emmanuelmathot left a comment

Choose a reason for hiding this comment

fmigneault commented Apr 29, 2024

emmanuelmathot commented Apr 29, 2024

m-mohr commented May 8, 2024 • edited Loading

fmigneault commented May 8, 2024

m-mohr commented May 8, 2024 • edited Loading

fmigneault commented May 9, 2024

m-mohr commented May 17, 2024

fmigneault commented May 17, 2024

m-mohr left a comment •

edited

Loading

m-mohr commented Apr 5, 2024 •

edited

Loading

m-mohr commented May 8, 2024 •

edited

Loading

m-mohr commented May 8, 2024 •

edited

Loading