Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement dynamic status #667

Draft
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

gschwind
Copy link
Collaborator

@gschwind gschwind commented Nov 9, 2022

Overview

This patch suite implement the #354 with several refactor. I did this patch to simplify and split the previous PR #662 to make review easier. I expect to submit other relevant changes but I rework them to avoid dramatic change into the user API. For instance I did extended the service=WPS API by adding request=status instead of adding a new separate endpoint as I did in PR #662

Related Issue / Discussion

#662

Additional Information

This contribution is supported by MINES ParisTech.

Contribution Agreement

(as per https://github.com/geopython/pywps/blob/master/CONTRIBUTING.rst#contributions-and-licensing)

  • I'd like to contribute [feature X|bugfix Y|docs|something else] to PyWPS. I confirm that my contributions to PyWPS will be compatible with the PyWPS license guidelines at the time of contribution.
  • I have already previously agreed to the PyWPS Contributions and Licensing Guidelines

@coveralls
Copy link

coveralls commented Nov 9, 2022

Coverage Status

coverage: 83.516% (-0.5%) from 84.061%
when pulling 667aea8 on gschwind:pr-implement-db-status-and-wpsrequest-rework
into 10dd07a on geopython:main.

@gschwind gschwind force-pushed the pr-implement-db-status-and-wpsrequest-rework branch 3 times, most recently from 4b15707 to c76bc82 Compare September 15, 2023 15:08
@cehbrecht cehbrecht marked this pull request as draft October 30, 2023 14:35
@gschwind gschwind force-pushed the pr-implement-db-status-and-wpsrequest-rework branch 2 times, most recently from 47df40e to 95edc07 Compare November 8, 2023 08:31
Copy link
Collaborator

@fmigneault fmigneault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this gets integrated, I would like to have the opportunity to test it with https://github.com/crim-ca/weaver to evaluate if any refactor causes major breaking issues.

That could break multiple birdhouse stacks that rely on pywps/weaver across multiple projects.

FYI @huard @cehbrecht

Comment on lines 78 to 79
@operations("status")
def get_status(self, wps_request, uuid):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be GetStatus?
This is the name of the WPS operation: http://opengeospatial.github.io/e-learning/wps/text/operations.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @fmigneault ,

Thanks for feedbacks.

I was not aware of GetStatus, I am focused on WPS 1.0.0 spec. I used request=status to avoid complex change in code. Initially I used a separate url than the [server] url but it require more setup to ensure wsgi to expose the new url path, and need some change on how we dispatch urls within the code, thus I dropped the idea and switched to the most convenient implementation to start the discussion. I also considered to use service=something. I will be happy to get some suggestion about this url, the only requirement is that the request need the status uuid to retrieve the status within the database.

As far as I understand GetStatus, it's seems it not the same as the WPS status file required in WPS 1.0.0 thus they cannot be merged together. Nevertheless, the GetStatus request can be implemented on top of this patch suite, all data required for generating the GetStatus outputs is stored in the database in this patch suite. We can also implement the GetResult.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay.

service=... should be avoided as it is (normally) intended to redirect to other services such as WMS, WCS, etc.
Currently, PyWPS does not have other services running under the same webapp, but a server using a PyWPS behind the scene could filter this parameter to perform some redirections to other endpoints.

The idea of GetStatus is to retrieve the current status of the job, regardless of where the status is stored.
However, the full XML details of the status can always be retrieved from statusLocation provided in the execute response. This location does not necessarily have to be a file, but this is what PyWPS currently uses. That location does not have to be on the same server either. It could point at a remote instance or an S3 bucket for example.

If GetStatus is needed, I think the right approach would be to employ WPS 2.0. It would be preferable to support (relatively) newer revisions than patch older ones with unofficial requests.

Copy link
Collaborator Author

@gschwind gschwind Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, do you have some idea of which url I should use ?

I will remove the decorator @operations that can be confusion within the code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be a configurable setting to allow instances to adapt as needed.
Similar to how the [server] outputurl can be configured, it could be a [server] statusurl or similar.
This way, the server can decide where to host status files (or not if the setting is left empty by default).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need of file for dynamic status, thus the url does not need a mapping to a file on the file system. I will keep an option to keep the file on file system with same behaviour as before, but the url that I talk about will be provided by wsgi application directly, not through apache, in the same manner as DescribeProcess request for instance, or as we will have to implement for WPS 2.0.0, i.e. GetStatus.

Currently this url look like:

http://hostname/path/to/wps?service=WPS&request=status&uuid=SOMEUUID

where http://hosname/pat/to/wps is the setting provided by [server] url

If we want to use another unrelated path than the one above, it require that user configure a second wsgi application.

One option is to use a sub-path on top of the one above as instance it can be:

http://hostname/path/to/wps/status?uuid=SOMEUUID

or

http://hostname/path/to/wps/status/SOMEUUID

Another option is to not bind the WPS with the url option [server] url directly but having wps service binded to sub-path that way in the configuration above the wps may by bound to a path like:

http://hostname/path/to/wps/wps?service=WPS&request=GetCapabilities

having status urls and wps request url at the same level.
In this case we may have following configurations:

[server]
rooturl=http://hostname/path/to/wsgi
status_sub_path=custom_status_path
wps_sub_path=custom_wps_path

that will produce the following urls for WPS request:

http://hostname/path/to/wsgi/custom_wps_path?service=WPS&request=GetCapabilities

and

http://hostname/path/to/wsgi/custom_status_path?uuid=someuuid

The last options is my favorite compromise. default value for wps_sub_path can be 'wps' and the default status_sub_path can be 'status'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be kept this way, and not move it to the pywps.__init__.py.
Init is not where the status should be defined.
All the code should import from pywps.response.status import WPS_STATUS.
There is no reason to break this interface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason why I moved the WPS_STATUS was to solve a recursive import issue. I think it was because implicitly python load __init__.py from all directory within an import path, even if it's not required. For instance if you import a.b.c.d, python load a.__init__.py, a.b.__init__.py and a.b.c.__init__.py before loading a.b.c.d.py and if you need import a.b.c.d within a.b.__init__.py you are in trouble.

I will check it.

Best regards

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If pywps.response.status only defines WPS_STATUS without importing other pywps locations, it should be possible to do from pywps.response.status import WPS_STATUS from anywhere. There's actually more chances that a circular import occurs if from pywps import WPS_STATUS is called while pywps is still being imported by some other module.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmigneault

I get this error.txt if I do not move WPS_STATUS.

I will check closer if I can find a better change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is because of the relative imports added to pywps/response/__init__.py.
There is no need to import them there as shortcuts.
Anywhere that needs to import these classes can simply use the complete module path.
Avoid using .. and . imports. Always do explicit from pywps....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmigneault Ok thanks I will try it :)


from werkzeug.wrappers import Request, Response
from pywps.app.WPSExecuteRequest import WPSExecuteRequest
from ..response import CapabilitiesResponse, DescribeResponse, ExecuteRawResponse, StatusResponse
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use full pywps import path

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I will update the code to follow this policy

Comment on lines 165 to 180
def _setup_status_storage(self):
self._status_store = StorageBuilder.buildStorage()

@property
def status_store(self):
if self._status_store is None:
self._setup_status_storage()
return self._status_store

@property
def status_location(self):
return self.status_store.location(self.status_filename)

@property
def status_filename(self):
return str(self.uuid) + '.xml'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we want to keep this file available locally (on top of allowing request=GetStatus)?
Can't the handlers be left available?

The OGC API - Processes package I work on that uses PyWPS for corresponding WPS operations expects the XML files to be made available to users to help them debug why their process failed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @fmigneault ,

The file can be retrieved with a quite easy GET request from uuid and the json data for generating the status is also stored within the database, thus I think this would be duplicated. I can implement it, but I would prefer that it is not implemented until the following comments get addresses.

Another thing is that currently the data storage within PyWPS seems to be a mess. The pywps.inout.storage does not seems to be used consistently across the current code. As far as remember the FileStorage must be used to ensure that the status file get a valid output url i.e. [server] outputurl configuration. And the FileStorage setup must be made in way that the path of stored file it the one exposed by [server] outputurl, thus I tryed to avoid that, that why I put the whole data within the database, avoiding using FileStorage.

The general idea here is that I would like to remove completly [server] outputurl and I would like every request goes to one entry point. that entry point will map request using the proper backend Storage facility. That way the storage facility and the way of the data are exposed is independent, and users are free to store data in the way they prefer by implementing they own pywps.inout.Storage or using one provided by pywps. For instance, this entry point may be a root path with several sub-path such as 'http://host/path/wps', 'http://host/path/status' or 'http://host/path/files' as I designed in the first place. We may also use GET parameters instead as I did in the current patch suite. I droped the path idea for GET method because the later is less invasive within the code and make the the setup of the server simpler. Note that I have another patch suite where I implement this behavior to store and access results data.

Best regards

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that users could have the option to control where the status is stored and/or reported. However, to avoid breaking existing instances, it would be preferable if the default uses the local file as it was previously done. An option would have to be used to avoid producing the file. Any server that relies on this local XML file to exist locally will break.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will implement that option

@gschwind
Copy link
Collaborator Author

@fmigneault

You are welcome to test it and to provide feedbacks. I'm using it with my own WPS server :)

Best regards.

@gschwind gschwind force-pushed the pr-implement-db-status-and-wpsrequest-rework branch 2 times, most recently from de7cc67 to ae7947a Compare February 7, 2024 15:37
@gschwind
Copy link
Collaborator Author

gschwind commented Feb 7, 2024

Hello,

I did updated the patch suite to address comment above, to summarize changes:

  • I added an option to keep the legacy on-disk status file, disable by default
  • I changed the url of status to use [server] url + /status as compromise where users does not have to change configuration
  • I removed the operations decorator from the suite that provide no practical benefit
  • I removed relative import
  • I removed the move of WPS_STATUS and used an alternative import, i.e. I use late runtime import
  • Some tests are updated to simulate more accurately the wsgi environment.

I do not like to much the setup of URL but as said, it's compromise, maybe in futur we may implement a cleaner aproach.

Best regards

@gschwind gschwind force-pushed the pr-implement-db-status-and-wpsrequest-rework branch from ae7947a to 039b44b Compare February 7, 2024 15:52
@gschwind gschwind requested a review from fmigneault February 7, 2024 15:54
The WPS request contain the pre-parsed WPS data in json, i.e. an
intermediate form of the request. We can store this form for later
processing. In other hand the WPSExecuteRequest is the final WPS request
form to run the process, it is used into the handler function as
wps_request parameter. This simplify the data stored for later use. This
also avoid that serialization of pywps.inout.inputs.*Input is used in
very diferent context.
WPSExecuteRequest.inputs['id'] is collections deque, this patch convert
them to regular python list.
The ExecuteResponse is responsible to render the output XML and this is
not mandatory for running process, in particular background process.
This patch split the part that are required to run a given execute
request from the part that is required to render the execute output.
@gschwind gschwind force-pushed the pr-implement-db-status-and-wpsrequest-rework branch from 039b44b to 667aea8 Compare February 9, 2024 13:25
@gschwind
Copy link
Collaborator Author

gschwind commented Feb 9, 2024

Fix more relative imports

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants