Unnecessary validation of unmodified resources in same dataset #49

Open
ThrawnCA opened this issue Mar 4, 2020 · 3 comments

ThrawnCA (Contributor) commented Mar 4, 2020

Overview

When a resource is modified, all resources in the dataset are validated, even though most of them are unmodified. On datasets with many resources, this can result in a substantial performance problem, especially if the resources are large.

It appears that the after_update function assumes that the presence of "resources" in the data dictionary means the whole package is being updated at once. This is not necessarily the case: if a single resource is updated via, e.g., the resource_update API, this code path is still triggered.
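To make the heuristic concrete, here is a minimal, hypothetical simplification of it. It is not the plugin's actual code. The function name `jobs_for_after_update` and the example IDs are invented for illustration. It shows how keying off the "resources" key makes a single-resource update fan out to every resource in the dataset.

```python
from typing import Any, Dict, List


def jobs_for_after_update(data_dict: Dict[str, Any]) -> List[str]:
    """Return the resource IDs that would be queued for validation.

    Hypothetical simplification of the heuristic described above: any dict
    that contains "resources" is treated as a whole-package update.
    """
    if "resources" in data_dict:
        # Package-style dict: every resource in the dataset gets a job.
        return [res["id"] for res in data_dict["resources"]]
    # Resource-style dict: only this resource gets a job.
    return [data_dict["id"]]


# If a resource_update on "r2" reaches this code with the package dict,
# all three resources are queued, not just "r2".
package_dict = {
    "id": "pkg-1",
    "resources": [{"id": "r1"}, {"id": "r2"}, {"id": "r3"}],
}
print(jobs_for_after_update(package_dict))  # ['r1', 'r2', 'r3']
```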


Please preserve this line to notify @amercader (lead of this repository)

ThrawnCA (Contributor, Author) commented Mar 4, 2020

Testing indicates that the package_patch API and the resource_patch API both call the after_update function with the full package dictionary, as does editing a resource via the web interface.

ThrawnCA (Contributor, Author) commented Mar 4, 2020

OK, so resource_patch and resource_update first call after_update with the package dict, then with the resource dict. However, the first call generates validation jobs for every resource in the package before the second call takes place. There needs to be a way for after_update to detect that it was actually triggered by a resource call.

What if before_update always populated resources_to_validate with either True or False, instead of only adding an entry when a resource needs validation? after_update could then check whether resources_to_validate is empty. Empty -> this is a package call; validate everything. Not empty -> this is a resource-based call; validate only the resources with True entries in resources_to_validate.
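The proposal can be sketched as a standalone simulation. This is only a sketch under stated assumptions, not the plugin's implementation. `ValidationHooks`, the `needs_validation` argument and the `queued` list are hypothetical stand-ins. It assumes before_update runs only for resource-level calls, and that after_update receives the package dict first and then the resource dict, as described in the comment above. Entries are popped as they are consumed, so the second, resource-level after_update call does not queue the same resource again.

```python
from typing import Any, Dict, List


class ValidationHooks:
    """Hypothetical sketch of the True/False proposal; not the plugin's code.

    Assumes before_update runs only for resource-level calls, and that
    after_update is called with the package dict, then the resource dict.
    """

    def __init__(self) -> None:
        # resource id -> whether that resource needs validation
        self.resources_to_validate: Dict[str, bool] = {}
        # resource ids queued for validation, in order
        self.queued: List[str] = []

    def before_update(self, resource_id: str, needs_validation: bool) -> None:
        # Always record an entry, True or False, so that after_update can
        # tell this update came from a resource call.
        self.resources_to_validate[resource_id] = needs_validation

    def after_update(self, data_dict: Dict[str, Any]) -> None:
        if "resources" in data_dict:
            resource_ids = [res["id"] for res in data_dict["resources"]]
            if not self.resources_to_validate:
                # Empty: a package call, so validate everything.
                self.queued.extend(resource_ids)
                return
            # Not empty: a resource call. Queue only True entries, and pop
            # every entry so the later resource-level call does not repeat it.
            for res_id in resource_ids:
                if self.resources_to_validate.pop(res_id, False):
                    self.queued.append(res_id)
        elif self.resources_to_validate.pop(data_dict["id"], False):
            # Resource dict with a still-pending True entry.
            self.queued.append(data_dict["id"])


package_dict = {"id": "pkg-1", "resources": [{"id": "r1"}, {"id": "r2"}, {"id": "r3"}]}

# Simulated resource_patch on r2: one job instead of three.
hooks = ValidationHooks()
hooks.before_update("r2", needs_validation=True)
hooks.after_update(package_dict)
hooks.after_update({"id": "r2", "package_id": "pkg-1"})
print(hooks.queued)  # ['r2']

# Simulated package_patch: before_update never runs, so everything is queued.
hooks = ValidationHooks()
hooks.after_update(package_dict)
print(hooks.queued)  # ['r1', 'r2', 'r3']
```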

Alternatively, before_update could update a new dict, e.g. self.packages_to_skip, to indicate that a call originates from a resource API and that after_update should skip the package.
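A comparable sketch of the packages_to_skip alternative follows, under the same assumptions as before. It is a hypothetical simulation, not the plugin's code. It further assumes that before_update can be given the package ID of the resource being updated. `SkipPackageHooks` and `queued` are illustrative names.

```python
from typing import Any, Dict, List


class SkipPackageHooks:
    """Hypothetical sketch of the packages_to_skip alternative.

    Assumes before_update runs only for resource-level calls and is given
    the id of the package that owns the resource being updated.
    """

    def __init__(self) -> None:
        # package id -> True while a resource-level update is in progress
        self.packages_to_skip: Dict[str, bool] = {}
        self.queued: List[str] = []

    def before_update(self, package_id: str) -> None:
        self.packages_to_skip[package_id] = True

    def after_update(self, data_dict: Dict[str, Any]) -> None:
        if "resources" in data_dict:
            if self.packages_to_skip.pop(data_dict["id"], False):
                # Triggered by a resource API call: leave validation to the
                # resource-level after_update that follows.
                return
            self.queued.extend(res["id"] for res in data_dict["resources"])
        else:
            # Resource-level call: validate just this resource.
            self.queued.append(data_dict["id"])


package_dict = {"id": "pkg-1", "resources": [{"id": "r1"}, {"id": "r2"}, {"id": "r3"}]}

hooks = SkipPackageHooks()
hooks.before_update("pkg-1")
hooks.after_update(package_dict)                          # skipped
hooks.after_update({"id": "r2", "package_id": "pkg-1"})  # queues r2
print(hooks.queued)  # ['r2']
```

The sketch uses `pop` rather than `get`. That way, the skip flag is consumed by the first package-level call, and a later genuine package update is not skipped by mistake.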

ThrawnCA (Contributor, Author) commented Mar 4, 2020

This is the reason that 4321b79 broke the Travis CI build: as soon as the plugin implemented IPackageController, it started incorrectly generating multiple validation jobs.

JVickery-TBS pushed a commit to JVickery-TBS/ckanext-validation that referenced this issue Jul 23, 2024
[QOLDEV-323] gracefully handle blank 'last_modified' date in validation badge