Create validate_camtrapdp() and helper functions to validate integrity of a Camtrap DP #58

Open
8 tasks
peterdesmet opened this issue Jul 15, 2023 · 2 comments
Labels
function:validate_camtrapdp Function validate_camtrapdp()

Comments

@peterdesmet

Suggested in camtraptor July 2023 coding sprint

An important step before analysing or publishing data is to check that the dataset contains no major integrity errors, such as missing dates or coordinates, values that do not meet controlled vocabularies, or incorrect relationships between tables. Although validation is possible with the Python software Frictionless Framework, the returned error messages are hard to parse for most users.

  • camtraptor (or the frictionless R package) could offer some basic data validation (easier to implement than the entire metadata and data validation that Frictionless Framework offers).
  • Users can correct issues by retrieving data (#232), correcting errors, and updating the package (#248).
  • A user-facing validate() function could make use of a number of check_ helper functions. Those helper functions could also be run by other functions, e.g. when updating data (#248).

Suggestions for functions:

  • validate(package)
  • check_relations(package): relationships are valid
  • check_identifiers(package, "table name"): IDs are unique
  • check_required(package, "table name"): required fields are populated
  • check_vocabularies(package, "table name"): values meet factor levels. Note that read_resource()/readr converts these to factors and might report problems()
  • check_data_types(package, "table name"): note that read_resource()/readr will report problems() but otherwise makes a best attempt at converting
  • check_timestamps(package, "table name"): timestamps have a timezone and start <= end (specific to camtraptor, not a frictionless thing)
  • check_durations(package): observation and media timestamps fall within the deployment period (specific to camtraptor, not a frictionless thing)
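
A minimal sketch, in base R, of how two of these helpers and the user-facing function could fit together. Everything here is an assumption for illustration: the package is modelled as a plain list with a `data` element holding already-read data frames, the `id_col` and `required` arguments are hypothetical, and returning a character vector of messages is just one possible design. None of this is an existing camtraptor or frictionless API.

```r
# Hypothetical package object: a list holding already-read data frames.
package <- list(
  data = list(
    deployments = data.frame(
      deploymentID = c("d1", "d2", "d2"),
      latitude = c(51.1, NA, 50.9),
      stringsAsFactors = FALSE
    )
  )
)

# Hypothetical helper: one message per duplicated ID value,
# or an empty character vector when all IDs are unique.
check_identifiers <- function(package, table_name, id_col) {
  ids <- package$data[[table_name]][[id_col]]
  dups <- unique(ids[duplicated(ids)])
  if (length(dups) == 0) return(character(0))
  sprintf("%s.%s: duplicated value '%s'", table_name, id_col, dups)
}

# Hypothetical helper: one message per required column containing NA values,
# or an empty character vector when all required fields are populated.
check_required <- function(package, table_name, required) {
  df <- package$data[[table_name]]
  n_missing <- vapply(required, function(col) sum(is.na(df[[col]])), integer(1))
  bad <- n_missing[n_missing > 0]
  if (length(bad) == 0) return(character(0))
  sprintf("%s.%s: %d missing value(s)", table_name, names(bad), bad)
}

# User-facing function: collects the messages of the helpers.
validate <- function(package) {
  c(
    check_identifiers(package, "deployments", "deploymentID"),
    check_required(package, "deployments", c("deploymentID", "latitude"))
  )
}

issues <- validate(package)
print(issues)
```

Returning plain messages rather than stopping at the first error would let validate() report all problems in one pass, while other functions (e.g. when updating data) could call a single helper and decide for themselves whether a non-empty result is an error.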

While it would be useful if these were functions of the frictionless R package, that might not fit what we need for camtraptor. Frictionless would run its validation on resources (i.e. CSV files + schemas): data frames returned after reading lose the connection with their schema, so relationships or uniqueness can no longer be validated from them, as that information is lost. Camtraptor, on the other hand, wants to validate the (already read) data frames.
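
To illustrate what validating already-read data frames could look like, here is a rough base R sketch of a relationship check and a duration check. The function signatures are simplified to take data frames directly (the suggestions in this issue pass a package instead), and the column names and example values are assumptions made for the sketch, with the expected keys hard-coded in the call rather than read from a schema.

```r
# Example data frames, as if already read from a package (values are made up).
deployments <- data.frame(
  deploymentID = c("d1", "d2"),
  deploymentStart = as.POSIXct(c("2023-07-01 00:00:00", "2023-07-10 00:00:00"), tz = "UTC"),
  deploymentEnd = as.POSIXct(c("2023-07-05 00:00:00", "2023-07-15 00:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
observations <- data.frame(
  observationID = c("o1", "o2", "o3"),
  deploymentID = c("d1", "d2", "d9"),
  eventStart = as.POSIXct(
    c("2023-07-02 12:00:00", "2023-07-20 12:00:00", "2023-07-03 12:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Relationship check: returns key values in the child table
# that have no match in the parent table.
check_relations <- function(child, parent, key) {
  orphan <- !(child[[key]] %in% parent[[key]])
  unique(child[[key]][orphan])
}

# Duration check: returns IDs of observations whose start falls
# outside the period of their deployment. Observations without a
# matching deployment are dropped by the inner join and are left
# to check_relations().
check_durations <- function(observations, deployments) {
  m <- merge(observations, deployments, by = "deploymentID")
  outside <- m$eventStart < m$deploymentStart | m$eventStart > m$deploymentEnd
  m$observationID[outside]
}

orphans <- check_relations(observations, deployments, "deploymentID")
out_of_range <- check_durations(observations, deployments)
print(orphans)
print(out_of_range)
```

Because the data frames no longer carry their schema, the key and timestamp columns have to be named explicitly here; a real implementation would need some other source for that information.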

@damianooldoni

@peterdesmet: wondering if we should move this issue to camtrapdp repo.

@peterdesmet

Ah yes, will do.

@peterdesmet peterdesmet transferred this issue from inbo/camtraptor Apr 30, 2024
@peterdesmet peterdesmet added function:check Function check_camtrapdp() function:validate_camtrapdp Function validate_camtrapdp() and removed function:check Function check_camtrapdp() labels May 31, 2024
@peterdesmet peterdesmet changed the title Create validate() and helper functions to validate integrity of a Camtrap DP Create validate_camtrapdp() and helper functions to validate integrity of a Camtrap DP Oct 11, 2024