Please submit pointers to your validator output formats #1

yarikoptic · 2024-10-03T15:11:49Z

Please check out the README in https://github.com/con/validation for the motivation, etc.

BIDS (attn @effigies)
NWB (attn @rly; @bendichter for nwb-inspector)
HED (attn @VisLab)
SPARC's SDS (attn @tgbugs)
...

rly · 2024-10-03T17:00:24Z

The NWB validator can be called from the command line and as a Python function.

The CLI is described here. When validation passes, the output looks like:

Validating /Users/rly/Documents/NWB/scratch/test.nwb against cached namespace information using namespace 'core'.
 - no errors found.

with exit code 0.

When it fails, it looks like:

Validating /Users/rly/Documents/NWB/scratch/test.nwb against cached namespace information using namespace 'core'.
 - found the following errors:
SpatialSeries/data (processing/behavior/position/series_0/data): incorrect shape - expected '[[None], [None, 1], [None, 2], [None, 3]]', got '(27206, 6)'

with exit code 1.
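The failing output above suggests that each error line has the shape `<type>/<field> (<location in file>): <message>`. Below is a minimal Python sketch that splits such a line into parts. The pattern is inferred from this single example line, so it is an assumption; `parse_error_line` is a hypothetical helper, not part of pynwb.

```python
import re

# Hypothetical pattern inferred from one example line:
#   "<type>/<field> (<location in file>): <message>"
ERROR_LINE = re.compile(r"^(?P<type_path>\S+) \((?P<location>[^)]*)\): (?P<message>.*)$")


def parse_error_line(line: str) -> dict:
    """Split one pynwb validator error line into type path, in-file location and message."""
    match = ERROR_LINE.match(line.strip())
    if match is None:
        # Not in the expected shape: keep the raw text as the message.
        return {"type_path": None, "location": None, "message": line.strip()}
    return match.groupdict()


line = ("SpatialSeries/data (processing/behavior/position/series_0/data): "
        "incorrect shape - expected '[[None], [None, 1], [None, 2], [None, 3]]', got '(27206, 6)'")
parsed = parse_error_line(line)
print(parsed["location"])  # processing/behavior/position/series_0/data
```

Lines that do not match fall back to a message-only record rather than raising, so unexpected formats are not lost.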

The Python function is described here. Using the paths keyword argument returns a tuple (list of errors, exit code). Using the io keyword argument returns a list of errors.
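Because the return shape depends on which keyword argument is used, a caller that accepts either form might normalize it first. A sketch, assuming (as with the CLI) that exit code 1 means errors were found and 0 means none; `normalize_validate_result` is a hypothetical helper, not a pynwb function, and it does not call pynwb itself.

```python
def normalize_validate_result(result):
    """Return (errors, exit_code) for either calling convention described above."""
    if isinstance(result, tuple):
        # paths=... form: (list of errors, exit code)
        errors, exit_code = result
        return list(errors), exit_code
    # io=... form: a bare list of errors; derive an exit code the way the CLI reports it
    errors = list(result)
    return errors, 1 if errors else 0


print(normalize_validate_result(([], 0)))          # ([], 0)
print(normalize_validate_result(["some error"]))   # (['some error'], 1)
```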

yarikoptic · 2024-10-03T17:41:47Z

FWIW, here is where we map pynwb.validate output into our ValidationResult: https://github.com/dandi/dandi-cli/blob/HEAD/dandi/pynwb_utils.py#L359, so it looks like:

    ValidationResult(
        origin=ValidationOrigin(
            name="pynwb",
            version=pynwb.__version__,
        ),
        severity=Severity.WARNING,
        id=f"pywnb.{error_output}",
        scope=Scope.FILE,
        path=Path(path),
        message="Failed to validate.",
    )

But that is quite suboptimal, since there is only the error_output string, without any finer decomposition into the "path" within the file, etc. For NWB Inspector outputs we seem to get some path within the asset (file), and we map it as well: https://github.com/dandi/dandi-cli/blob/HEAD/dandi/files/bases.py#L546
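For illustration, here is a sketch of the kind of decomposition being asked for: a record that keeps the location inside the file separate from the message. `FileIssue` and `issue_from_pynwb` are hypothetical names (this is not dandi's `ValidationResult`), and deriving a short id from the text before `" - "` is an assumption based on the one pynwb error line shown earlier in this thread.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class FileIssue:
    """Hypothetical record; NOT dandi's ValidationResult, just the fields discussed here."""
    origin: str                  # e.g. "pynwb" or "nwbinspector"
    severity: str
    id: str
    path: Path                   # the asset (file) on disk
    path_in_file: Optional[str]  # location inside the file, when the validator reports one
    message: str


def issue_from_pynwb(path: str, location: Optional[str], message: str,
                     severity: str = "WARNING") -> FileIssue:
    # Assumption: the text before " - " (e.g. "incorrect shape") serves as a short, stable id,
    # instead of embedding the whole error string in the id.
    short = message.split(" - ", 1)[0].strip().replace(" ", "_")
    return FileIssue(
        origin="pynwb",
        severity=severity,
        id=f"pynwb.{short}",
        path=Path(path),
        path_in_file=location,
        message=message,
    )
```

Keeping `path_in_file` optional lets validators that only report a file-level problem use the same record.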

rly · 2024-10-04T01:52:45Z

Thanks @yarikoptic. The pynwb validator errors are parseable; I tried to address this in dandi/dandi-cli#1513.

VisLab · 2024-10-04T15:03:17Z

The HED Python validator represents each issue as a dictionary, and the error list is a list of these dictionaries. The error code is used to select a template for composing the error message from the dictionary. Most of our calling functions then get a printable string for each issue. The issue list could trivially be output as JSON, but maximizing its usefulness would take some thought.

I guess the first step for us would be to define a JSON schema for the format of the issue list (which we currently don't have). We have a lot of fields, some of which correspond to yours and some don't.
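One wrinkle with "trivially output as JSON" is that some issue values are Python objects (such as the HedString and HedTag objects in the sample tree in this thread), which `json.dumps` cannot encode by default. A minimal sketch using the `default=str` hook; `StandInHedString` is a hypothetical stand-in, and whether `str()` of the real hed-python objects gives a useful representation is an assumption (a real export might pick specific attributes instead).

```python
import json


class StandInHedString:
    """Hypothetical stand-in for hed.models.hed_string.HedString (not JSON-serializable)."""

    def __init__(self, text: str):
        self.text = text

    def __str__(self) -> str:
        return self.text


def issues_to_json(issues: list) -> str:
    # default=str is called only for values json cannot encode natively,
    # so object-valued keys such as ec_HedString or source_tag become strings.
    return json.dumps(issues, default=str, indent=2)


issues = [{
    "code": "DEFINITION_INVALID",
    "severity": 1,
    "ec_sidecarColumnName": "defs",
    "ec_HedString": StandInHedString("(Definition/Apple, Definition/Banana, (Blue))"),
}]
print(issues_to_json(issues))
```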

yarikoptic · 2024-10-04T18:54:57Z

Thank you @VisLab. Of special interest would be those fields that don't currently have anything corresponding in ours. Would you be so kind as to create a list/table of them?

VisLab · 2024-10-04T21:45:08Z

I went over the hed-python code. There are single-issue keys and context keys. The actual issue objects are nested in a tree structure so that context that applies to multiple issues can easily be printed once. This tree could be flattened.

Single issue keys

Key	Description
code	External HED error code from the specification, to be mapped to a web-page explanation
message	Formatted message after parameters are filled in, e.g. 'Invalid character "x08" at index 7'
severity	Numerical code: 1 = error, 10 = warning
index_in_tag	Position of the start of the problem in the tag
index_in_tag_end	Position of the end of the problem in the tag
source_tag	Pointer to an object with a lot of information, including a link to the object in the schema if available

Context keys

Key	Description
ec_title	Overall title for the error report
ec_filename	File name this error applies to
ec_sidecarColumnName	Sidecar column name
ec_sidecarKeyName	Name of a categorical value
ec_row	Row number in a TSV file
ec_column	Column number in a TSV file
ec_line	Line number in the file for which the error is reported
ec_HedString	HED string in which the tag appears
ec_section	Section of the HED schema in which the error appears (for schema validation)
ec_schema_tag	Schema tag in which the error appears (for schema validation)
ec_attribute	Schema attribute for which the error appears (for schema validation)
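To compare these keys with a generic result record, an issue dict carrying the context keys above (as the issues in the sample tree already do) could be projected onto location fields, keeping the HED-specific context separately. A sketch: the output field names and `to_location_record` are hypothetical, and the column key is assumed to be `ec_column`, following the `ec_` prefix of the other context keys.

```python
# Severity codes as described in the single-issue table (1 = error, 10 = warning).
SEVERITY_LABELS = {1: "error", 10: "warning"}

# Context keys that have no obvious generic counterpart; kept verbatim under "extra".
EXTRA_KEYS = ("ec_sidecarColumnName", "ec_sidecarKeyName", "ec_HedString",
              "ec_section", "ec_schema_tag", "ec_attribute")


def to_location_record(issue: dict) -> dict:
    """Project a HED issue dict onto a generic, hypothetical location record."""
    return {
        "code": issue.get("code"),
        "severity": SEVERITY_LABELS.get(issue.get("severity"), "unknown"),
        "message": issue.get("message"),
        # The sample shows ec_filename as '' when no file applies; treat that as missing.
        "path": issue.get("ec_filename") or None,
        "row": issue.get("ec_row"),
        "column": issue.get("ec_column"),
        "line": issue.get("ec_line"),
        "extra": {k: v for k, v in issue.items() if k in EXTRA_KEYS},
    }
```

The `extra` mapping is one way to surface exactly the fields that have no counterpart in a generic record.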

Sample tree structure of the issues:


{'children': [],
 ('ec_sidecarColumnName', 'defs'): {
     'children': [],
     ('ec_sidecarKeyName', 'def1'): {
         'children': [],
         ('ec_HedString', '(Definition/Apple, Definition/Banana, (Blue))'): {
             'children': [
                 {'code': 'DEFINITION_INVALID',
                  'message': "Too many tags found in definition for Apple.  Expected 1, found: ['Definition/Banana']",
                  'severity': 1,
                  'ec_sidecarColumnName': 'defs',
                  'ec_sidecarKeyName': 'def1',
                  'ec_HedString': <hed.models.hed_string.HedString object at 0x000002AFFF59B2E0>
                 },
                 {'code': 'TAG_GROUP_ERROR',
                  'message': "Multiple top level tags found in a single group.  First one found: Definition/Apple. Remainder:['Definition/Banana']  Problem spans string indexes: 1, 17",
                  'severity': 1,
                  'source_tag': <hed.models.hed_tag.HedTag object at 0x000002AFFF4A2520>,
                  'ec_filename': '',
                  'ec_sidecarColumnName': 'defs',
                  'ec_sidecarKeyName': 'def1',
                  'ec_HedString': <hed.models.hed_string.HedString object at 0x000002AFFF54B7F0>,
                  'char_index': 1, 'char_index_end': 17
                 }
             ]
         }
     }, ...
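The tree above could be flattened along these lines. A minimal sketch, assuming the shape in the sample: each node has a `'children'` list of issue dicts, and every other key is a `(context_key, value)` tuple mapping to a subtree. `flatten_issue_tree` is a hypothetical helper, not a hed-python function; ancestor context is merged into each issue, with keys already present on the issue taking precedence.

```python
from typing import Iterator, Optional


def flatten_issue_tree(tree: dict, context: Optional[dict] = None) -> Iterator[dict]:
    """Yield flat issue dicts from the nested tree, each merged with its ancestors' context."""
    context = dict(context or {})
    for issue in tree.get("children", []):
        # Ancestor context first, so keys already present on the issue win.
        yield {**context, **issue}
    for key, subtree in tree.items():
        if isinstance(key, tuple) and len(key) == 2:
            # (context_key, value) node: descend with that context added.
            ctx_key, ctx_value = key
            yield from flatten_issue_tree(subtree, {**context, ctx_key: ctx_value})


tree = {
    "children": [],
    ("ec_sidecarColumnName", "defs"): {
        "children": [],
        ("ec_sidecarKeyName", "def1"): {
            "children": [{"code": "DEFINITION_INVALID", "severity": 1}],
        },
    },
}
print(list(flatten_issue_tree(tree)))
# [{'ec_sidecarColumnName': 'defs', 'ec_sidecarKeyName': 'def1', 'code': 'DEFINITION_INVALID', 'severity': 1}]
```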

tgbugs · 2024-10-07T15:06:32Z

The SPARC validator is a bit unusual: except in cases where something breaks due to a bug in the validator itself, we return the export as is, or remove the malformed data (with a note from the validator that it has been removed). The validator output is embedded as errors sections within objects, and a summary is lifted out. However, that summary is not currently described in the schema linked below, because we don't validate that part of the export, so I will update the schema so that it is at least visible.

https://github.com/SciCrunch/sparc-curation/blob/master/sparcur/schemas.py

    "path_error_report": {
      "#/inputs/dataset_description_file": {
        "error_count": 6,
        "messages": [
          "'description' is a required property",
          "'name' is a required property",
          "'protocol_url_or_doi' is a required property"
        ]
      },
      "#/inputs/manifest_file/-1/checksums/-1": {
        "error_count": 1,
        "messages": [
          "{'type': 'checksum', ... 44 bytes later ... e3531d7671eab8911'} is not valid under any of the given schemas"
        ]
      },
      "#/inputs/submission_file/submission": {
        "error_count": 1,
        "messages": [
          "{'consortium_data_st ... 69 bytes later ... er': 'U19NS130608'} is not valid under any of the given schemas"
        ]
      },
      "#/meta/award_number": {
        "error_count": 1,
        "messages": [
          "'U19NS130608' does not match '^(OT2OD|OT3OD|U18|TR|U01)'"
        ]
      },
      "#/meta/techniques/-1": {
        "error_count": 2,
        "messages": [
          "'RNA -seq' is not a 'iri'",
          "'single-cell RNA sequencing' is not a 'iri'"
        ]
      }
    }
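A consumer mapping this report into per-message records might walk it as below. Note that in the report, `error_count` and the number of listed `messages` can differ (6 vs. three messages for the dataset description file), so the sketch reports that gap instead of assuming one message per error. Treating the `#/...` keys as slash-separated paths, with segments such as `-1` kept verbatim, is an assumption; both function names are hypothetical.

```python
def iter_path_errors(report: dict):
    """Yield (path_segments, message) pairs from a path_error_report mapping."""
    for pointer, entry in report.items():
        # "#/meta/techniques/-1" -> ["meta", "techniques", "-1"]; segments kept as-is
        segments = pointer.lstrip("#").strip("/").split("/")
        for message in entry.get("messages", []):
            yield segments, message


def unlisted_error_counts(report: dict) -> dict:
    """Per pointer, how many counted errors have no listed message."""
    return {
        pointer: entry["error_count"] - len(entry.get("messages", []))
        for pointer, entry in report.items()
    }


report = {
    "#/meta/award_number": {
        "error_count": 1,
        "messages": ["'U19NS130608' does not match '^(OT2OD|OT3OD|U18|TR|U01)'"],
    },
    "#/inputs/dataset_description_file": {
        "error_count": 6,
        "messages": [
            "'description' is a required property",
            "'name' is a required property",
            "'protocol_url_or_doi' is a required property",
        ],
    },
}
for segments, message in iter_path_errors(report):
    print("/".join(segments), "->", message)
print(unlisted_error_counts(report))
```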

rly mentioned this issue Oct 4, 2024: Enhance pynwb validation parsing to report the path within the file, fix pynwb typo dandi/dandi-cli#1513 (Merged)

yarikoptic mentioned this issue Oct 18, 2024: Overhauling validation results to get them closer to cover different types of validators dandi/dandi-cli#1514 (Draft)

This was referenced Nov 21, 2024:

Formalize concept/specification of the "BIDS Extensions" bids-standard/bids-2-devel#74 (Open)

Provide means (result records schema) for external validators (e.g. NWB) bids-standard/bids-validator#23 (Open)
