doc: Document the validation model, context and inheritance principle
effigies committed Nov 8, 2024
1 parent dd8a5b3 commit 643ae92
Showing 4 changed files with 263 additions and 0 deletions.
10 changes: 10 additions & 0 deletions docs/index.md
@@ -26,6 +26,7 @@ deno run -A jsr:@bids/validator
```

```{toctree}
:maxdepth: 2
:hidden:
:caption: User guide
@@ -35,6 +36,7 @@ user_guide/issues.md
```

```{toctree}
:maxdepth: 2
:hidden:
:caption: Developer guide
@@ -43,6 +45,14 @@ dev/contributing.md
dev/environment.md
```

```{toctree}
:maxdepth: 2
:hidden:
:caption: Concepts
validation-model/index.md
```

```{toctree}
:hidden:
:caption: Reference
159 changes: 159 additions & 0 deletions docs/validation-model/context.md
@@ -0,0 +1,159 @@
# The validation context

The core structure of the validator is the `context`,
a namespace that aggregates properties of the dataset (the `dataset` variable in the [validation model](index.md) pseudocode)
and the current file being validated.

Its type can be described as follows:

```typescript
Context: {
  // Dataset properties
  dataset: {
    dataset_description: object
    datatypes: string[]
    modalities: string[]
    // Lists of subjects as discovered in different locations
    subjects: {
      sub_dirs: string[]
      participant_id: string[]
      phenotype: string[]
    }
  }

  // Properties of the current subject
  subject: {
    // Lists of sessions as discovered in different locations
    sessions: {
      ses_dirs: string[]
      session_id: string[]
      phenotype: string[]
    }
  }

  // Path properties
  path: string
  entities: object
  datatype: string
  suffix: string
  extension: string
  // Inferred property
  modality: string

  // Inheritance principle constructions
  sidecar: object
  associations: {
    // Paths and properties of files associated with the current file
    aslcontext: { path: string, n_rows: integer, volume_type: string[] }
    ...
  }

  // Content properties
  size: integer

  // File type-specific content properties
  columns: object
  gzip: object
  json: object
  nifti_header: object
  ome: object
  tiff: object
}
```

To take an example, in a minimal dataset containing only a single subject's T1-weighted image,
the context for that image might be:

```yaml
dataset:
  dataset_description:
    Name: "Example dataset"
    BIDSVersion: "1.10.0"
    DatasetType: "raw"
  datatypes: ["anat"]
  modalities: ["mri"]
  subjects:
    sub_dirs: ["sub-01"]
    participant_id: null
    phenotype: null

subject:
  sessions: { ses_dirs: null, session_id: null, phenotype: null }

path: "/sub-01/anat/sub-01_T1w.nii.gz"
entities:
  subject: "01"
datatype: "anat"
suffix: "T1w"
extension: ".nii.gz"
modality: "mri"

sidecar:
  MagneticFieldStrength: 3
  ...
associations: {}

size: 22017017
nifti_header:
  dim: 3
  voxel_sizes: [1, 1, 1]
  ...
```

Fields from this context can be queried using object dot notation.
For example, `sidecar.MagneticFieldStrength` has the integer value `3`,
and `entities.subject` has the string value `"01"`.
This permits the use of boolean expressions, such as
`sidecar.RepetitionTime == nifti_header.pixdim[4]`.
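
As a rough illustration of dot-notation lookup, the sketch below resolves a dotted name against a context stored as nested dictionaries. The `lookup` helper and its behavior of returning `None` for a missing field are assumptions made for this example; they are not taken from the validator's expression evaluator.

```python
from functools import reduce


def lookup(context, dotted):
    """Resolve a dotted name such as 'entities.subject' against nested dicts.

    Returning None for a missing component is an assumption of this sketch.
    """
    def step(value, key):
        return value.get(key) if isinstance(value, dict) else None

    return reduce(step, dotted.split("."), context)


# A fragment of the example context shown above.
context = {
    "entities": {"subject": "01"},
    "sidecar": {"MagneticFieldStrength": 3},
}

field_strength = lookup(context, "sidecar.MagneticFieldStrength")
subject = lookup(context, "entities.subject")
missing = lookup(context, "sidecar.RepetitionTime")
```

Here `field_strength` is `3`, `subject` is `"01"`, and `missing` is `None` because the example sidecar defines no `RepetitionTime`.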

As the validator validates each file in turn, it constructs a new context.
The `dataset` property remains constant,
while a new `subject` property is constructed when inspecting a new subject directory,
and the remaining properties are constructed for each file, individually.

## Context definition

The validation context is largely dictated by the [schema],
and the full type generated from the schema definition can be found in
[jsr:@bids/schema/context](https://jsr.io/@bids/schema/doc/context/~/Context).

## Context construction

The construction of a validation context is where BIDS concepts are implemented.
Again, this is easiest to explain with pseudocode:

```python
def buildFileContext(dataset, file):
    context = namespace()
    context.dataset = dataset
    context.path = file.path
    context.size = file.size

    fileParts = parsePath(file.path)
    context.entities = fileParts.entities
    context.datatype = fileParts.datatype
    context.suffix = fileParts.suffix
    context.extension = fileParts.extension

    context.subject = buildSubjectContext(dataset, context.entities.subject)

    context.sidecar = loadSidecar(file)
    context.associations = namespace({
        association: loadAssociation(file, association)
        for association in associationTypes(file)
    })

    if isTSV(file):
        context.columns = loadColumns(file)
    if isNIfTI(file):
        context.nifti_header = loadNiftiHeader(file)
    ...  # And so on

    return context
```

The heavy lifting is done in `parsePath`, `loadSidecar` and `loadAssociation`.
`parsePath` is relatively simple, but `loadSidecar` and `loadAssociation`
implement the BIDS [Inheritance Principle].
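
To make the role of `parsePath` concrete, here is a minimal sketch of how a BIDS path could be split into the four path properties. It is not the validator's implementation: the `DATATYPES` set and the `ENTITY_NAMES` mapping are hypothetical stand-ins for information that would come from the schema, and the splitting rules are simplified.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for schema-derived tables.
DATATYPES = {"anat", "func", "dwi", "fmap", "perf"}
ENTITY_NAMES = {"sub": "subject", "ses": "session"}


def parse_path(path):
    """Split a BIDS-like path into entities, datatype, suffix and extension."""
    parts = path.strip("/").split("/")
    # Everything from the first dot onward is treated as the extension.
    stem, dot, rest = parts[-1].partition(".")
    extension = dot + rest

    entities = {}
    suffix = ""
    chunks = stem.split("_")
    for index, chunk in enumerate(chunks):
        key, dash, value = chunk.partition("-")
        if dash:
            # "sub-01" becomes {"subject": "01"}
            entities[ENTITY_NAMES.get(key, key)] = value
        elif index == len(chunks) - 1:
            # A trailing chunk without a dash is the suffix.
            suffix = chunk

    parent = parts[-2] if len(parts) > 1 else ""
    datatype = parent if parent in DATATYPES else ""
    return SimpleNamespace(
        entities=entities, datatype=datatype, suffix=suffix, extension=extension
    )


parsed = parse_path("/sub-01/anat/sub-01_T1w.nii.gz")
```

For the T1-weighted image from the earlier example, `parsed` carries the same `entities`, `datatype`, `suffix` and `extension` values as the YAML context.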

[Inheritance Principle]: https://bids-specification.readthedocs.io/en/stable/common-principles.html#the-inheritance-principle
31 changes: 31 additions & 0 deletions docs/validation-model/index.md
@@ -0,0 +1,31 @@
# Validation model

The basic process of the BIDS validator operates according to the following
[Python]-like pseudocode:

```python
def validate(directory):
    fileTree = loadFileTree(directory)
    dataset = buildDatasetContext(fileTree)

    for file in walk(dataset.fileTree):
        context = buildFileContext(dataset, file)
        for check in perFileChecks:
            check(context)

    for check in datasetChecks:
        check(dataset)
```

The following sections describe [the validation context](context.md)
and our implementation of [the Inheritance Principle](inheritance-principle.md).

```{toctree}
:maxdepth: 1
:hidden:
context.md
inheritance-principle.md
```

[Python]: https://en.wikipedia.org/wiki/Python_(programming_language)
63 changes: 63 additions & 0 deletions docs/validation-model/inheritance-principle.md
@@ -0,0 +1,63 @@
# The Inheritance Principle

The [Inheritance Principle] is a core concept in BIDS.
Its original definition (edited for brevity) was:

> Any metadata file (`.json`, `.bvec`, `.tsv`, etc.) may be defined at any directory level,
> but no more than one applicable file may be defined at a given level.
> The values from the top level are inherited by all lower levels
> unless they are overridden by a file at the lower level. [...]
> There is no notion of "unsetting" a key/value pair.

Here, "top level" means dataset root, and "lower level" means closer to the data file
the metadata applies to.
More recent versions of the specification have made the language more precise at the cost
of verbosity.
The core concept remains the same.

The validator uses a "walk back" algorithm to find inherited files:

```python
def walkBack(file, extension):
    fileParts = parsePath(file.path)

    fileTree = file.parent
    while fileTree:
        for child in fileTree.children:
            parts = parsePath(child.path)
            if (
                parts.extension == extension
                and parts.suffix == fileParts.suffix
                and isSubset(parts.entities, fileParts.entities)
            ):
                yield child

        fileTree = fileTree.parent
```
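
The walk back can also be tried on a flat list of paths. The sketch below is a simplified, runnable variant under stated assumptions: files are plain path strings rather than file tree nodes, the `parse` helper assumes every chunk before the suffix is a `key-value` pair, and the names `parse`, `is_subset` and `walk_back` are chosen for this example only.

```python
from pathlib import PurePosixPath


def parse(path):
    """Split a BIDS-like file name into (entities, suffix, extension)."""
    stem, dot, rest = PurePosixPath(path).name.partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, dot + rest


def is_subset(candidate, target):
    """True if every candidate entity appears in target with the same value."""
    return all(target.get(key) == value for key, value in candidate.items())


def walk_back(paths, path, extension):
    """Yield matching files, from the file's own directory up to the root."""
    entities, suffix, _ = parse(path)
    directory = PurePosixPath(path).parent
    while True:
        for candidate in sorted(paths):
            if PurePosixPath(candidate).parent != directory:
                continue
            c_entities, c_suffix, c_extension = parse(candidate)
            if (
                c_extension == extension
                and c_suffix == suffix
                and is_subset(c_entities, entities)
            ):
                yield candidate
        if directory == directory.parent:  # reached the root
            break
        directory = directory.parent


paths = [
    "/T1w.json",
    "/sub-01/sub-01_T1w.json",
    "/sub-01/anat/sub-01_T1w.nii.gz",
    "/sub-01/anat/sub-01_T2w.json",
    "/sub-02/sub-02_T1w.json",
]
matches = list(walk_back(paths, "/sub-01/anat/sub-01_T1w.nii.gz", ".json"))
```

`matches` lists the subject-level sidecar before the root-level one. The `T2w` sidecar is skipped because its suffix differs, and the `sub-02` sidecar is never visited because its directory is not an ancestor of the data file.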

Using this basis, `loadSidecar` is simply:

```python
def loadSidecar(file):
    sidecar = {}
    for json in walkBack(file, '.json'):
        # Order matters. `|` overrides the left side with the right.
        # Any collisions resolve in favor of closer to the data file.
        sidecar = loadJson(json) | sidecar
    return sidecar
```
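
A short worked example of the merge order, using made-up sidecar contents listed nearest-first, as the walk back would yield them. It relies on the dictionary union operator, which requires Python 3.9 or later.

```python
# Hypothetical sidecar contents, nearest to the data file first.
nearest_first = [
    {"RepetitionTime": 2.0},                      # e.g. next to the data file
    {"RepetitionTime": 2.5, "TaskName": "rest"},  # e.g. at the dataset root
]

sidecar = {}
for loaded in nearest_first:
    # Values already collected (closer to the data file) sit on the right,
    # so they win over values loaded from higher levels.
    sidecar = loaded | sidecar

# Swapping the operands would let the root-level value win instead.
swapped = {}
for loaded in nearest_first:
    swapped = swapped | loaded
```

`sidecar["RepetitionTime"]` is `2.0`, the value closest to the data file, while `TaskName` is inherited from the root. In `swapped`, the root-level `2.5` overrides the closer value, which is why the operand order in `loadSidecar` matters.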

For `loadAssociation`, only the first match is used, if found:

```python
def loadAssociation(file, association):
    for associated_file in walkBack(file, getExtension(association)):
        return getLoader(association)(associated_file)
```

Each association contains different metadata to extract.
Note that some associations have a different suffix from the files they associate to.
The actual implementation of `walkBack` allows overriding suffixes as well as extensions,
but it would not be instructive to show here.

[Inheritance Principle]: https://bids-specification.readthedocs.io/en/stable/common-principles.html#the-inheritance-principle
