This repository has been archived by the owner on Jan 8, 2025. It is now read-only.
Current state
Currently we store provenance information in both postgres and neo4j. This leads to redundancy and complexity. It also causes some strange behavior: for example, the timestamp at which a relationship is created can only be found in postgres, when it could easily be set on the relation in neo4j. It also means we have to retain autogen/enums.py and other vestigial things.
Additionally, the graph relations and queries are far more complicated than current HMI needs require. This can be simplified.
Proposed state
Store all provenance information directly in neo4j and remove any lingering provenance data from postgres.
Simplified relationships
There are currently only a few things we care about provenance for:
model is related to document
dataset is related to document
document is related to document
document is related to artifact (code)
model is derived from artifact (code)
model is derived from model
dataset is derived from simulation
dataset is derived from dataset
Each relationship/edge should have a timestamp for when it was set.
NOTE: Users and projects DO NOT belong in provenance relationships at this time. That is a complication best addressed later.
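As a rough illustration of the relationship set above, here is a minimal Python sketch that encodes the eight allowed edges and stamps each edge with a creation time. The relation names `RELATED_TO` and `DERIVED_FROM`, the lowercase type strings, and the `Edge` / `make_edge` helpers are all hypothetical; the issue does not fix any of these names, and this is not the existing implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical relation names: the issue does not specify them.
RELATED_TO = "RELATED_TO"
DERIVED_FROM = "DERIVED_FROM"

# (source type, relation, target type), one entry per relationship listed above.
ALLOWED_EDGES = frozenset({
    ("model", RELATED_TO, "document"),
    ("dataset", RELATED_TO, "document"),
    ("document", RELATED_TO, "document"),
    ("document", RELATED_TO, "artifact"),
    ("model", DERIVED_FROM, "artifact"),
    ("model", DERIVED_FROM, "model"),
    ("dataset", DERIVED_FROM, "simulation"),
    ("dataset", DERIVED_FROM, "dataset"),
})


@dataclass(frozen=True)
class Edge:
    source_type: str
    source_id: str
    relation: str
    target_type: str
    target_id: str
    # Set when the edge is created, so the timestamp lives on the edge itself.
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def make_edge(source_type, source_id, relation, target_type, target_id):
    """Build an edge, rejecting anything outside the allowed set."""
    if (source_type, relation, target_type) not in ALLOWED_EDGES:
        raise ValueError(
            f"unsupported provenance edge: "
            f"{source_type} {relation} {target_type}"
        )
    return Edge(source_type, source_id, relation, target_type, target_id)
```

Because users and projects are not in `ALLOWED_EDGES`, a call such as `make_edge("user", "u1", RELATED_TO, "project", "p1")` raises `ValueError`, which matches the note above.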
Proposed queries
The HMI needs to have trivial ways to run a single search:
Find all nodes of specific type(s) that are 1 hop from a given node
We can allow greater than 1 hop in the future, but for now I do not see a use case in the HMI for >1 hop. This single query should be able to cover all immediate use cases, including:
Find all items related to a document
Find models derived from my current model
Find the associated code artifact from which my model was derived
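The single 1-hop search described above could be sketched as a small Python helper that builds a parameterized Cypher query. This is only a sketch under stated assumptions: the `id` node property, the `timestamp` relationship property, the label names passed in, and the helper name `one_hop_query` are hypothetical, and the helper only builds the query text and parameters without talking to a database.

```python
def one_hop_query(node_id, node_types):
    """Return (cypher, params) for nodes one hop from node_id.

    Only neighbors carrying at least one label in node_types are matched.
    Assumes nodes have an `id` property and edges a `timestamp` property.
    """
    if not node_types:
        raise ValueError("at least one node type is required")
    cypher = (
        # Undirected pattern: matches incoming and outgoing edges alike.
        "MATCH (n {id: $node_id})-[r]-(m) "
        "WHERE any(label IN labels(m) WHERE label IN $node_types) "
        "RETURN m, type(r) AS relation, r.timestamp AS timestamp"
    )
    return cypher, {"node_id": node_id, "node_types": list(node_types)}


# Example: models one hop away from a given model (hypothetical id and label).
cypher, params = one_hop_query("model-123", ["Model"])
```

The pattern is left undirected on purpose, so the same query covers both "models derived from my current model" and "the code artifact from which my model was derived". Values are passed as query parameters rather than interpolated into the string.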
Assumptions
This assumes that #299 is in place.