Replies: 2 comments
-
@pmbrull, when we talk about the metadata graph, the relationships go beyond lineage/provenance. Alongside the lineage relationship (a data asset is upstream of another data asset), there are other kinds of relationships, such as a dashboard contains a chart or a user belongs to a team. These relationships are captured with EntityReference, which is just a pointer to another entity. Such a pointer is used for capturing all relationships (1:1, 1:n) in the schemas. One such relationship is lineage.
Additionally, I think we should look into distinguishing stronger relationships, where an entity contains another (a database contains tables, and a table belongs to a single database), from weaker relationships, where a dashboard refers to a chart (because multiple dashboards can refer to the same chart) or a user is a member of a team (and a user can be a member of many different teams).
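To make the strong vs. weak distinction concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `RelationshipKind` labels, the `Relationship` class, and the three fields on `EntityReference` are hypothetical and are not taken from the actual schemas.

```python
from dataclasses import dataclass
from enum import Enum
from uuid import UUID, uuid4


class RelationshipKind(Enum):
    """Hypothetical labels for the relationship kinds described above."""
    CONTAINS = "contains"        # strong: a database contains tables
    REFERS_TO = "refersTo"       # weak: a dashboard refers to a chart
    MEMBER_OF = "memberOf"       # weak: a user is a member of a team
    UPSTREAM_OF = "upstreamOf"   # lineage: an asset is upstream of another


# Assumption: only containment is treated as a "strong" relationship.
STRONG_KINDS = frozenset({RelationshipKind.CONTAINS})


@dataclass(frozen=True)
class EntityReference:
    """A pointer to another entity: just enough to identify the target."""
    id: UUID
    type: str  # e.g. "table", "chart", "team"
    name: str


@dataclass(frozen=True)
class Relationship:
    """A directed, labeled edge between two referenced entities."""
    source: EntityReference
    target: EntityReference
    kind: RelationshipKind

    @property
    def is_strong(self) -> bool:
        return self.kind in STRONG_KINDS


database = EntityReference(uuid4(), "database", "sales")
table = EntityReference(uuid4(), "table", "sales.orders")
dashboard = EntityReference(uuid4(), "dashboard", "revenue")
chart = EntityReference(uuid4(), "chart", "orders_per_day")

relationships = [
    Relationship(database, table, RelationshipKind.CONTAINS),
    Relationship(dashboard, chart, RelationshipKind.REFERS_TO),
]
```

In this sketch the pointer itself stays the same for every relationship; only the label on the edge says whether the target is owned by the source or merely referenced by it.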
-
Thanks, @sureshms, for pointing me to the class! Really helpful to get all the possible relationships in writing. I get your point about strong vs. weak relationships. Should we then have a new enum to differentiate those? I could imagine something like:
IDK if differentiating between information completeness of the Entity and Entities acting as logical aggregators would still be too subjective.
-
Hi @sureshms, @harshach,
Thanks again for pushing this project, I'm having a lot of fun with it. Even the theory-crafting part alone is already interesting :D
One question that popped into my mind is related to the integration of `EntityReference` and `Lineage`: as I see it, their meanings might overlap in some situations.
For example, take a `DashboardService` as the source for multiple `Dashboards`, which at the same time aggregate different `Charts`: all these relationships are `EntityReferences` as per our definition of the Entities, where we require these ingredients for the Entity to be complete.
A different scenario would be a `Pipeline`. With the `Lineage` we will be able to specify that it has a series of `Sources` and `Sinks`, which will at the same time be other `Table` Entities (for simplicity).
What would be the relationship definition of:
How is the relationship Pipeline vs. Table different from Dashboard vs. Chart? Or Model vs. Chart? I think it would be valuable to come up with a closed definition to help us evolve the Entities in a sustainable way.
Aside from the philosophical discussion, IMO it would make sense to update how the API works with the `EntityReference` to automatically draw the `Lineage` of those entities. This way it would be easier to have an auto-generated knowledge graph of our assets for a big chunk of the already present Entities.
Happy to discuss! 🌻
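As a rough illustration of the auto-generated graph idea, the sketch below walks a list of entities and turns every reference they hold into a labeled edge. It is not how the API works today: the `Entity` class, its `references` mapping, and `build_graph` are hypothetical names made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EntityReference:
    """Simplified pointer to another entity (hypothetical fields)."""
    type: str
    name: str


@dataclass
class Entity:
    ref: EntityReference
    # Hypothetical: field name -> references stored under that field.
    references: Dict[str, List[EntityReference]] = field(default_factory=dict)


def build_graph(entities: List[Entity]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each entity name to (field name, target name) edges."""
    graph = defaultdict(list)
    for entity in entities:
        for field_name, targets in entity.references.items():
            for target in targets:
                graph[entity.ref.name].append((field_name, target.name))
    return dict(graph)


chart = EntityReference("chart", "orders_per_day")
raw = EntityReference("table", "raw.orders")
mart = EntityReference("table", "mart.orders")

dashboard = Entity(EntityReference("dashboard", "revenue"), {"charts": [chart]})
pipeline = Entity(
    EntityReference("pipeline", "load_orders"),
    {"sources": [raw], "sinks": [mart]},
)

graph = build_graph([dashboard, pipeline])
```

Keeping the field name on each edge means a consumer can still decide which edges to read as lineage (`sources`, `sinks`) and which as plain composition (`charts`), so the graph can be generated without settling the definition question first.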
Thanks!