[Question]: Set id property of EntityNode in PropertyGraphIndex #17548
Comments
Hey @bldev2473! I'm here to help you with your Neo4j issue. Let's work together to solve this!
To set the 'id' property of each node to a UUID, you can generate one in upsert_triplet whenever an id is missing:

    import uuid

    def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
        """Add triplet with unique UUIDs for nodes."""
        # Generate UUIDs if 'subj' or 'obj' are not provided
        subj_id = subj if subj else str(uuid.uuid4())
        obj_id = obj if obj else str(uuid.uuid4())
        query = """
            MERGE (n1:`%s` {id:$subj_id})
            MERGE (n2:`%s` {id:$obj_id})
            MERGE (n1)-[:`%s`]->(n2)
        """
        # Labels and relationship types cannot be query parameters,
        # so they are interpolated into the query text
        prepared_statement = query % (
            self.node_label,
            self.node_label,
            rel.replace(" ", "_").upper(),
        )
        with self._driver.session(database=self._database) as session:
            session.run(prepared_statement, {"subj_id": subj_id, "obj_id": obj_id})

This modification ensures that each node has a unique identifier by using UUIDs, which should help you avoid the ConstraintError.
To continue talking to Dosu, mention @dosu.
@dosu, Neo4jPropertyGraphStore has no upsert_triplet method. Should I modify upsert_nodes?
Yes, you should modify the upsert_nodes method:

    import uuid
    from typing import List

    def upsert_nodes(self, nodes: List[LabelledNode]) -> None:
        entity_dicts: List[dict] = []
        chunk_dicts: List[dict] = []
        for item in nodes:
            if isinstance(item, EntityNode):
                # Generate a UUID if the id is not provided
                item_id = item.id if item.id else str(uuid.uuid4())
                entity_dicts.append({**item.dict(), "id": item_id})
            elif isinstance(item, ChunkNode):
                chunk_dicts.append({**item.dict(), "id": item.id})
            else:
                # Other node types are ignored
                pass

        if chunk_dicts:
            for d in chunk_dicts:
                self.structured_query(
                    """
                    WITH $data AS row
                    MERGE (c:Chunk {id: row.id})
                    SET c.text = row.text
                    SET c += removeKeyFromMap(row.properties, '')
                    RETURN count(*)
                    """,
                    param_map={"data": d},
                )

        if entity_dicts:
            for d in entity_dicts:
                self.structured_query(
                    f"""
                    WITH $data AS row
                    MERGE (e:`{BASE_NODE_LABEL}` {{id: row.id}})
                    SET e += removeKeyFromMap(row.properties, '')
                    SET e.name = row.name, e:`{BASE_ENTITY_LABEL}`
                    SET e:`{d['label']}`
                    WITH e, row
                    WHERE removeKeyFromMap(row.properties, '').triplet_source_id IS NOT NULL
                    MERGE (c:Chunk {{id: removeKeyFromMap(row.properties, '').triplet_source_id}})
                    MERGE (e)<-[:MENTIONS]-(c)
                    RETURN count(*) as count
                    """,
                    param_map={"data": d},
                )

This modification ensures that each EntityNode without an id is assigned a UUID before it is upserted.
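The `x if x else str(uuid.uuid4())` fallback used in the snippet above only fires when the id is empty. The following is a minimal sketch of that behavior, assuming a hypothetical helper `ensure_id` that is not part of llama_index or Neo4j; it suggests that an id already equal to the entity name would be left unchanged:

```python
import uuid


def ensure_id(existing_id: str) -> str:
    """Return existing_id unless it is empty, in which case mint a UUID.

    Hypothetical helper for illustration only; it mirrors the
    `x if x else str(uuid.uuid4())` pattern and is not a library function.
    """
    return existing_id if existing_id else str(uuid.uuid4())


kept = ensure_id("Alice")  # non-empty id: returned unchanged
minted = ensure_id("")     # empty id: replaced by a fresh UUID string

print(kept)
print(minted)
```

Under this reading, an entity whose id is populated from its name never reaches the UUID branch, so the id has to be replaced explicitly rather than only filled in when missing.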
@dosu, the modification you suggested can change the id of an EntityNode. However, this could cause issues when establishing the relationships between EntityNodes. Therefore, the 'id' change should be handled within the _insert_nodes method of the PropertyGraphIndex class.
To address the issue of setting unique IDs for EntityNodes, you can modify the _insert_nodes method of the PropertyGraphIndex class.
Here's a code snippet to illustrate this:

    import asyncio
    import uuid
    from typing import List, Sequence

    def _insert_nodes(self, nodes: Sequence[BaseNode]) -> Sequence[BaseNode]:
        """Insert nodes to the index struct with unique IDs."""
        if len(nodes) == 0:
            return nodes

        # Run transformations on nodes to extract triplets
        if self._use_async:
            nodes = asyncio.run(
                arun_transformations(
                    nodes, self._kg_extractors, show_progress=self._show_progress
                )
            )
        else:
            nodes = run_transformations(
                nodes, self._kg_extractors, show_progress=self._show_progress
            )

        # Ensure all nodes have nodes and/or relations in metadata
        assert all(
            node.metadata.get(KG_NODES_KEY) is not None
            or node.metadata.get(KG_RELATIONS_KEY) is not None
            for node in nodes
        )

        kg_nodes_to_insert: List[LabelledNode] = []
        kg_rels_to_insert: List[Relation] = []
        for node in nodes:
            # Generate a UUID if the id is not provided
            node.id_ = node.id_ if node.id_ else str(uuid.uuid4())

            # Remove nodes and relations from metadata
            kg_nodes = node.metadata.pop(KG_NODES_KEY, [])
            kg_rels = node.metadata.pop(KG_RELATIONS_KEY, [])

            # Add source id to properties
            for kg_node in kg_nodes:
                kg_node.properties[TRIPLET_SOURCE_KEY] = node.id_
            for kg_rel in kg_rels:
                kg_rel.properties[TRIPLET_SOURCE_KEY] = node.id_

            # Add nodes and relations to insert lists
            kg_nodes_to_insert.extend(kg_nodes)
            kg_rels_to_insert.extend(kg_rels)

        # Filter out duplicate kg nodes
        kg_node_ids = {node.id for node in kg_nodes_to_insert}
        existing_kg_nodes = self.property_graph_store.get(ids=list(kg_node_ids))
        existing_kg_node_ids = {node.id for node in existing_kg_nodes}
        kg_nodes_to_insert = [
            node for node in kg_nodes_to_insert if node.id not in existing_kg_node_ids
        ]

        # Upsert nodes and relations
        if len(kg_nodes_to_insert) > 0:
            self.property_graph_store.upsert_nodes(kg_nodes_to_insert)
        if len(kg_rels_to_insert) > 0:
            self.property_graph_store.upsert_relations(kg_rels_to_insert)

        return nodes

This modification ensures that each source node has an id (a UUID is generated when none is set) before it is stored as the triplet source of its KG nodes and relations.
@dosu, that approach is insufficient because the source_id and target_id of each Relation also need to be updated. Where are Relations initially created within the PropertyGraphIndex?
Relations are initially created within the KG extractors (here, SchemaLLMPathExtractor) that _insert_nodes runs as transformations. Each extractor attaches its Relation objects to the node metadata under KG_RELATIONS_KEY, where _insert_nodes collects them.
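The requirement raised in the thread, that Relation endpoints must change together with the entity ids, can be sketched as a single remapping pass. This is a minimal illustration under stated assumptions: `Entity`, `Rel`, and `remap_ids` are hypothetical stand-ins written for this example, not llama_index's EntityNode or Relation classes, and the real classes may not allow the id to be reassigned this way.

```python
import uuid
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple


@dataclass
class Entity:
    """Hypothetical stand-in for an entity node (not llama_index's EntityNode)."""
    id: str
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Rel:
    """Hypothetical stand-in for a relation (not llama_index's Relation)."""
    label: str
    source_id: str
    target_id: str


def remap_ids(
    entities: List[Entity], rels: List[Rel]
) -> Tuple[List[Entity], List[Rel]]:
    """Give every entity a fresh UUID and rewrite relation endpoints to match."""
    # One new UUID per distinct old id; entities sharing an old id share the new one.
    id_map = {e.id: str(uuid.uuid4()) for e in entities}
    new_entities = [replace(e, id=id_map[e.id]) for e in entities]
    # Endpoints that are not in id_map (nodes outside this batch) are kept as-is.
    new_rels = [
        replace(
            r,
            source_id=id_map.get(r.source_id, r.source_id),
            target_id=id_map.get(r.target_id, r.target_id),
        )
        for r in rels
    ]
    return new_entities, new_rels


entities = [Entity("Alice", "Alice"), Entity("Acme", "Acme")]
rels = [Rel("WORKS_AT", "Alice", "Acme")]
new_entities, new_rels = remap_ids(entities, rels)
```

Because the mapping is keyed on the old id, two entities that arrive with the same name-derived id still collapse into one new id; telling them apart would need a key built from more than the name.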
Question
Hi, I'm using PropertyGraphIndex and SchemaLLMPathExtractor to build a knowledge graph in Neo4j using an LLM.
After constructing the index and knowledge graph, I noticed that the 'id' and 'name' properties of the generated EntityNodes are identical. I'd like to set the 'id' of each EntityNode to a UUID or another unique key.
My requirement is that multiple entities can be generated with the same name but different values for other properties.
I tried setting an "id" key in the document metadata, but the following error occurred:
neo4j.exceptions.ConstraintError: {code: Neo.ClientError.Schema.ConstraintValidationFailed} {message: Node(34) already exists with label `Node` and property `id` = '142033'}
Is there a solution for this?
Thank you.
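As one possible reading of "another unique key", an id can be derived deterministically from the entity name together with the properties that distinguish same-named entities. The sketch below is an assumption-laden illustration: `entity_key` and `ENTITY_NAMESPACE` are hypothetical names introduced here, and the choice of which properties to hash is left to the application.

```python
import json
import uuid
from typing import Dict

# Arbitrary namespace chosen for this sketch; any fixed UUID would work.
ENTITY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "entities.example.invalid")


def entity_key(name: str, properties: Dict[str, str]) -> str:
    """Derive a stable id from the name plus the distinguishing properties.

    Hypothetical helper for illustration; not part of llama_index or Neo4j.
    """
    # sort_keys makes the payload independent of dict insertion order.
    payload = json.dumps({"name": name, "properties": properties}, sort_keys=True)
    return str(uuid.uuid5(ENTITY_NAMESPACE, payload))


a = entity_key("Paris", {"country": "France"})
b = entity_key("Paris", {"country": "United States"})
again = entity_key("Paris", {"country": "France"})

print(a != b)      # same name, different properties: different ids
print(a == again)  # same inputs: same id on every run
```

Compared with a random `uuid4`, a name-based `uuid5` keeps repeated ingestion of the same entity idempotent, at the cost of having to decide up front which properties define identity.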