[Question]: Clarification on metadata_condition in RAGFlow: Does it filter by the original document's update/creation time? #12678

@Chinaniu

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Describe your problem

Body:

Following up on the implementation of time-based filtering, I need to clarify which timestamp the metadata_condition filter reads and at which stage of retrieval it is applied.

My understanding and question are based on this code:

metadata_conditions = {
    "logic": "and",
    "conditions": [
        {
            "name": "update_time",  # Which update_time?
            "comparison_operator": ">=",
            "value": str(seven_days_ago),
        }
    ],
}
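
For completeness, here is a minimal, self-contained sketch of how such a condition could be built. The snippet above does not show how `seven_days_ago` is computed, so the `YYYY-MM-DD` string format used here is an assumption, and `build_recent_condition` is a hypothetical helper, not part of RAGFlow. Which value format RAGFlow actually compares against is part of what this question asks.

```python
from datetime import datetime, timedelta
from typing import Optional


def build_recent_condition(field: str, days: int, now: Optional[datetime] = None) -> dict:
    """Build a condition dict shaped like the snippet above (hypothetical helper)."""
    now = now or datetime.now()
    # Assumption: the cutoff is passed as a YYYY-MM-DD string.
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%d")
    return {
        "logic": "and",
        "conditions": [
            {"name": field, "comparison_operator": ">=", "value": cutoff}
        ],
    }


# Fixed "now" so the result is reproducible.
metadata_conditions = build_recent_condition("update_time", 7, now=datetime(2024, 6, 15))
```

Passing the field name as a parameter makes it easy to switch from `update_time` to a custom field if the answer to Question 1 turns out to be option C.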

Core Question 1: What does "update_time" refer to?

When I filter using {"name": "update_time"}, am I filtering based on:

A) The original document's filesystem metadata (e.g., the mtime or ctime from when the file was last modified or created on the source system)?

B) The timestamp from when the document was uploaded or processed into the RAGFlow knowledge base?

C) A custom metadata field that needs to be manually extracted and assigned during document ingestion?

Core Question 2: At which stage does this filter take effect?

Does the metadata_condition directly instruct the vector retrieval engine to only consider chunks from documents meeting the time criteria? Or is it a post-retrieval filter applied after the similarity search?
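
To make the distinction concrete, here is a toy sketch in plain Python. It is not RAGFlow code and says nothing about how RAGFlow is implemented; the chunk data, scores, and function names are made up for illustration. It only shows why the two orderings can return different results for the same `top_k`.

```python
from typing import List


def pre_filter(chunks: List[dict], cutoff: str, top_k: int) -> List[dict]:
    """Restrict the candidate set first, then rank by similarity."""
    eligible = [c for c in chunks if c["update_time"] >= cutoff]
    return sorted(eligible, key=lambda c: c["score"], reverse=True)[:top_k]


def post_filter(chunks: List[dict], cutoff: str, top_k: int) -> List[dict]:
    """Rank by similarity first, then drop results that fail the condition."""
    top = sorted(chunks, key=lambda c: c["score"], reverse=True)[:top_k]
    return [c for c in top if c["update_time"] >= cutoff]


# Made-up chunks: the two most similar ones are older than the cutoff.
chunks = [
    {"id": "a", "score": 0.95, "update_time": "2024-01-01"},
    {"id": "b", "score": 0.90, "update_time": "2024-01-02"},
    {"id": "c", "score": 0.80, "update_time": "2024-06-10"},
    {"id": "d", "score": 0.70, "update_time": "2024-06-12"},
]
cutoff = "2024-06-01"  # ISO date strings compare correctly as plain strings

pre_ids = [c["id"] for c in pre_filter(chunks, cutoff, top_k=2)]
post_ids = [c["id"] for c in post_filter(chunks, cutoff, top_k=2)]
```

With this data, pre-filtering returns the two recent chunks, while post-filtering returns nothing because both top-ranked chunks are too old. That difference is why the stage matters for a "recent documents only" feature.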

Why this matters for my use case:
We are implementing a "recent documents only" feature. It’s crucial that the filter works on the document's actual business relevance date (e.g., a report's publication date or last edit date), not just the system processing time. Clarity on this point will determine if we need to pre-process our documents to inject a custom timestamp field.
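
If pre-processing does turn out to be necessary, the step I have in mind would look roughly like the sketch below. The field name `business_date` and the helper `build_doc_metadata` are hypothetical, and the sketch deliberately stops before any RAGFlow call, since how the metadata should be attached to a document is part of what I am asking.

```python
import os
import tempfile
from datetime import date, datetime, timezone
from typing import Optional


def build_doc_metadata(path: str, business_date: Optional[date] = None) -> dict:
    """Return metadata to attach to a document before ingestion.

    "business_date" is a hypothetical field name, not a RAGFlow built-in.
    """
    if business_date is None:
        # Fall back to the file's last-modified time on the source system.
        mtime = os.path.getmtime(path)
        business_date = datetime.fromtimestamp(mtime, tz=timezone.utc).date()
    return {"business_date": business_date.isoformat()}


# Demo with a temporary file whose mtime is set to a known value.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp_path = tmp.name
os.utime(tmp_path, (1_700_000_000, 1_700_000_000))  # 2023-11-14 in UTC

from_mtime = build_doc_metadata(tmp_path)
explicit = build_doc_metadata(tmp_path, business_date=date(2024, 6, 1))
os.remove(tmp_path)
```

An explicit date (for example, a report's publication date parsed from its content) takes precedence; the file's mtime is only a fallback and may not survive copying between systems.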

Could you please point me to the relevant documentation or explain the expected behavior? Thank you!
