TimeWeightedVectorStoreRetriever does not support Chroma due to datetime metadata issue #29306

Open
5 tasks done
ksmooi opened this issue Jan 20, 2025 · 0 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: vector store Related to vector store module

ksmooi commented Jan 20, 2025

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

This example is taken from LangChain, with Chroma substituted for FAISS.
link

from datetime import datetime, timedelta
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Low decay rate example
# Define your embedding model
embeddings_model = OpenAIEmbeddings()

# Initialize the Chroma vectorstore
vectorstore = Chroma(embedding_function=embeddings_model)
retriever = TimeWeightedVectorStoreRetriever(
    vectorstore=vectorstore, decay_rate=0.0000000000000000000000001, k=1
)

# Add documents with a timestamp from yesterday
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents(
    [Document(page_content="hello world", metadata={"last_accessed_at": yesterday})]
)
retriever.add_documents([Document(page_content="hello foo")])

# Retrieve documents
print("Low Decay Rate Results:")
print(retriever.invoke("hello world"))

# High decay rate example
# Reinitialize the Chroma vectorstore
vectorstore = Chroma(embedding_function=embeddings_model)
retriever = TimeWeightedVectorStoreRetriever(
    vectorstore=vectorstore, decay_rate=0.999, k=1
)

# Add documents with a timestamp from yesterday
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents(
    [Document(page_content="hello world", metadata={"last_accessed_at": yesterday})]
)
retriever.add_documents([Document(page_content="hello foo")])

# Retrieve documents
print("\nHigh Decay Rate Results:")
print(retriever.invoke("hello world"))

# Virtual time example
from langchain_core.utils import mock_now

# Mock the current time to tomorrow
tomorrow = datetime.now() + timedelta(days=1)

with mock_now(tomorrow):
    print("\nVirtual Time Results:")
    print(retriever.invoke("hello world"))

Error Message and Stack Trace (if applicable)

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
in <cell line: 14>()
12 # Add documents with a timestamp from yesterday
13 yesterday = datetime.now() - timedelta(days=1)
---> 14 retriever.add_documents(
15 [Document(page_content="hello world", metadata={"last_accessed_at": yesterday})]
16 )

/usr/local/lib/python3.10/dist-packages/langchain/retrievers/time_weighted_retriever.py in add_documents(self, documents, **kwargs)
162 doc.metadata["buffer_idx"] = len(self.memory_stream) + i
163 self.memory_stream.extend(dup_docs)
--> 164 return self.vectorstore.add_documents(dup_docs, **kwargs)
165
166 async def aadd_documents(

/usr/local/lib/python3.10/dist-packages/langchain_core/vectorstores/base.py in add_documents(self, documents, **kwargs)
285 texts = [doc.page_content for doc in documents]
286 metadatas = [doc.metadata for doc in documents]
--> 287 return self.add_texts(texts, metadatas, **kwargs)
288 msg = (
289 f"add_documents and add_texts has not been implemented "

/usr/local/lib/python3.10/dist-packages/langchain_chroma/vectorstores.py in add_texts(self, texts, metadatas, ids, **kwargs)
564 "langchain_community.vectorstores.utils.filter_complex_metadata."
565 )
--> 566 raise ValueError(e.args[0] + "\n\n" + msg)
567 else:
568 raise e

ValueError: Expected metadata value to be a str, int, float or bool, got 2025-01-19 06:02:08.017004 which is a datetime in upsert.

Try filtering complex metadata from the document using langchain_community.vectorstores.utils.filter_complex_metadata.
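
The type constraint stated in the error message can be checked before calling add_documents. The following is a minimal sketch, assuming the accepted types are exactly the four listed in the message (str, int, float, bool). `unsupported_metadata_keys` is a hypothetical helper written for illustration and is not part of LangChain or Chroma.

```python
from datetime import datetime, timedelta

# Types copied from the error message above; this is an assumption,
# not something read from Chroma's source code.
ALLOWED_TYPES = (str, int, float, bool)


def unsupported_metadata_keys(metadata: dict) -> list[str]:
    """Return the keys whose values fall outside ALLOWED_TYPES."""
    return [
        key
        for key, value in metadata.items()
        if not isinstance(value, ALLOWED_TYPES)
    ]


yesterday = datetime.now() - timedelta(days=1)
metadata = {"last_accessed_at": yesterday, "buffer_idx": 0}
print(unsupported_metadata_keys(metadata))  # prints ['last_accessed_at']
```

Note that simply filtering out the offending key, as the error message suggests, would also drop the timestamp used in the example, so it is probably not a sufficient fix for this retriever.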

Description

When TimeWeightedVectorStoreRetriever is used with the Chroma vector store, add_documents raises a ValueError for any document whose metadata contains a datetime value. The error shows that Chroma accepts only str, int, float, or bool metadata values, while TimeWeightedVectorStoreRetriever relies on a datetime timestamp (the last_accessed_at field in the example above) for its time-weighted retrieval.

Expected Behavior

The TimeWeightedVectorStoreRetriever should successfully add documents to the Chroma vector store, even with datetime metadata, and allow time-weighted retrieval.
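
One possible direction, sketched here only as an illustration, is to store the timestamp as an ISO 8601 string and parse it back when it is read. The helper names `encode_datetimes` and `decode_datetimes` are hypothetical, and the sketch assumes that `last_accessed_at` is the only key that needs to be restored. It uses the standard library alone and does not show how such a conversion would be wired into the retriever or the Chroma integration.

```python
from datetime import datetime


def encode_datetimes(metadata: dict) -> dict:
    """Copy metadata, replacing datetime values with ISO 8601 strings."""
    return {
        key: value.isoformat() if isinstance(value, datetime) else value
        for key, value in metadata.items()
    }


def decode_datetimes(metadata: dict, keys: tuple[str, ...] = ("last_accessed_at",)) -> dict:
    """Copy metadata, parsing the named string fields back into datetimes."""
    return {
        key: datetime.fromisoformat(value)
        if key in keys and isinstance(value, str)
        else value
        for key, value in metadata.items()
    }


original = {"last_accessed_at": datetime(2025, 1, 19, 6, 2, 8, 17004), "buffer_idx": 0}
stored = encode_datetimes(original)    # every value is now a str or int
restored = decode_datetimes(stored)    # the timestamp is a datetime again
assert restored == original
```

A float timestamp (via `datetime.timestamp()`) would work similarly; the string form is used here because it round-trips naive datetimes without involving the local time zone.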

System Info

System Information

OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Sun Nov 10 10:07:59 UTC 2024
Python Version: 3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0]

Package Information

langchain_core: 0.3.30
langchain: 0.3.14
langchain_community: 0.3.14
langsmith: 0.2.3
langchain_chroma: 0.2.0
langchain_experimental: 0.3.4
langchain_openai: 0.3.1
langchain_text_splitters: 0.3.3

Optional packages not installed

langserve

Other Dependencies

aiohttp: 3.11.10
async-timeout: 4.0.3
chromadb: 0.6.3
dataclasses-json: 0.6.7
fastapi: 0.115.6
httpx: 0.28.1
httpx-sse: 0.4.0
jsonpatch: 1.33
langsmith-pyo3: Installed. No version info available.
numpy: 1.26.4
openai: 1.59.8
orjson: 3.10.12
packaging: 24.2
pydantic: 2.10.3
pydantic-settings: 2.7.1
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
SQLAlchemy: 2.0.36
tenacity: 9.0.0
tiktoken: 0.8.0
typing-extensions: 4.12.2

dosubot (bot) added the labels "Ɑ: vector store" and "🤖:bug" on Jan 20, 2025.
ksmooi changed the title from "Bug Report: TimeWeightedVectorStoreRetriever does not support Chroma due to datetime metadata issue" to "TimeWeightedVectorStoreRetriever does not support Chroma due to datetime metadata issue" on Jan 20, 2025.