I'm trying to deploy a basic RAG chatbot using the rag-app-text-chatbot.yaml file, but the chain-server container crashes shortly after startup. I believe I've properly followed the directions at https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html, and I'm using the v0.6.0 tag of the GitHub repository. If I run `docker logs` on the chain-server container, here's the output I see:
===
INFO: Started server process [1]
INFO: Waiting for application startup.
/usr/local/lib/python3.10/dist-packages/langchain/embeddings/__init__.py:29: LangChainDeprecationWarning: Importing embeddings from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead:
`from langchain_community.embeddings import HuggingFaceEmbeddings`.
To install langchain-community run `pip install -U langchain-community`.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/__init__.py:35: LangChainDeprecationWarning: Importing vector stores from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead:
`from langchain_community.vectorstores import FAISS`.
To install langchain-community run `pip install -U langchain-community`.
warnings.warn(
INFO:faiss.loader:Loading faiss with AVX2 support.
INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/service_pb2_grpc.py:21: RuntimeWarning: The grpc package installed is at version 1.60.0, but the generated code in grpc_service_pb2_grpc.py depends on grpcio>=1.64.0. Please upgrade your grpc module to grpcio>=1.64.0 or downgrade your generated code using grpcio-tools<=1.60.0. This warning will become an error in 1.65.0, scheduled for release on June 25, 2024.
warnings.warn(
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /root/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
INFO:RetrievalAugmentedGeneration.common.utils:Using huggingface as model engine and WhereIsAI/UAE-Large-V1 and model for embeddings
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: WhereIsAI/UAE-Large-V1
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
INFO:RetrievalAugmentedGeneration.common.utils:Using triton-trt-llm as model engine for llm. Model name: ensemble
ERROR: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 734, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 610, in __aenter__
await self._router.startup()
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 713, in startup
handler()
File "/opt/RetrievalAugmentedGeneration/common/server.py", line 158, in import_example
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/opt/RetrievalAugmentedGeneration/example/chains.py", line 56, in <module>
set_service_context()
File "/opt/RetrievalAugmentedGeneration/common/utils.py", line 131, in wrapper
return func(*args_hashable, **kwargs_hashable)
File "/opt/RetrievalAugmentedGeneration/common/utils.py", line 138, in set_service_context
llm = LangChainLLM(get_llm(**kwargs))
File "/opt/RetrievalAugmentedGeneration/common/utils.py", line 131, in wrapper
return func(*args_hashable, **kwargs_hashable)
File "/opt/RetrievalAugmentedGeneration/common/utils.py", line 270, in get_llm
trtllm = TensorRTLLM( # type: ignore
File "/usr/local/lib/python3.10/dist-packages/langchain_core/load/serializable.py", line 120, in __init__
super().__init__(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 341, in __init__
raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for TensorRTLLM
__root__
Channel.unary_unary() got an unexpected keyword argument '_registered_method' (type=type_error)
ERROR: Application startup failed. Exiting.
Exception ignored in: <function InferenceServerClient.__del__ at 0x7561548a9750>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 257, in __del__
self.close()
File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 264, in close
self.stop_stream()
File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 1811, in stop_stream
if self._stream is not None:
AttributeError: 'InferenceServerClient' object has no attribute '_stream'
===
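The ValidationError at the bottom of the traceback looks like the same version mismatch the earlier RuntimeWarning flags: gRPC stubs generated for grpcio>=1.64.0 pass a `_registered_method` keyword that `Channel.unary_unary()` in the installed grpcio 1.60.0 does not accept. A minimal sketch of that mismatch (the classes below are illustrative stand-ins, not the real grpc API):

```python
# Illustrative stand-in for grpcio 1.60.0, whose Channel.unary_unary()
# has no `_registered_method` parameter.
class OldChannel:
    def unary_unary(self, method, request_serializer=None,
                    response_deserializer=None):
        return f"stub for {method}"


def new_style_stub(channel):
    """Mimics code generated by a newer grpcio-tools, which forwards
    a `_registered_method` keyword the old Channel does not know."""
    return channel.unary_unary(
        "/inference.GRPCInferenceService/ServerLive",
        _registered_method=True,  # unknown kwarg on the old Channel
    )


try:
    new_style_stub(OldChannel())
except TypeError as e:
    # e.g. "unary_unary() got an unexpected keyword argument '_registered_method'"
    print(e)
```

This is only my reading of the warning: the generated tritonclient code expects a newer grpcio than the one shipped in the container, which would explain why the TensorRTLLM client fails to construct.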
Here's some docker ps output:
$ docker ps -a --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
CONTAINER ID   NAMES                  STATUS
f025cd96cc5c   milvus-standalone      Up 13 minutes
ca017bfe8648   milvus-etcd            Up 13 minutes (healthy)
b44caa6c6e9a   milvus-minio           Up 13 minutes (healthy)
4b812c48035b   rag-playground         Up 13 minutes
a686d2b3938f   chain-server           Exited (3) 13 minutes ago
7fe575e94855   llm-inference-server   Up 13 minutes
80f535f5a462   notebook-server        Up 13 minutes