The table 'dataset_keyword_tables' is too large, resulting in insufficient server storage #8399
Hey @starweavingdream! I'm here to help you with any bugs, questions, or contributions you have for the project. Let me know how I can assist you!

The `dataset_keyword_tables` table stores the keyword index data used for keyword search over your datasets. To clean up the table, you can remove entries that are no longer associated with an existing dataset. Here is an example of how you might perform such a cleanup operation:

```python
from extensions.ext_database import db
from models.dataset import Dataset, DatasetKeywordTable

def cleanup_dataset_keyword_tables():
    # Find all dataset IDs that are still in use
    active_dataset_ids = {dataset.id for dataset in Dataset.query.all()}

    # Find all entries in the dataset_keyword_tables
    all_keyword_tables = DatasetKeywordTable.query.all()

    # Delete entries that are not associated with active datasets
    for keyword_table in all_keyword_tables:
        if keyword_table.dataset_id not in active_dataset_ids:
            db.session.delete(keyword_table)

    # Commit the changes to the database
    db.session.commit()

# Call the cleanup function
cleanup_dataset_keyword_tables()
```

Additionally, Dify has an automated cleanup mechanism in the form of a scheduled task. For more details on the structure and indexing of the table, you can refer to the initial database setup in the Dify platform [2][3]. If you need to limit the size of the `dataset_keyword_tables` table, you can use the
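One caveat with the loop above: deleting rows one at a time through the ORM can be very slow on a table that has grown to hundreds of gigabytes. A single bulk `DELETE` with a `NOT IN` subquery does the same cleanup server-side. This is a sketch, not code from the Dify repository; it uses an in-memory SQLite database with a simplified two-column schema purely as a stand-in for the real Postgres tables:

```python
import sqlite3

# Stand-in schema: in Dify these would be the real `datasets` and
# `dataset_keyword_tables` tables in Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE dataset_keyword_tables (dataset_id TEXT, keyword_table TEXT)"
)

# Hypothetical contents: one live dataset, two orphaned keyword tables.
conn.execute("INSERT INTO datasets VALUES ('ds-1')")
conn.executemany(
    "INSERT INTO dataset_keyword_tables VALUES (?, ?)",
    [("ds-1", "{...}"), ("ds-deleted", "{...}"), ("ds-gone", "{...}")],
)

# One bulk DELETE instead of a Python-side loop: remove every keyword
# table whose dataset no longer exists.
conn.execute(
    "DELETE FROM dataset_keyword_tables "
    "WHERE dataset_id NOT IN (SELECT id FROM datasets)"
)
conn.commit()

remaining = conn.execute(
    "SELECT dataset_id FROM dataset_keyword_tables"
).fetchall()
print(remaining)  # only rows whose dataset still exists
```

On a real Postgres instance, note that `DELETE` alone does not shrink the table on disk; the space is only reclaimed after a `VACUUM` (or `VACUUM FULL`, which rewrites the table and needs an exclusive lock).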
Same problem. Did you find any solution? I found that this table is only used by keyword search. When indexing documents, this table slows down the process.
@Scallions I tried the solution given by the bot, and there were no obvious problems after I cleared the data. If you want to try my method, I suggest you back up your data first.
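Backing up first is good advice. On Postgres you could dump just that one table with `pg_dump -t dataset_keyword_tables`. As a minimal in-database alternative (again a sketch with SQLite standing in for Postgres, and hypothetical row contents), you can also copy the rows into a backup table before deleting anything:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset_keyword_tables (dataset_id TEXT, keyword_table TEXT)"
)
conn.executemany(
    "INSERT INTO dataset_keyword_tables VALUES (?, ?)",
    [("ds-1", "{...}"), ("ds-2", "{...}")],
)

# Snapshot every row into a backup table before any cleanup runs,
# so a bad DELETE can be undone with a single INSERT ... SELECT.
conn.execute(
    "CREATE TABLE dataset_keyword_tables_backup AS "
    "SELECT * FROM dataset_keyword_tables"
)
conn.commit()

backed_up = conn.execute(
    "SELECT COUNT(*) FROM dataset_keyword_tables_backup"
).fetchone()[0]
```

For a table that is already 300 GB, the in-database copy temporarily doubles the storage used, so `pg_dump` to external storage is usually the safer choice at that size.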
@starweavingdream I need you! I'm going nuts! |
same problems. .... |
Self Checks
Dify version
0.8.0
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Hello, I am using a PG database, and it contains a table named 'dataset_keyword_tables' with a large amount of data. In testing, uploading a file of about 1 MB increased the size of this table by as much as 300 MB, which is strange. My table is now 300 GB, and I want to know what this table is actually used for. Can I clean it up? Otherwise, as more and more knowledge base documents are uploaded, this table will keep growing.
✔️ Expected Behavior
No response
❌ Actual Behavior
No response