DB: clean up tables #12732
Conversation
Instead of manually deleting a large number of rows, use a task to delete them in chunks to avoid overloading the database. Ref readthedocs/meta#202

@copilot write tests for …
…ches (#12739)

Adds comprehensive test coverage for batch deletion utilities to validate limit parameter functionality and raw deletion behavior.

## Changes

- **`delete_in_batches` limit tests** (5 tests): validates that the limit parameter correctly bounds deletions when the limit is smaller than, larger than, or equal to the batch size, plus the edge case limit=1
- **`raw_delete_in_batches` tests** (10 tests): validates raw deletion across empty querysets, batch size boundaries, multiple batches, and limit parameter variations

All tests verify correct deletion counts and remaining object counts after batch operations.

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: stsewd <[email protected]>
Pull request overview
This PR refactors database cleanup tasks to use batch deletion instead of manual SQL queries or raw ORM operations that could overload the database. The changes introduce a new raw_delete_in_batches utility function and update existing deletion tasks to delete records in smaller chunks with configurable limits.
Changes:
- Added a `raw_delete_in_batches` utility function for batch deletion without triggering signals or cascades
- Enhanced `delete_in_batches` to support a `limit` parameter for controlling the total number of deletions
- Refactored the `delete_old_build_data` task to use batch deletion with limits
- Added two new scheduled tasks: `delete_outdated_imported_files` and `delete_orphaned_addons_configs`
- Configured temporary hourly execution of cleanup tasks with limits to process backlogs
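The way `batch_size` and the new `limit` interact can be sketched without the ORM: select at most `limit` IDs up front, then process them `batch_size` at a time. The helper below is illustrative only (`chunked_delete` is not a function from this PR):

```python
def chunked_delete(ids, batch_size=50, limit=None):
    """Yield batches of at most batch_size IDs, capped at limit IDs total."""
    if limit is not None:
        # The limit bounds the total number of deletions, not each batch.
        ids = ids[:limit]
    for start in range(0, len(ids), batch_size):
        yield ids[start : start + batch_size]


# 130 candidate rows with limit=120: three batches of 50, 50, and 20.
batches = list(chunked_delete(list(range(130)), batch_size=50, limit=120))
```

Each yielded batch would then back one small `DELETE ... WHERE pk IN (...)` query, keeping every transaction short.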
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| `readthedocs/core/utils/db.py` | Added the `raw_delete_in_batches` function and enhanced `delete_in_batches` with a `limit` parameter |
| `readthedocs/core/tests/test_db_utils.py` | Added comprehensive test coverage for the new `limit` parameter and the `raw_delete_in_batches` function |
| `readthedocs/telemetry/tasks.py` | Refactored `delete_old_build_data` to support batch deletion with configurable limits |
| `readthedocs/core/tasks.py` | Added two new cleanup tasks for imported files and orphaned addon configs |
| `readthedocs/settings/base.py` | Configured periodic task schedules with hourly execution and limits for processing backlogs |
readthedocs/core/utils/db.py (Outdated)

```diff
@@ -32,8 +36,46 @@ def delete_in_batches(queryset, batch_size=50) -> tuple[int, dict]:
     # We can't use a limit or offset with .delete,
     # so we first extract the IDs and perform the deletion in anothr query.
```
Copilot AI · Jan 30, 2026

Typo in the comment: "anothr" should be "another".
readthedocs/core/utils/db.py (Outdated)

```diff
     model = queryset.model
     # We can't use a limit or offset with .raw_delete,
     # so we first extract the IDs and perform the deletion in anothr query.
```
Copilot AI · Jan 30, 2026

Typo in the comment: "anothr" should be "another".
```diff
         "options": {"queue": "web"},
     },
     # TODO: delete this task when all imported files pending deletion are done.
     # It shuold take around 36 days to delete all the old imported files on community,
```
Copilot AI · Jan 30, 2026

Typo in the comment: "shuold" should be "should".

```diff
- # It shuold take around 36 days to delete all the old imported files on community,
+ # It should take around 36 days to delete all the old imported files on community,
```
readthedocs/core/utils/db.py (Outdated)

```python
def raw_delete_in_batches(queryset, batch_size=50, limit: int | None = None):
    """
    Raw delete a queryset in batches to avoid long transactions or big queries.

    Similar to ``delete_in_batches``, but uses ``_raw_delete``.
    This is useful when you don't want to overload the DB by deleting
    a large number of records at once, since _raw_delete() doesn't support
    limits/offsets.

    .. warning::

       Since this uses ``_raw_delete``, it won't trigger any
       signals, and won't cascade delete related objects.
       Use it only if you are sure there are no related objects
       that need to be deleted/updated.
    """
    # Don't use batch deletion if the number of records
    # is smaller or equal to the batch size.
    count = queryset.count()
    if count == 0:
        return
    if count <= batch_size:
        queryset._raw_delete(queryset.db)
        return

    model = queryset.model
    # We can't use a limit or offset with .raw_delete,
    # so we first extract the IDs and perform the deletion in anothr query.
    all_pks = queryset.values_list("pk", flat=True)
    if limit:
        all_pks = all_pks[:limit]
    for batch in batched(all_pks, batch_size):
        qs = model.objects.filter(pk__in=batch)
        qs._raw_delete(qs.db)
```
Copilot AI · Jan 30, 2026

The `raw_delete_in_batches` function doesn't return any value, unlike `delete_in_batches`, which returns a tuple containing the total number of deleted objects and a dictionary with deletion counts per model type. Consider returning consistent information about the deletion operation, or at least documenting in the docstring that this function returns None.
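One way to act on this suggestion is to have the function count rows per batch and return the total. The sketch below runs against a plain in-memory dict instead of the Django ORM, so `delete_batch` merely stands in for the per-batch `qs._raw_delete(qs.db)` call; none of these names come from the PR:

```python
def raw_delete_in_batches_counting(pks, delete_batch, batch_size=50, limit=None):
    """Delete pks in batches via delete_batch; return the total rows removed."""
    if limit is not None:
        pks = pks[:limit]
    total = 0
    for start in range(0, len(pks), batch_size):
        # delete_batch stands in for qs._raw_delete(qs.db), which
        # reports the number of rows it removed.
        total += delete_batch(pks[start : start + batch_size])
    return total


# In-memory "table" with 120 rows, keyed by primary key.
table = {pk: f"row-{pk}" for pk in range(120)}

def delete_batch(batch):
    return sum(1 for pk in batch if table.pop(pk, None) is not None)

deleted = raw_delete_in_batches_counting(list(table), delete_batch, batch_size=50, limit=70)
```

Returning the count would let callers (and the periodic tasks' logs) report progress the same way `delete_in_batches` already does.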
```python
    """Test raw deletion with batch_size=1 (edge case)."""
    # Create a project with 3 versions
    project = get(Project, slug="raw-single-project")
    versions = [get(Version, project=project, slug=f"raw-single-{i}") for i in range(3)]
```
Copilot AI · Jan 30, 2026

Variable `versions` is not used.
```python
    """Test raw deleting with a limit smaller than total queryset count."""
    # Create a project with 20 versions
    project = get(Project, slug="raw-limit-project")
    versions = [get(Version, project=project, slug=f"raw-limit-{i}") for i in range(20)]
```
Copilot AI · Jan 30, 2026

Variable `versions` is not used.

```diff
- versions = [get(Version, project=project, slug=f"raw-limit-{i}") for i in range(20)]
+ for i in range(20):
+     get(Version, project=project, slug=f"raw-limit-{i}")
```
```python
    """Test raw deleting with a limit larger than total queryset count."""
    # Create a project with 5 versions
    project = get(Project, slug="raw-over-limit")
    versions = [get(Version, project=project, slug=f"raw-over-{i}") for i in range(5)]
```
Copilot AI · Jan 30, 2026

Variable `versions` is not used.
```python
    """Test raw deleting with a limit equal to the batch size."""
    # Create a project with 10 versions
    project = get(Project, slug="raw-equal-limit")
    versions = [get(Version, project=project, slug=f"raw-eq-lim-{i}") for i in range(10)]
```
Copilot AI · Jan 30, 2026

Variable `versions` is not used.

```diff
- versions = [get(Version, project=project, slug=f"raw-eq-lim-{i}") for i in range(10)]
+ for i in range(10):
+     get(Version, project=project, slug=f"raw-eq-lim-{i}")
```
```python
    """Test raw deleting with limit=1 (edge case)."""
    # Create a project with 5 versions
    project = get(Project, slug="raw-one-limit")
    versions = [get(Version, project=project, slug=f"raw-one-{i}") for i in range(5)]
```
Copilot AI · Jan 30, 2026

Variable `versions` is not used.
@copilot change the tests from this PR so that we don't have unused variables
humitos left a comment
I like this pattern 👍🏼
I haven't reviewed the tests.
```python
    """
    Delete all imported files that are no longer needed.

    We only need to keep track of the top-level ``404.html`` and all ``index.html`` files.
```
I understand that we are not inserting new ImportedFiles that don't meet these conditions, right?
```diff
     NOTE: the logic of this task could be improved to keep longer data we care
-    more (eg. active projects )and remove data we don't (eg. builds from spam projects)
+    more (eg. active projects ) and remove data we don't (eg. builds from spam projects)
```
```diff
- more (eg. active projects ) and remove data we don't (eg. builds from spam projects)
+ more (eg. active projects) and remove data we don't (eg. builds from spam projects)
```