
Conversation

@stsewd
Member

@stsewd stsewd commented Jan 28, 2026

Instead of manually deleting a large number of rows, use a task to delete them in chunks to avoid overloading the database.

Ref https://github.com/readthedocs/meta/issues/202

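The chunked-deletion idea behind this PR can be sketched in plain Python. Everything here is illustrative: `delete_in_chunks`, the `delete_fn` callback, and `batched` are stand-ins for the project's actual utilities, not its real API.

```python
from itertools import islice


def batched(iterable, n):
    # Yield successive chunks of at most n items from any iterable.
    it = iter(iterable)
    while chunk := list(islice(it, n)):
        yield chunk


def delete_in_chunks(pks, delete_fn, batch_size=50):
    # Extract primary keys first, then delete each chunk in its own
    # query, so no single statement touches millions of rows at once.
    deleted = 0
    for chunk in batched(pks, batch_size):
        delete_fn(chunk)  # e.g. Model.objects.filter(pk__in=chunk).delete()
        deleted += len(chunk)
    return deleted
```

With a batch size of 50, deleting 120 rows issues three small queries (50, 50, 20) instead of one large one.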
@read-the-docs-community

read-the-docs-community bot commented Jan 28, 2026

Documentation build overview

📚 docs | 🛠️ Build #31203702 | 📁 Comparing 8cc3a03 against latest (e7196f0)


🔍 Preview build

Show files changed (1 file in total): 📝 1 modified | ➕ 0 added | ➖ 0 deleted
File Status
custom-script.html 📝 modified

@stsewd
Member Author

stsewd commented Jan 29, 2026

@copilot write tests for delete_in_batches with a limit, and add tests for raw_delete_in_batches

Contributor

Copilot AI commented Jan 29, 2026

@stsewd I've opened a new pull request, #12739, to work on those changes. Once the pull request is ready, I'll request review from you.

…ches (#12739)

Adds comprehensive test coverage for batch deletion utilities to
validate limit parameter functionality and raw deletion behavior.

## Changes

- **`delete_in_batches` limit tests** (5 tests): Validates that the limit
parameter correctly bounds deletions when the limit is smaller than, larger
than, or equal to the batch size, plus the edge case `limit=1`
- **`raw_delete_in_batches` tests** (10 tests): Validates raw deletion
across empty querysets, batch-size boundaries, multiple batches, and
limit parameter variations

All tests verify correct deletion counts and remaining object counts
after batch operations.


---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: stsewd <[email protected]>
@stsewd stsewd marked this pull request as ready for review January 29, 2026 23:59
@stsewd stsewd requested a review from a team as a code owner January 29, 2026 23:59
@stsewd stsewd requested a review from agjohnson January 29, 2026 23:59
Contributor

Copilot AI left a comment
Pull request overview

This PR refactors database cleanup tasks to use batch deletion instead of manual SQL queries or raw ORM operations that could overload the database. The changes introduce a new raw_delete_in_batches utility function and update existing deletion tasks to delete records in smaller chunks with configurable limits.

Changes:

  • Added raw_delete_in_batches utility function for batch deletion without triggering signals or cascades
  • Enhanced delete_in_batches to support a limit parameter for controlling the total number of deletions
  • Refactored delete_old_build_data task to use batch deletion with limits
  • Added two new scheduled tasks: delete_outdated_imported_files and delete_orphaned_addons_configs
  • Configured temporary hourly execution of cleanup tasks with limits to process backlogs

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 20 comments.

Show a summary per file
File Description
readthedocs/core/utils/db.py Added raw_delete_in_batches function and enhanced delete_in_batches with limit parameter
readthedocs/core/tests/test_db_utils.py Added comprehensive test coverage for the new limit parameter and raw_delete_in_batches function
readthedocs/telemetry/tasks.py Refactored delete_old_build_data to support batch deletion with configurable limits
readthedocs/core/tasks.py Added two new cleanup tasks for imported files and orphaned addon configs
readthedocs/settings/base.py Configured periodic task schedules with hourly execution and limits for processing backlogs


@@ -32,8 +36,46 @@ def delete_in_batches(queryset, batch_size=50) -> tuple[int, dict]:
# We can't use a limit or offset with .delete,
# so we first extract the IDs and perform the deletion in anothr query.
Copilot AI Jan 30, 2026

Typo in the comment: "anothr" should be "another".

model = queryset.model
# We can't use a limit or offset with .raw_delete,
# so we first extract the IDs and perform the deletion in anothr query.
Copilot AI Jan 30, 2026

Typo in the comment: "anothr" should be "another".
"options": {"queue": "web"},
},
# TODO: delete this task when all imported files pending deletion are done.
# It shuold take around 36 days to delete all the old imported files on community,
Copilot AI Jan 30, 2026

Typo in the comment: "shuold" should be "should".

Suggested change
# It shuold take around 36 days to delete all the old imported files on community,
# It should take around 36 days to delete all the old imported files on community,
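The "around 36 days" figure in that scheduling comment is simple backlog arithmetic. A hypothetical sketch, where every number is a placeholder chosen for illustration rather than a value stated anywhere in the PR:

```python
# Back-of-the-envelope backlog estimate for an hourly cleanup task.
# All numbers below are assumed, not taken from the PR.
backlog = 100_000_000    # assumed rows pending deletion
limit_per_run = 115_000  # assumed per-run deletion limit
runs_per_day = 24        # the task runs hourly

days = backlog / (limit_per_run * runs_per_day)
print(round(days))  # 36
```

The same formula lets you tune the per-run limit once the real backlog size is known.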
Comment on lines 48 to 81
def raw_delete_in_batches(queryset, batch_size=50, limit: int | None = None):
    """
    Raw delete a queryset in batches to avoid long transactions or big queries.

    Similar to ``delete_in_batches``, but uses ``_raw_delete``.
    This is useful when you don't want to overload the DB by deleting
    a large number of records at once, since _raw_delete() doesn't support
    limits/offsets.

    .. warning::

       Since this uses ``_raw_delete``, it won't trigger any
       signals, and won't cascade delete related objects.
       Use it only if you are sure there are no related objects
       that need to be deleted/updated.
    """
    # Don't use batch deletion if the number of records
    # is smaller or equal to the batch size.
    count = queryset.count()
    if count == 0:
        return
    if count <= batch_size:
        queryset._raw_delete(queryset.db)
        return

    model = queryset.model
    # We can't use a limit or offset with .raw_delete,
    # so we first extract the IDs and perform the deletion in anothr query.
    all_pks = queryset.values_list("pk", flat=True)
    if limit:
        all_pks = all_pks[:limit]
    for batch in batched(all_pks, batch_size):
        qs = model.objects.filter(pk__in=batch)
        qs._raw_delete(qs.db)
Copilot AI Jan 30, 2026

The raw_delete_in_batches function doesn't return any value, unlike delete_in_batches, which returns a tuple containing the total number of deleted objects and a dictionary with deletion counts per model type. Consider returning consistent information about the deletion operation, or at least documenting in the docstring that this function returns None.
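One shape that suggested return value could take, sketched in plain Python: `delete_fn` and the model label are stand-ins, not the PR's actual code, and the `(total, per_model)` tuple simply mirrors the shape the review says `delete_in_batches` already returns.

```python
def raw_delete_in_batches(pks, delete_fn, batch_size=50, limit=None):
    # Delete in fixed-size chunks, honoring an optional cap on the
    # total, and report what was deleted instead of returning None.
    if limit is not None:
        pks = pks[:limit]
    total = 0
    for start in range(0, len(pks), batch_size):
        chunk = pks[start:start + batch_size]
        delete_fn(chunk)  # stands in for qs._raw_delete(qs.db)
        total += len(chunk)
    # Mirror delete_in_batches: (total deleted, counts per model label).
    return total, {"example.Model": total}
```

Callers can then log or assert on the counts, which is also what the new tests check (deleted counts and remaining objects).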
"""Test raw deletion with batch_size=1 (edge case)."""
# Create a project with 3 versions
project = get(Project, slug="raw-single-project")
versions = [get(Version, project=project, slug=f"raw-single-{i}") for i in range(3)]
Copilot AI Jan 30, 2026

Variable versions is not used.
"""Test raw deleting with a limit smaller than total queryset count."""
# Create a project with 20 versions
project = get(Project, slug="raw-limit-project")
versions = [get(Version, project=project, slug=f"raw-limit-{i}") for i in range(20)]
Copilot AI Jan 30, 2026

Variable versions is not used.

Suggested change
versions = [get(Version, project=project, slug=f"raw-limit-{i}") for i in range(20)]
for i in range(20):
    get(Version, project=project, slug=f"raw-limit-{i}")
"""Test raw deleting with a limit larger than total queryset count."""
# Create a project with 5 versions
project = get(Project, slug="raw-over-limit")
versions = [get(Version, project=project, slug=f"raw-over-{i}") for i in range(5)]
Copilot AI Jan 30, 2026

Variable versions is not used.
"""Test raw deleting with a limit equal to the batch size."""
# Create a project with 10 versions
project = get(Project, slug="raw-equal-limit")
versions = [get(Version, project=project, slug=f"raw-eq-lim-{i}") for i in range(10)]
Copilot AI Jan 30, 2026

Variable versions is not used.

Suggested change
versions = [get(Version, project=project, slug=f"raw-eq-lim-{i}") for i in range(10)]
for i in range(10):
    get(Version, project=project, slug=f"raw-eq-lim-{i}")
"""Test raw deleting with limit=1 (edge case)."""
# Create a project with 5 versions
project = get(Project, slug="raw-one-limit")
versions = [get(Version, project=project, slug=f"raw-one-{i}") for i in range(5)]
Copilot AI Jan 30, 2026

Variable versions is not used.
@stsewd
Member Author

stsewd commented Jan 30, 2026

@copilot change the tests from this PR so that we don't have unused variables

Contributor

Copilot AI commented Jan 30, 2026

@stsewd I've opened a new pull request, #12741, to work on those changes. Once the pull request is ready, I'll request review from you.

Member

@humitos humitos left a comment

I like this pattern 👍🏼

I haven't reviewed the tests.

"""
Delete all imported files that are no longer needed.

We only need to keep track of the top-level ``404.html`` and all ``index.html`` files.
Member

I understand that we are not inserting new ImportedFiles that don't meet these conditions, right?


NOTE: the logic of this task could be improved to keep longer data we care
more (eg. active projects )and remove data we don't (eg. builds from spam projects)
more (eg. active projects ) and remove data we don't (eg. builds from spam projects)
Member
Suggested change
more (eg. active projects ) and remove data we don't (eg. builds from spam projects)
more (eg. active projects) and remove data we don't (eg. builds from spam projects)
