Provide API to identify dangling searchable snapshots #113168

Open
romain-chanu opened this issue Sep 19, 2024 · 2 comments
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed Meta label for distributed team

Comments

romain-chanu commented Sep 19, 2024

Description

What is a dangling searchable snapshot?

It is a searchable snapshot stored in a snapshot repository and no longer referenced/used by an Elasticsearch cluster. This can happen in the following situations:

  1. Users have deleted searchable snapshot indices and/or data streams (containing searchable snapshot indices) via the delete index API or the delete data stream API.

If users manually delete an index or data stream before the ILM delete phase runs, then ILM will not delete the underlying searchable snapshot. Users would need to use the Delete snapshots API to remove the searchable snapshot from the snapshot repository when it is no longer needed.

  2. Users have configured the respective ILM policy with a delete phase but `delete_searchable_snapshot` is set to `false` (cf. the ILM Delete action). Users would need to use the Delete snapshots API to remove the searchable snapshot from the snapshot repository when it is no longer needed.

How to determine if a searchable snapshot is dangling?

As of the time of writing, Elasticsearch does not provide an API to retrieve such information. Manual checks are needed, which can be very tedious and error-prone.
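
One part of such a manual check is working out which snapshots are still mounted. The following is a minimal sketch of that step, not an official procedure. It assumes the parsed JSON body of `GET */_settings?flat_settings=true` as input, and it assumes that mounted indices expose the settings `index.store.snapshot.repository_name` and `index.store.snapshot.snapshot_name`; both the response shape and the setting names should be verified against your Elasticsearch version. The function name `mounted_snapshots` is hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumed setting names on searchable snapshot indices; verify on your version.
REPO_KEY = "index.store.snapshot.repository_name"
SNAP_KEY = "index.store.snapshot.snapshot_name"


def mounted_snapshots(settings_response: Dict[str, dict]) -> Set[Tuple[str, str]]:
    """Return the (repository, snapshot) pairs that back mounted indices.

    `settings_response` is assumed to map each index name to
    {"settings": {<flat setting name>: <value>, ...}}.
    """
    mounted: Set[Tuple[str, str]] = set()
    for body in settings_response.values():
        settings = body.get("settings", {})
        repo = settings.get(REPO_KEY)
        snap = settings.get(SNAP_KEY)
        # Regular (non-mounted) indices carry neither setting and are skipped.
        if repo is not None and snap is not None:
            mounted.add((repo, snap))
    return mounted
```

Any index that the settings request does not return (for example, one excluded by the wildcard expansion in use) would be missed, so its snapshot would wrongly look unreferenced. This is one reason the manual approach is error-prone.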

Motivation

  • Users could be impacted by high storage costs associated with dangling searchable snapshots stored in a snapshot repository (we have seen situations in the field where thousands of orphaned snapshots were identified and dozens of TBs of storage used).
  • Being able to find and remove dangling searchable snapshots could result in significant cost savings for users.
@romain-chanu romain-chanu added >enhancement needs:triage Requires assignment of a team area label labels Sep 19, 2024
@romain-chanu romain-chanu changed the title Provide API to identify orphaned searchable snapshots Provide API to identify dangling searchable snapshots Sep 19, 2024
@DaveCTurner DaveCTurner added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed needs:triage Requires assignment of a team area label labels Sep 19, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Sep 19, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor

I'm not really sure that we can reliably identify such snapshots: there's nothing particularly special about them compared with any other snapshots the user might have taken.

If the user is only using SLM and ILM to take snapshots then you can identify all non-SLM snapshots with `GET _snapshot/_all/_all?slm_policy_filter=_none`, but of course this will include both mounted and dangling snapshots. Would it be enough to add another filter to the get-snapshots API to exclude mounted snapshots, perhaps?
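
Until such a filter exists, its effect could be approximated on the client side. The sketch below is only an illustration under stated assumptions: it takes the parsed JSON body of the get-snapshots request above, assumed to contain a `snapshots` list whose entries carry `repository` and `snapshot` fields, plus a set of (repository, snapshot) pairs known to be mounted, however that set was obtained. The function name `unmounted_snapshots` is hypothetical and is not part of any Elasticsearch client.

```python
from typing import Dict, List, Set, Tuple


def unmounted_snapshots(
    snapshots_response: Dict[str, list],
    mounted: Set[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return (repository, snapshot) pairs from the response that are not mounted.

    `snapshots_response` is assumed to look like
    {"snapshots": [{"repository": "...", "snapshot": "...", ...}, ...]}.
    """
    candidates: List[Tuple[str, str]] = []
    for info in snapshots_response.get("snapshots", []):
        key = (info.get("repository"), info.get("snapshot"))
        # Keep only snapshots that no mounted index refers to.
        if key not in mounted:
            candidates.append(key)
    return candidates
```

The result is a list of candidates, not a list of confirmed dangling snapshots: as noted above, nothing distinguishes these from other snapshots the user may have taken outside SLM, so each one would still need review before deletion.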
