
[Fleet] Repro backing index is overlapping with backing index #193503

Closed

Conversation

@nchaulet (Member) commented Sep 20, 2024

Summary

Reproduction for the following error when upgrading an integration after a rollback:

illegal_argument_exception: backing index [.ds-metrics-no_tsdb_to_tsdb.test-default-2024.09.20-000002] with range [2024-09-19T22:49:57.000Z TO 2024-09-20T01:19:57.000Z] is overlapping with backing index [.ds-metrics-no_tsdb_to_tsdb.test-default-2024.09.20-000005] with range [2024-09-19T22:50:04.000Z TO 2024-09-20T01:20:04.000Z]

When the upgrade to TSDB for a data stream does not succeed and we roll back to a non-TSDB version, subsequent upgrades fail.

Details on the failing scenario

  1. We install the package; this creates the data stream metrics-no_tsdb_to_tsdb.test-default without TSDB => this creates backing index 0001
  2. We upgrade, updating the metrics-no_tsdb_to_tsdb.test-default data stream to TSDB => this creates a time series backing index 0002
  3. We roll back the package, updating metrics-no_tsdb_to_tsdb.test-default and rolling over without TSDB => this creates backing indices 0003 and 0004 without time series
  4. We try to upgrade the metrics-no_tsdb_to_tsdb.test-default data stream to TSDB again (see the sketch after this list) => this fails with
backing index [.ds-metrics-no_tsdb_to_tsdb.test-default-2024.09.20-000002] with range [2024-09-19T22:49:57.000Z TO 2024-09-20T01:19:57.000Z] is overlapping with backing index [.ds-metrics-no_tsdb_to_tsdb.test-default-2024.09.20-000005] with range [2024-09-19T22:50:04.000Z TO 2024-09-20T01:20:04.000Z]
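
For illustration, a minimal sketch of the four steps as API calls, assuming a Kibana FTR-style test where supertest targets the Kibana API and es is the Elasticsearch client service. The package name no_tsdb_to_tsdb and the versions 0.1.0/0.2.0 are illustrative, and the rollover is triggered explicitly because it is lazy on upgrade (see the review comment further down):

// Sketch only: package name and versions are illustrative, error handling is omitted.
const dataStream = 'metrics-no_tsdb_to_tsdb.test-default';

// 1. Install the non-TSDB package version; ingesting one document creates the
//    data stream and backing index -000001.
await supertest
  .post('/api/fleet/epm/packages/no_tsdb_to_tsdb/0.1.0')
  .set('kbn-xsrf', 'xxxx')
  .send({ force: true })
  .expect(200);
await es.index({ index: dataStream, op_type: 'create', document: { '@timestamp': new Date().toISOString() } });

// 2. Upgrade to the TSDB package version and roll over => time series backing index -000002.
await supertest
  .post('/api/fleet/epm/packages/no_tsdb_to_tsdb/0.2.0')
  .set('kbn-xsrf', 'xxxx')
  .send({ force: true })
  .expect(200);
await es.indices.rollover({ alias: dataStream });

// 3. Roll back to the non-TSDB version and roll over => standard backing indices without time series settings.
await supertest
  .post('/api/fleet/epm/packages/no_tsdb_to_tsdb/0.1.0')
  .set('kbn-xsrf', 'xxxx')
  .send({ force: true })
  .expect(200);
await es.indices.rollover({ alias: dataStream });

// 4. Upgrade to the TSDB version again and roll over => the rollover is rejected with the
//    illegal_argument_exception about overlapping backing indices quoted above.
await supertest
  .post('/api/fleet/epm/packages/no_tsdb_to_tsdb/0.2.0')
  .set('kbn-xsrf', 'xxxx')
  .send({ force: true })
  .expect(200);
await es.indices.rollover({ alias: dataStream });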

@martijnvg your name seems associated with a lot of TSDB work in Elasticsearch; maybe you can help me understand (or redirect me to someone who can) the behaviour here, and whether there is a bug in how the upgrade is handled in Elasticsearch, or whether there is something we can do in Fleet to avoid it. Thanks a lot!


@nchaulet force-pushed the repro-overlapping-backing-index branch from b544687 to 9cdc840 on September 20, 2024 01:30
@nchaulet force-pushed the repro-overlapping-backing-index branch from 9cdc840 to b2574a0 on September 20, 2024 12:46
@kibana-ci (Collaborator) commented Sep 20, 2024

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #45 / EPM Endpoints EPM - get Installed Packages Allows the fetching of installed packages
  • [job] [logs] FTR Configs #45 / EPM Endpoints EPM - get Installed Packages Allows the fetching of installed packages

Metrics [docs]

✅ unchanged

History

  • 💔 Build #235902 failed 9cdc8404a881578f718263938fa5038a68daf549
  • 💔 Build #235803 failed b5446873af7b152080ebe2d8d8f4abb150751763

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

.send({ force: true })
.expect(200);

// Simulate rollover on upgrade, it should throw
@nchaulet (Member, Author) commented on the test snippet above:
Rollover is lazy on upgrade, that's why I triggered the rollover in that test.
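
For context, a rough sketch of what that explicit trigger can look like in the test, assuming es is the FTR Elasticsearch client service and that the expected failure is the overlap error from this issue:

// The TSDB settings from the package upgrade only take effect on the next rollover,
// so the test forces one and expects it to be rejected.
let rolloverError: unknown;
try {
  await es.indices.rollover({ alias: 'metrics-no_tsdb_to_tsdb.test-default' });
} catch (error) {
  // illegal_argument_exception: backing index [...] is overlapping with backing index [...]
  rolloverError = error;
}
if (!rolloverError) {
  throw new Error('expected the rollover to fail with an overlapping backing index error');
}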

@salvatore-campagna commented:

@nchaulet what version of Elasticsearch are you using?

@nchaulet (Member, Author) commented:

@nchaulet what version of Elasticsearch are you using?

@salvatore-campagna the latest SNAPSHOT, but we also have an SDH reporting the same issue with 8.12.2.

@salvatore-campagna commented Sep 23, 2024

So time_series indices have specific start_time and end_time settings, which prevent backing indices of the same data stream from having overlapping time ranges. This is happening because switching back and forth results in multiple time_series indices being created in the data stream. Probably the new index (after the latest upgrade) is created with a start_time that is smaller than end_time of an existing time_series index. What I can suggest is to wait for the most recent end_time among all the existing time_series indices in the data stream to expire before upgrading.
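
To see the ranges involved, one can dump the time_series settings of each backing index; a minimal sketch with the Elasticsearch JS client, where the node URL is a placeholder and the data stream name comes from the repro above:

import { Client } from '@elastic/elasticsearch';

// Sketch: print index.time_series.start_time / end_time for every backing index of the
// data stream; standard (non-TSDB) backing indices simply have no time_series settings.
const es = new Client({ node: 'http://localhost:9200' });
const { data_streams: dataStreams } = await es.indices.getDataStream({
  name: 'metrics-no_tsdb_to_tsdb.test-default',
});
for (const { index_name: indexName } of dataStreams[0].indices) {
  const settings = await es.indices.getSettings({ index: indexName });
  const timeSeries = settings[indexName]?.settings?.index?.time_series;
  console.log(indexName, timeSeries?.start_time, timeSeries?.end_time);
}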

@nchaulet (Member, Author) commented:

This is happening because switching back and forth results in multiple time_series indices being created in the data stream. Probably the new index (after the latest upgrade) is created with a start_time that is smaller than end_time of an existing time_series index.

@salvatore-campagna But shouldn't this be handled by the rollover that creates the new index? When we roll over an existing TSDB data stream it handles that, no?

@salvatore-campagna commented Sep 24, 2024

This is happening because switching back and forth results in multiple time_series indices being created in the data stream. Probably the new index (after the latest upgrade) is created with a start_time that is smaller than end_time of an existing time_series index.

@salvatore-campagna But shouldn't this be handled by the rollover that creates the new index? When we roll over an existing TSDB data stream it handles that, no?

There are two types of rollover operations occurring here (happening as a result of installing a new package/integration version):

  1. Rollover from a time_series index to a standard index: this happens when downgrading from TSDB (time-series database) to a standard index.
  2. Rollover from a standard index to a time_series index: this occurs when upgrading back to TSDB.

As a result, the data stream will contain at least three indices (though in your case, there are more). However, we are primarily concerned with the most recent three indices created:

  1. The first time_series index with start_time and end_time (let's call this index1).
  2. The standard index (index2).
  3. The most recent time_series index, again with start_time and end_time (index3).

When attempting the latest upgrade, you encounter a situation where index3.start_time < index1.end_time. This is not automatically handled during the rollover process. This is the time when the error happens. Backing time_series indices belonging to the same data stream are checked for overlapping time ranges, which are not allowed.

For this reason, I recommend waiting for index1.end_time to expire before initiating the next upgrade. By upgrading later, you ensure that when index3 is created, it will have index3.start_time > index1.end_time, avoiding the time overlap issue.
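
A rough sketch of that workaround, assuming the Elasticsearch JS client (node URL is a placeholder) and a caller-provided upgrade callback; the Fleet-side upgrade and rollover are out of scope here and only represented by that callback:

import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: 'http://localhost:9200' });

// Wait until the latest index.time_series.end_time among the existing backing indices has
// passed, so the next time_series backing index cannot overlap, then run the upgrade.
async function upgradeWhenSafe(dataStream: string, upgrade: () => Promise<void>) {
  const { data_streams: dataStreams } = await es.indices.getDataStream({ name: dataStream });
  let latestEndTime = 0;
  for (const { index_name: indexName } of dataStreams[0].indices) {
    const settings = await es.indices.getSettings({ index: indexName });
    const endTime = settings[indexName]?.settings?.index?.time_series?.end_time;
    if (endTime) {
      latestEndTime = Math.max(latestEndTime, Date.parse(String(endTime)));
    }
  }
  const waitMs = latestEndTime - Date.now();
  if (waitMs > 0) {
    // index1.end_time has not expired yet: a new time_series index would overlap with it.
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  await upgrade();
}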

@nchaulet (Member, Author) commented:

When attempting the latest upgrade, you encounter a situation where index3.start_time < index1.end_time. This is not automatically handled during the rollover process. This is the time when the error happens. Backing time_series indices belonging to the same data stream are checked for overlapping time ranges, which are not allowed.

Could it be automatically handled during the rollover process? It seems to me that would be a better experience for the user. I can create an issue in ES for that improvement.

For this reason, I recommend waiting for index1.end_time to expire before initiating the next upgrade. By upgrading later, you ensure that when index3 is created, it will have index3.start_time > index1.end_time, avoiding the time overlap issue.

One of the issues here is that those upgrades/rollbacks come from automated processes in Fleet, and having to wait 4 hours is not really an ideal scenario.

@salvatore-campagna commented:

When attempting the latest upgrade, you encounter a situation where index3.start_time < index1.end_time. This is not automatically handled during the rollover process. This is the time when the error happens. Backing time_series indices belonging to the same data stream are checked for overlapping time ranges, which are not allowed.

Could it be automatically handled during the rollover process? It seems to me that would be a better experience for the user. I can create an issue in ES for that improvement.

For this reason, I recommend waiting for index1.end_time to expire before initiating the next upgrade. By upgrading later, you ensure that when index3 is created, it will have index3.start_time > index1.end_time, avoiding the time overlap issue.

One of the issues here is that those upgrades/rollbacks come from automated processes in Fleet, and having to wait 4 hours is not really an ideal scenario.

I see, and I understand this might be an issue. If you can create an issue for us, we will see what we can do.
