Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upstreaming 2024 Nyrkiö patches #27

Merged
merged 10 commits into from
Jan 10, 2025
Merged

Conversation

henrikingo
Copy link
Contributor

No description provided.

@henrikingo
Copy link
Contributor Author

Ah. I think I know what this is: We use hunter together with Nyrkiö, and the latter has pytest-benchmark, so it successfully went through all our own testing. I'll addpytest-benchmark to poetry files and try again. (But not today.)

@Gerrrr
Copy link
Contributor

Gerrrr commented Jan 10, 2025

All the changes look good! We just need to remove pytest-benchmark or update the lock.

@henrikingo
Copy link
Contributor Author

Ok actually I think you need to restart the test. I'm not a committer yet :-D

This can be used to compare whether an AnalyzedSeries object is more
recent than the set of change points it was computed from. (Think
cache invalidation, even if the AnalyzedSeries isn't necessarily a
cache.)
Hunter modified e-divisive such that it first does a pass using a
higher p-value, then filters out all change points that have a
higher p-value than the actual max_pvalue specified by the user.

The initial higher pvalue is max_pvalue * 10. This means that
for values higher than 0.1, the first pvalue is > 1.0. This
doesn't make sense. In fact 1.0 also doesn't make sense because
now every point is a weak change point.

This patch modifies the call to split() such that:
 max_pvalue:   first pass pvalue
 0.0 - 0.05:   max_pvalue * 10 (unchanged)
 0.0 - 0.5 :   max_pvalue * 2
 0.5 - 1.0 :   max_pvalue

Since values above  0.1 didn't really make sense before this patch,
the area where this could cause changes is for users using max_pvalue
between 0.05 and 0.1. The merge() phase should however eventually produce
approximately the same set of change points anyway, but this isn't
guaranteed.
This is the default in AnalysisOptions, which most users would
use, since it is required in the typical code path.
The common case is to add new data points to the end of the series.
In this case we don't need to recompute all change points, we can
just compute window_len points from the end. We do roughly
2 * window_len for good measure.
@henrikingo
Copy link
Contributor Author

Ok actually I think you need to restart the test. I'm not a committer yet :-D

Never mind. It was just so fast I didn't realize it had ran already.

Since it is random by design, can't really unit testas usual.
@henrikingo henrikingo merged commit 72c34f6 into apache:master Jan 10, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants