
[ENH] Add BORF transformer #2062

Merged
merged 25 commits into from
Oct 21, 2024

Conversation

fspinna
Contributor

@fspinna fspinna commented Sep 16, 2024

What does this implement/fix? Explain your changes.

I am trying to add a Bag-Of-Receptive-Fields transformer as described in Spinnato, Francesco, et al. "A Bag of Receptive Fields for Time Series Extrinsic Predictions." arXiv preprint arXiv:2311.18029 (2023). It is basically a Bag-Of-Patterns transform, where a time series is transformed into a sparse matrix of counts of discretized subsequences.
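For intuition, a bag-of-patterns style count matrix can be sketched in a few lines. This is a toy illustration with made-up window/alphabet choices, not the BORF algorithm itself:

```python
import numpy as np
from scipy.sparse import csr_matrix

def toy_bag_of_patterns(X, window=8):
    """Toy bag-of-patterns: slide a window over each series, discretize
    each window into a symbolic word, and count word occurrences.
    Illustrative only, not the BORF algorithm."""
    vocab = {}
    rows, cols, data = [], [], []
    for i, x in enumerate(X):
        counts = {}
        for start in range(len(x) - window + 1):
            w = x[start:start + window]
            w = (w - w.mean()) / (w.std() + 1e-8)             # z-normalize the window
            word = tuple(np.digitize(w, [-0.67, 0.0, 0.67]))  # 4-symbol alphabet
            counts[word] = counts.get(word, 0) + 1
        for word, c in counts.items():
            j = vocab.setdefault(word, len(vocab))
            rows.append(i); cols.append(j); data.append(c)
    return csr_matrix((data, (rows, cols)), shape=(len(X), len(vocab)))

X = [np.sin(np.linspace(0, 6, 50)), np.cos(np.linspace(0, 6, 50))]
bop = toy_bag_of_patterns(X)
print(bop.shape)  # (2, vocabulary size)
```

Each row is a time series, each column a discretized subsequence ("word"), and each entry a count; BORF generalizes this idea with receptive-field parameters such as dilation and stride.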

Does your contribution introduce a new dependency? If yes, which one?

This contribution introduces a dependency on the pydata sparse library (https://sparse.pydata.org/en/stable/).

Any other comments?

I have a couple of issues/doubts:

  1. the transformer fails 2 tests:
'test_persistence_via_pickle(estimator=BORF(),datatype=EqualLengthUnivariate-Classification-numpy3D)': "FAILED: 'csr_matrix' object has no attribute 'ravel'",
'check_fit_deterministic(estimator=BORF(),datatype=EqualLengthUnivariate-Classification-numpy3D)': "FAILED: 'csr_matrix' object has no attribute 'ravel'"

This is because the output of the transformer is a sparse scipy matrix (https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html#scipy.sparse.csr_matrix). I don't know if this is a problem. In practice, the sparse matrix is compatible with downstream sklearn estimators/transformers.
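The failures come from test code calling `.ravel()`, which ndarrays have but scipy sparse matrices do not; downstream sklearn estimators, on the other hand, accept sparse input directly. A minimal illustration with toy data (not the PR's actual tests):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression

X = csr_matrix(np.array([[0, 1, 0], [2, 0, 0], [0, 0, 3], [1, 1, 0]]))
y = np.array([0, 1, 1, 0])

# sklearn estimators consume scipy sparse input directly...
clf = LogisticRegression().fit(X, y)

# ...but csr_matrix is not a drop-in ndarray replacement:
print(hasattr(X, "ravel"))   # False -- the source of the test failures above
dense = X.toarray().ravel()  # an explicit densify round-trip where needed
```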

  2. the sparse package is used as a soft dependency, which is imported in the transform of IndividualBORF; that class inherits from sklearn's BaseEstimator and TransformerMixin, not from BaseCollectionTransformer. Is this a problem?

PR checklist

For all contributions
  • I've added myself to the list of contributors. Alternatively, you can use the @all-contributors bot to do this for you.
  • The PR title starts with either [ENH], [MNT], [DOC], [BUG], [REF], [DEP] or [GOV] indicating whether the PR topic is related to enhancement, maintenance, documentation, bugs, refactoring, deprecation or governance.
For new estimators and functions
  • I've added the estimator to the online API documentation.
  • (OPTIONAL) I've added myself as a __maintainer__ at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.

@aeon-actions-bot aeon-actions-bot bot added enhancement New feature, improvement request or other non-bug code enhancement transformations Transformations package labels Sep 16, 2024
@aeon-actions-bot
Contributor

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ $\color{#FEF1BE}{\textsf{enhancement}}$ ].
I have added the following labels to this PR based on the changes made: [ $\color{#41A8F6}{\textsf{transformations}}$ ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Push an empty commit to re-run CI checks

@TonyBagnall
Contributor

great, thanks for that

@fspinna fspinna marked this pull request as ready for review October 1, 2024 10:01
@TonyBagnall
Contributor

hi @fspinna, sorry for the delay on this; we will definitely review this week. I'm afraid there are a couple of conflicts, which should be easy to resolve.

Contributor

@TonyBagnall TonyBagnall left a comment


thanks again, but my, this is huge! A couple of questions:

  1. You use the sparse package. Elsewhere we use from scipy.sparse import csr_matrix. scipy is a core dependency, so it would be preferable to use this if it is fit for purpose.
  2. It would also be good to have a constructor option to return the whole sparse array rather than the compressed version
  3. there are two arguments for n_jobs
    n_jobs=self.n_jobs,
    n_jobs_numba=self.n_jobs_numba,
    how come?
  4. You have reimplemented SAX. We have a version of SAX already, would it be possible to use that? If not, see 5 below.
  5. There are a lot of specific public functions which will bloat the docs; could you make the file-specific functions private? Things like are_window_size_and_dilation_compatible_with_signal_length. Same with all the small sklearn transformers; better off private imo.
  6. You have implemented a hash table?

@fspinna
Contributor Author

fspinna commented Oct 15, 2024

thanks again, but my, this is huge! A couple of questions:

Thank you for taking the time to review the PR. I know it's a lot of code. One of the contributions of the paper was to build a very fast transform, so I had to re-implement everything in numba and sacrifice modularity for efficiency.

  1. You use the sparse package. Elsewhere we use from scipy.sparse import csr_matrix. scipy is a core dependency, so it would be preferable to use this if it is fit for purpose.

Sparse in this case is needed, because the output of transform_sax_patterns is a 3D tensor, which is not supported by scipy.
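To see why, here is a toy flattening of a 3D count tensor into the 2D matrix scipy can represent; pydata sparse's `COO` does this N-dimensional bookkeeping in one object instead. The axis names and shapes are illustrative, not BORF's actual layout:

```python
import numpy as np
from scipy.sparse import csr_matrix

# scipy.sparse only supports 2D, so a 3D tensor of counts needs the
# pydata `sparse` package -- or a manual flattening like this toy
# COO-style representation:
shape = (2, 5, 4)                  # hypothetical (series, pattern, position)
coords = np.array([[0, 1, 1],      # series index
                   [0, 2, 4],      # pattern index
                   [3, 0, 1]])     # position index
data = np.array([5, 2, 7])

# Collapse the last two axes to get a 2D matrix sklearn can consume,
# mimicking what an N-D sparse reshape would do in one call:
flat_cols = coords[1] * shape[2] + coords[2]
mat = csr_matrix((data, (coords[0], flat_cols)),
                 shape=(shape[0], shape[1] * shape[2]))
print(mat.shape)  # (2, 20)
```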

  2. It would also be good to have a constructor option to return the whole sparse array rather than the compressed version

You mean an option to densify the sparse array to numpy?

  3. there are two arguments for n_jobs
    n_jobs=self.n_jobs,
    n_jobs_numba=self.n_jobs_numba,
    how come?

n_jobs is passed to the sklearn FeatureUnion; n_jobs_numba is passed to IndividualBORF. They are two separate ways of parallelizing the code, and combinations of the two are possible. Maybe a different name for n_jobs_numba would be better? Setting n_jobs_numba=1 would also be fine, but from experience, sometimes it's faster to parallelize with one or the other.
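The two levels can be pictured like this (a sketch with stand-in transformers; the numba side is shown only in comments, since it lives inside the compiled kernels):

```python
import numpy as np
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Level 1: FeatureUnion parallelizes across its member transformers via
# joblib -- this is where a transformer-level `n_jobs` would be forwarded.
union = FeatureUnion(
    [("scale", StandardScaler()), ("minmax", MinMaxScaler())],
    n_jobs=2,
)

# Level 2 (comments only): inside each member, numba-compiled kernels can
# run their own thread pool, e.g.
#   numba.set_num_threads(n_jobs_numba)
#   @numba.njit(parallel=True) loops using numba.prange
# which is the knob `n_jobs_numba` would control independently.

X = np.random.RandomState(0).randn(10, 4)
Xt = union.fit_transform(X)
print(Xt.shape)  # (10, 8): 4 standardized + 4 min-max scaled columns
```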

  4. You have reimplemented SAX. We have a version of SAX already, would it be possible to use that? If not, see 5 below.

One of the contributions of the paper was a better/faster way of computing window-wise SAX. Also, this version has dilation, stride, and other parameters that are not usually in SAX, so I don't believe I can use your implementation.
I decided to keep it as a function (and not make it a class) because I had to enclose everything in numba. Unfortunately, this means less modularity.
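For reference, a toy version of window-wise SAX with dilation and stride might look like this (standard Gaussian breakpoints; the parameter choices are illustrative, and this is not the numba implementation from the PR):

```python
import numpy as np

def sax_word(window, word_length=4, alphabet_size=4):
    """Toy SAX for one window: z-normalize, piecewise-aggregate into
    `word_length` segments, then discretize with Gaussian breakpoints.
    Assumes len(window) is divisible by word_length (the same constraint
    noted for the BORF variant in this thread)."""
    w = (window - window.mean()) / (window.std() + 1e-8)
    paa = w.reshape(word_length, -1).mean(axis=1)          # PAA segments
    breakpoints = {3: [-0.43, 0.43], 4: [-0.67, 0.0, 0.67]}[alphabet_size]
    return np.digitize(paa, breakpoints)                   # symbols 0..a-1

def windows_with_dilation(x, window_size, stride=1, dilation=1):
    """Extract dilated subsequences: indices i, i+d, i+2d, ..."""
    span = (window_size - 1) * dilation + 1
    return [x[i:i + span:dilation] for i in range(0, len(x) - span + 1, stride)]

x = np.sin(np.linspace(0, 10, 64))
words = [tuple(sax_word(w)) for w in windows_with_dilation(x, 8, stride=2, dilation=2)]
print(len(words), words[0])
```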

  5. There are a lot of specific public functions which will bloat the docs, could you make the file specific functions private? things like are_window_size_and_dilation_compatible_with_signal_length Same with all the small sklearn transformers, better off private imo.

I'll do that. So should I make all the functions besides the BORF class private?

  6. You have implemented a hash table?

I slightly modified this implementation https://github.com/lhprojects/blog/blob/master/_posts/JupyterNotebooks/HashUnique.ipynb to make it fully compatible with numba, as numba dicts do not work well in my experience.
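The appeal of that approach is that an open-addressing table needs only flat preallocated arrays, which numba's nopython mode compiles well, unlike typed dicts. A pure-numpy toy version of the idea (not the PR's code):

```python
import numpy as np

def count_unique_open_addressing(keys, table_size=64):
    """Count occurrences of integer keys with linear-probing open
    addressing, using only flat preallocated arrays. Toy sketch."""
    slots = np.full(table_size, -1, dtype=np.int64)   # stored keys (-1 = empty)
    counts = np.zeros(table_size, dtype=np.int64)
    for k in keys:
        idx = (k * 2654435761) % table_size           # multiplicative hash
        while slots[idx] != -1 and slots[idx] != k:
            idx = (idx + 1) % table_size              # linear probing
        slots[idx] = k
        counts[idx] += 1
    return slots, counts

slots, counts = count_unique_open_addressing(np.array([7, 7, 3, 9, 3, 7]))
print(counts[slots == 7])  # [3]
```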

@patrickzib
Contributor

One of the contributions of the paper was a better/faster way of computing window-wise SAX. Also, this version has dilation, stride and other parameters that are not usually in SAX. So I don't believe I can use your implementation.

It might be a bit ambitious for this PR, but it would be wonderful to include a fast sliding window SAX-transform with dilation if possible.

@TonyBagnall
Contributor

do we need a lower bound on the sparse package?

@TonyBagnall
Contributor

re SAX, we could investigate using this version as our standard and extending it as you say @patrickzib, but not in this PR :) It would be good to make an issue on improving SAX

@fspinna
Contributor Author

fspinna commented Oct 17, 2024

It might be a bit ambitious for this PR, but it would be wonderful to include a fast sliding window SAX-transform with dilation if possible.

re SAX, we could investigate using this version as our standard and extending it as you say @patrickzib, but not in this PR :) It would be good to make an issue on improving SAX

Also, it's not a perfect replacement for standard SAX, as the window size has to be divisible by the word length in my case.

@TonyBagnall
Contributor

It might be a bit ambitious for this PR, but it would be wonderful to include a fast sliding window SAX-transform with dilation if possible.

re SAX, we could investigate using this version as our standard and extending it as you say @patrickzib, but not in this PR :) It would be good to make an issue on improving SAX

Also, it's not a perfect replacement for standard SAX, as the window size has to be divisible by the word length in my case.

I like the idea of having a speed-optimised constrained version and a more general one, but it's a project in its own right really. This looks ready to go in, to me; thanks for the contribution. There are a couple of minor conflicts to resolve caused by changes elsewhere, sorry about that.

@TonyBagnall
Contributor

could we add your paper to this? https://www.aeon-toolkit.org/en/stable/papers_using_aeon.html

Even better, can you add your paper to this (in a separate PR)? :)

@fspinna
Contributor Author

fspinna commented Oct 18, 2024

do we need a lower bound on the sparse package?

I am using the latest (0.15.4)

could we add your paper to this? https://www.aeon-toolkit.org/en/stable/papers_using_aeon.html

Even better, can you add your paper to this (in a separate PR)? :)

Sure!

I like the idea of having a speed-optimised constrained version and a more general one, but it's a project in its own right really. This looks ready to go in, to me; thanks for the contribution. There are a couple of minor conflicts to resolve caused by changes elsewhere, sorry about that.

Thank you! No problem, let me know if there is anything else to do!

@TonyBagnall
Contributor

hi @fspinna if you resolve the conflicts I think this is ready to go in

Contributor Author

@fspinna fspinna left a comment


Fixed unused comments

# Conflicts:
#	.all-contributorsrc
#	docs/api_reference/transformations.rst
#	pyproject.toml
@fspinna
Contributor Author

fspinna commented Oct 18, 2024

hi @fspinna if you resolve the conflicts I think this is ready to go in

@TonyBagnall I fixed a few things (tags, doctest) but I am still failing on a few OSes. Not really sure what the issue is here.

@MatthewMiddlehurst
Member

Hi @fspinna, I'm not too up to date with this PR, but it's only failing on some workflows because we split the tests up when running on PRs to save time.

I can take a look at what's failing at some point if no one else finds the time.

@MatthewMiddlehurst
Member

Alternatively, there are non_deterministic and cant_pickle tags which will skip these particular tests, but we try to only use them if it is not an easily fixable issue.

Adding it temporarily to the skipped test list is also an option, and revisiting in another PR.

@fspinna
Contributor Author

fspinna commented Oct 21, 2024

Hi @fspinna, not too up to date with this PR but its only failing on some workflows because we split the tests up when running on PRs to save time.

I can take a look at whats failing at some point if no one else finds the time.

Ah ok, that explains why I was not failing all of them!
This one is expected, as the output of the transform is a csr matrix, which is not currently supported, I believe (but it practically works with most sklearn estimators):

test_all_estimators[check_persistence_via_pickle(estimator=BORF(),datatype=EqualLengthUnivariate-Classification-numpy3D)] - AttributeError: 'csr_matrix' object has no attribute 'ravel'

So I probably should use the tag "cant_pickle" for this one.

*edit: this one also depends on the same issue:

test_all_estimators[check_fit_deterministic(estimator=BORF(),datatype=EqualLengthUnivariate-Classification-numpy3D)] - AttributeError: 'csr_matrix' object has no attribute 'ravel'

So, the "non_deterministic" tag is also needed.
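In aeon, such exclusions are declared through the estimator tag system; a simplified sketch of what that might look like (base class and tag plumbing omitted, so this is illustrative rather than the PR's actual code):

```python
# Hypothetical sketch: in aeon, tags are declared in a `_tags` class
# attribute on the estimator (the real BORF inherits from
# BaseCollectionTransformer; the base class is omitted here).
class BORF:
    _tags = {
        "non_deterministic": True,  # skips check_fit_deterministic
        "cant_pickle": True,        # skips check_persistence_via_pickle
    }

print(BORF._tags)
```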

This one puzzles me as I don't get it locally:

test_all_estimators[check_non_state_changing_method(estimator=BORF(densify=True),datatype=EqualLengthUnivariate-Classification-numpy3D)] - AttributeError: 'IndividualBORF' object has no attribute 'feature_names_in_'

I believe it's a problem with some sklearn estimators I created. I'll try to look into it.

Contributor

@TonyBagnall TonyBagnall left a comment


LGTM, thanks for the contribution

@TonyBagnall TonyBagnall merged commit 4ec60da into aeon-toolkit:main Oct 21, 2024
15 checks passed