FIX - CI OOM issue and some tests failing due to pandas 3.0 #1855
assert not ns.is_string(ns.col(df, col))

def test_sentinel_is_string_pandas_3(df_module):
I could not find a reason anywhere for adding this test, other than as a reminder that pandas 3.0 has been released. It is the only test that fails, so I'm just removing it here.
>>> import pandas as pd
>>> import warnings
Even if this works, it is really ugly and I'd rather not do it.
The problem is that we execute the doctests with different versions of pandas. With min-reqs it's pandas 1.5.3, where these lines (with `object`) work fine. With the latest versions it's pandas 3, which raises a deprecation warning here and asks for `str` instead to silence it.
So there is no way around this problem other than rewriting the example entirely so that it does not depend on the pandas version.
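For reference, the workaround I'm objecting to looks roughly like the sketch below. It uses only the standard library, and `make_column` is a hypothetical stand-in for the pandas call in the docstring, not the real code: it warns when asked for `"object"`, which is the behaviour described above for recent pandas.

```python
import warnings


def make_column(dtype):
    # Hypothetical stand-in for the version-dependent library call:
    # newer versions warn on "object", older versions stay silent.
    if dtype == "object":
        warnings.warn(
            "use 'str' instead of 'object'", DeprecationWarning, stacklevel=2
        )
    return ["a", "b"]


# The workaround: silence the warning only around the offending call.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    col = make_column("object")

# Without the filter the warning is emitted; record it to show that.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    make_column("object")
```

This keeps the output identical across versions, but every doctest that touches the affected call would need the same wrapper, which is why rewriting the example seems preferable.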
NumPy 2.4.1 is still causing the CI to run out of memory.
FWIW in scikit-learn we did bump into a somewhat similar issue, see scikit-learn/scikit-learn#32902. It's not clear what was causing it, but I suspect some updated package maybe? A shortlist of packages is in scikit-learn/scikit-learn#32902 (comment). I tried to reproduce it in the CI with SSH debug and it did not seem to be numpy 2.4. As a workaround, we are now using a bigger CircleCI runner (
This PR addresses various issues that were discovered because CircleCI was failing after running out of memory.
The main issue with CircleCI was addressed by adding the `html-split` command to the docs makefile. This command first builds the examples and excludes the rest of the documentation, then builds the documentation without re-executing the examples. Finally, it copies the files generated for the examples to the build folder to obtain the full documentation build.
I also removed doctests from the sphinx configuration because they should already be covered by `test` and `test-user-guide`, which should help save some memory.
Since I had to update the lock file, I also fixed a few tests that were breaking because of the pandas 3.0 release.
I also added `memory_profiler` as a doc dependency to show the peak memory for the various examples.
I also removed all the leftover references to the KEN embeddings that were not removed in the relevant PR.
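On the peak-memory point: to get a quick local estimate for a single example without building the docs, something like the sketch below may help. Note that it swaps in the standard-library `tracemalloc` instead of `memory_profiler`, so it only counts allocations made through Python's allocator and the numbers will not match what the docs report. `peak_memory_mib` and `build_table` are hypothetical names used for illustration only.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced allocation in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


def build_table(n_rows):
    # Hypothetical stand-in for the body of a documentation example.
    return [[float(i)] * 10 for i in range(n_rows)]


table, peak = peak_memory_mib(build_table, 50_000)
print(f"peak traced memory: {peak:.1f} MiB")
```

Wrapping the body of a suspicious example this way gives a rough ranking of which examples are the heaviest, which is usually enough to decide where to look first.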