Member

@rcap107 rcap107 commented Jan 21, 2026

This PR addresses several issues uncovered while investigating why CircleCI was failing: the job was running out of memory.

The main CircleCI issue is addressed by adding an html-split command to the docs makefile. This command first builds only the examples, excluding the rest of the documentation, then builds the documentation without re-executing the examples. Finally, it copies the files generated for the examples into the build folder to produce the full documentation build.
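As a rough illustration of the final step of html-split (copying the example output into the main build), here is a minimal Python sketch. The function name and both directory arguments are hypothetical; the actual makefile target may use different paths and plain shell commands.

```python
import shutil
from pathlib import Path


def merge_example_output(examples_build: Path, full_build: Path) -> None:
    """Copy the pages rendered for the examples into the main HTML build.

    Both paths are hypothetical placeholders, not the project's real layout.
    Files already present in ``full_build`` are overwritten when a file with
    the same relative path exists in ``examples_build``.
    """
    # dirs_exist_ok=True (Python 3.8+) lets us copy into an existing tree.
    shutil.copytree(examples_build, full_build, dirs_exist_ok=True)
```

Keeping the two builds in separate folders until this merge step means neither Sphinx run has to hold both the executed examples and the rest of the documentation in memory at the same time.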

I also removed doctests from the sphinx configuration because they should already be covered by test and test-user-guide, which should help save some memory.

Since I had to update the lock file, I also fixed a few tests that were breaking because of the pandas 3.0 release.

I also added memory_profiler as a doc dependency to show the peak memory for the various examples.
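For context, sphinx-gallery can report memory usage per example through its `show_memory` option, which relies on memory_profiler being installed. A minimal `conf.py` fragment might look like the following; the paths are hypothetical, and whether this PR enables the option in exactly this way is an assumption.

```python
# Fragment of a Sphinx conf.py (assumed layout, not the project's real config).
sphinx_gallery_conf = {
    "examples_dirs": "../examples",   # hypothetical location of the example scripts
    "gallery_dirs": "auto_examples",  # hypothetical output directory for the gallery
    # Ask sphinx-gallery to display memory usage for each example;
    # this needs the memory_profiler package at build time.
    "show_memory": True,
}
```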

I also removed all the leftover references to the KEN embeddings that were not removed in the relevant PR.

assert not ns.is_string(ns.col(df, col))


def test_sentinel_is_string_pandas_3(df_module):
Member Author

I could not find the reason for adding this test anywhere, other than as a reminder that pandas 3.0 has been released. This is the only test that fails, so I'm just removing it here.

@rcap107 rcap107 changed the title from "WIP - fixing CI failing" to "FIX - CI OOM issue and some tests failing due to pandas 3.0" on Jan 23, 2026
@rcap107 rcap107 marked this pull request as ready for review January 23, 2026 16:44


>>> import pandas as pd
>>> import warnings
Member Author

Even if this works, it is really ugly and I'd rather not do this.

The problem is that we are executing the doctests with different versions of pandas. With min-reqs, it's pandas 1.5.3, where these lines (with object) work fine, while with the latest versions it's pandas 3, which raises a deprecation warning here and asks for str to be used instead to silence it.

So, there is no way around this problem other than rewriting the example entirely so that it does not depend on the pandas version.
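To illustrate what a version-independent example could look like, here is a minimal sketch that avoids naming the string dtype at all (neither object nor str). The dataframe and column names are made up for illustration and are not taken from the doctest under review.

```python
import pandas as pd

# Hypothetical data; the real doctest uses a different dataframe.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Select non-numeric columns by testing each column's dtype, instead of
# passing "object" or "str" explicitly. The check does not depend on how
# a given pandas version represents string columns.
non_numeric = [
    col for col in df.columns if not pd.api.types.is_numeric_dtype(df[col])
]
```

Because the string dtype is never spelled out, the same lines should behave identically under the min-reqs environment and the latest pandas, with no need to filter warnings.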

Member Author

rcap107 commented Jan 26, 2026

NumPy 2.4.1 is still causing the CI to run out of memory.

Contributor

lesteve commented Jan 28, 2026

FWIW in scikit-learn we did bump into a somewhat similar issue, see scikit-learn/scikit-learn#32902.

It's not clear what was causing it, but I suspect some updated package. A shortlist of packages is in scikit-learn/scikit-learn#32902 (comment); I tried to reproduce it in the CI with SSH debug, and it did not seem to be numpy 2.4.

As a workaround, we are now using a bigger CircleCI runner (medium+, with 6GB RAM instead of 4GB) to avoid the issue, see https://github.com/scikit-learn/scikit-learn/blob/62b08044bf7d4d8e96b439247c6d8d6ea0445496/.circleci/config.yml#L57-L60
