
Conversation

@lorentzenchr
Contributor

@lorentzenchr lorentzenchr commented Mar 9, 2025

As discussed in scikit-learn/scikit-learn#28901 (comment), this PR adds eval_X and eval_y in order to make LGBM estimators compatible with scikit-learn's (as of version 1.6) Pipeline(..., transform_input=["eval_X"]).

See also scikit-learn/scikit-learn#27124.

Collaborator

@jameslamb jameslamb left a comment


Thanks! It looks like you're struggling to get CI passing, so I'm going to put this into "draft" for now. @ me any time here if you need help with development, and we can open this back up for review once CI is passing.

I saw you had multiple commits responding to linting errors... here's how to run those checks locally for faster feedback:

# (or conda, whatever you want)
pip install pre-commit
pre-commit run --all-files

And here's how to build locally and run the tests:

# step 1: compile lib_lightgbm
# (only need to do this once, because you're not making any C/C++ changes)
cmake -B build -S .
cmake --build build --target _lightgbm -j4

# step 2: install the Python package, re-using the compiled library
# (do this every time you change Python code in the library)
sh build-python.sh install --precompile

# step 3: run the scikit-learn tests
pytest tests/python_package_test/test_sklearn.py

@jameslamb jameslamb marked this pull request as draft March 9, 2025 23:15
@lorentzenchr
Contributor Author

@jameslamb Thanks for your suggestions.
Could you already comment on the deprecation strategy, i.e. raising a warning?
Then, should I adapt all the (scikit-learn API) Python tests and replace eval_set with the new eval_X, eval_y (thereby avoiding the warnings in the tests)?

@jameslamb
Collaborator

Could you already comment on the deprecation strategy, raising a warning?

Making both options available for a time and raising a deprecation warning when eval_set is non-empty seems fine to me, if we decide to move forward with this. I'd also support a runtime error when both eval_set and eval_X are non-empty, to avoid taking on the complexity of merging those two inputs.
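For illustration, that strategy could look roughly like this (a minimal sketch with an invented helper name, not this PR's actual code):

import warnings

def _resolve_eval_sets(eval_set=None, eval_X=None, eval_y=None):
    # Sketch only: reject ambiguous input instead of merging the two styles.
    if eval_set is not None and (eval_X is not None or eval_y is not None):
        raise ValueError("Pass either 'eval_set' or 'eval_X'/'eval_y', not both.")
    if eval_set is not None:
        warnings.warn(
            "'eval_set' is deprecated, use 'eval_X' and 'eval_y' instead.",
            FutureWarning,
        )
        return eval_set if isinstance(eval_set, list) else [eval_set]
    if eval_X is not None:
        # normalize single arrays to one-element tuples for a uniform interface
        if not isinstance(eval_X, tuple):
            eval_X, eval_y = (eval_X,), (eval_y,)
        return list(zip(eval_X, eval_y))
    return None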

I'm sorry but I cannot invest much time in this right now (for example, looking into whether this would introduce inconsistencies with HistGradientBoostingClassifier, XGBoost, or CatBoost). If you want to see this change, getting CI working and then opening it up for others to review is probably the best path.

should I adapt all the (scikit-learn api) python tests and replace eval_set by the new eval_X, eval_y (thereby avoiding the warnings in the tests)?

No, please don't. As I said in scikit-learn/scikit-learn#28901 (comment), removing eval_set from LightGBM's scikit-learn estimators would be highly disruptive and would require a long deprecation cycle (more than a year, in my opinion). Throughout that time, we need to continue to test it at least as thoroughly as we have been.

@lorentzenchr
Contributor Author

@jameslamb I'm sorry, I really need a maintainer's help. The tests in tests/python_package_test/test_dask.py fail even on the master branch, locally on my computer. I tried different versions of dask, numpy, scipy, and scikit-learn, without success.
TL;DR: the CI failure seems to exist on master and is not caused by this PR.

Details
pytest -x tests/python_package_test/test_dask.py::test_ranker
...
>           dask_ranker = dask_ranker.fit(dX, dy, sample_weight=dw, group=dg)

tests/python_package_test/test_dask.py:714: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../python3_lgbm/lib/python3.11/site-packages/lightgbm/dask.py:1566: in fit
    self._lgb_dask_fit(
../python3_lgbm/lib/python3.11/site-packages/lightgbm/dask.py:1068: in _lgb_dask_fit
    model = _train(
../python3_lgbm/lib/python3.11/site-packages/lightgbm/dask.py:811: in _train
    results = client.gather(futures_classifiers)
../python3_lgbm/lib/python3.11/site-packages/distributed/client.py:2565: in gather
    return self.sync(
../python3_lgbm/lib/python3.11/site-packages/lightgbm/dask.py:205: in _train_part
    data = _concat([x["data"] for x in list_of_parts])

...

 def _concat(seq: List[_DaskPart]) -> _DaskPart:
        if isinstance(seq[0], np.ndarray):
            return np.concatenate(seq, axis=0)
        elif isinstance(seq[0], (pd_DataFrame, pd_Series)):
            return concat(seq, axis=0)
        elif isinstance(seq[0], ss.spmatrix):
            return ss.vstack(seq, format="csr")
        else:
>           raise TypeError(
                f"Data must be one of: numpy arrays, pandas dataframes, sparse matrices (from scipy). Got {type(seq[0]).__name__}."
            )
E           TypeError: Data must be one of: numpy arrays, pandas dataframes, sparse matrices (from scipy). Got tuple.

../python3_lgbm/lib/python3.11/site-packages/lightgbm/dask.py:159: TypeError

@jameslamb
Collaborator

What versions of dask / distributed do you have installed?

Searching for that error in this repo's issue tracker turns up a match: #6739

I suspect you need to pin to dask<2024.12 in your environment, as we do in CI here (#6742).

@lorentzenchr
Contributor Author

@jameslamb Thank you so much. Pinning dask<2024.12 worked fine.

@lorentzenchr lorentzenchr marked this pull request as ready for review March 10, 2025 21:34
@lorentzenchr
Contributor Author

The remaining CI failures seem unrelated.
TODO for myself: Improve test coverage a bit.

@jameslamb jameslamb changed the title ENH add eval_X, eval_y, deprecate eval_set [python-package] scikit-learn fit() methods: add eval_X, eval_y, deprecate eval_set Mar 10, 2025
@jameslamb
Collaborator

TODO for myself: Improve test coverage a bit.

@lorentzenchr if you are interested in continuing this I'd be happy to help with reviews. I'm supportive of adding this, for better compatibility with newer versions of scikit-learn.

@lorentzenchr
Contributor Author

@jameslamb Yes, I'd like to finish this. Your review would be great. Is there anything you need from me before you can start reviewing?

@jameslamb
Collaborator

Great! I'd been waiting to review until you were done adding whatever tests you wanted.

If you'd like a review before then, update this to latest master and get CI passing (especially the check that LightGBM works with its oldest supported scikit-learn version), then @ me and I'll be happy to provide some feedback.

# train
gbm = lgb.LGBMRegressor(num_leaves=31, learning_rate=0.05, n_estimators=20)
gbm.fit(X_train, y_train, eval_set=[(X_test, y_test)], eval_metric="l1", callbacks=[lgb.early_stopping(5)])
gbm.fit(X_train, y_train, eval_X=(X_test,), eval_y=(y_test,), eval_metric="l1", callbacks=[lgb.early_stopping(5)])
Collaborator

@jameslamb jameslamb Dec 29, 2025


(changed by me in d7e0fff)

If we're going to consider eval_set deprecated and eval_{X,y} the new recommended pattern, I think we should nudge users towards that by updating all of the documentation. I've done that here (examples/ is the only place with such code).

Contributor Author


I think eval_X=X_test, eval_y=y_test without wrapping into a tuple is fine, too.

Collaborator


For sure, either will work. I have a very weak preference for the tuple form in these docs, to make it a little clearer that providing multiple validation sets is supported.
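For example (variable names illustrative), two validation sets would be passed as same-length tuples:

gbm = lgb.LGBMRegressor(num_leaves=31, learning_rate=0.05, n_estimators=20)
gbm.fit(
    X_train,
    y_train,
    eval_X=(X_valid1, X_valid2),
    eval_y=(y_valid1, y_valid2),
    eval_metric="l1",
)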

if eval_set is None and eval_X is not None:
    if isinstance(eval_X, tuple) != isinstance(eval_y, tuple):
        raise ValueError("If eval_X is a tuple, eval_y must be a tuple of the same length, and vice versa.")
    if isinstance(eval_X, tuple) and isinstance(eval_y, tuple):
Collaborator


(changed by me in d7e0fff)

The additional isinstance(eval_y, tuple) check seems redundant given all the previous conditions, but mypy needs it to understand that eval_y is not None at this point. Otherwise, it reports these new errors:

sklearn.py:515: error: Argument 1 to "len" has incompatible type "list[float] | list[int] | ndarray[tuple[Any, ...], dtype[Any]] | Any | Any | Any | Any | tuple[list[float] | list[int] | ndarray[tuple[Any, ...], dtype[Any]] | Any | Any | Any | Any] | None"; expected "Sized"  [arg-type]
sklearn.py:518: error: Argument 2 to "zip" has incompatible type "list[float] | list[int] | ndarray[tuple[Any, ...], dtype[Any]] | Any | Any | Any | Any | tuple[list[float] | list[int] | ndarray[tuple[Any, ...], dtype[Any]] | Any | Any | Any | Any] | None"; expected "Iterable[Any]"  [arg-type]
sklearn.py:518: error: Argument 2 to "zip" has incompatible type "list[float] | list[int] | ndarray[tuple[Any, ...], dtype[Any]] | Any | Any | Any | Any | tuple[list[float] | list[int] | ndarray[tuple[Any, ...], dtype[Any]] | Any | Any | Any | Any] | None"; expected "Iterable[list[float] | list[int] | ndarray[tuple[Any, ...], dtype[Any]] | Any]"  [arg-type]
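Here's a minimal self-contained illustration of that narrowing behavior (not the PR's code):

from typing import Optional, Tuple, Union

import numpy as np

def _check(eval_X: Optional[Union[np.ndarray, Tuple]], eval_y: Optional[Union[np.ndarray, Tuple]]) -> None:
    if isinstance(eval_X, tuple) != isinstance(eval_y, tuple):
        raise ValueError("eval_X and eval_y must both be tuples, or neither.")
    if isinstance(eval_X, tuple):
        # mypy still treats eval_y as possibly None here: the comparison above
        # guarantees it is a tuple whenever eval_X is, but mypy cannot narrow
        # one variable based on a check of another.
        ...
    if isinstance(eval_X, tuple) and isinstance(eval_y, tuple):
        # the explicit isinstance() call narrows eval_y itself, so len() and
        # zip() type-check cleanly in this branch
        assert len(eval_X) == len(eval_y)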

Comment on lines +509 to +512
if isinstance(eval_set, tuple):
    return [eval_set]
else:
    return eval_set
Collaborator


(changed by me in d7e0fff)

Although providing something like eval_set=(X_valid, y_valid) conflicts with the type hints and docs:

eval_set: Optional[List[_LGBM_ScikitValidSet]] = None,

... it has been supported in lightgbm for a long time:

if eval_set is not None:
    if isinstance(eval_set, tuple):
        eval_set = [eval_set]

Adding this line preserves that behavior. Changing the existing behavior when eval_set is provided is outside the scope of this PR (other than raising deprecation warnings).
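So both of these spellings keep working (illustrative calls):

# a bare (X, y) tuple is wrapped into a one-element list,
# making these two calls equivalent
gbm.fit(X_train, y_train, eval_set=(X_test, y_test))
gbm.fit(X_train, y_train, eval_set=[(X_test, y_test)])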

Comment on lines -1764 to -1796
if eval_set is not None:
    if eval_group is None:
        raise ValueError("Eval_group cannot be None when eval_set is not None")
    if len(eval_group) != len(eval_set):
        raise ValueError("Length of eval_group should be equal to eval_set")
    if (
        isinstance(eval_group, dict)
        and any(i not in eval_group or eval_group[i] is None for i in range(len(eval_group)))
        or isinstance(eval_group, list)
        and any(group is None for group in eval_group)
    ):
        raise ValueError(
            "Should set group for all eval datasets for ranking task; "
            "if you use dict, the index should start from 0"
        )
Collaborator


(changed by me in d7e0fff)

eval_set is not None is no longer a reliable test for "validation data was provided", now that validation data can come in via eval_X and eval_y instead. I did the following here (a sketch of the consolidated check follows the list):

  • changed the check here in LGBMRanker to account for all of eval_{set,X,y}
  • moved the size check into LGBMModel.fit(), so the len() calls happen AFTER _validate_eval_set_Xy()
  • deleted the code checking if eval_group is a dictionary... supplying eval_group as a dictionary is not supported
  • improved these error messages a bit, since I was touching them anyway
  • added new test cases covering these ranker-specific codepaths
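Roughly, the consolidated check now looks like this (a sketch with illustrative names, not the exact merged code):

def _check_ranker_eval_group(eval_set, eval_X, eval_y, eval_group):
    # validation data can now arrive via either eval_set or eval_X/eval_y,
    # so test for both before requiring eval_group
    has_validation_data = eval_set is not None or (eval_X is not None and eval_y is not None)
    if not has_validation_data:
        return
    if eval_group is None:
        raise ValueError("'eval_group' must be provided when validation data is passed to a ranker.")
    if any(group is None for group in eval_group):
        raise ValueError("Must provide 'group' for every validation set in a ranking task.")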

Comment on lines +2130 to +2131
np.testing.assert_allclose(gbm1.predict(X), gbm2.predict(X))
assert gbm1.evals_result_["valid_0"]["l2"][0] == pytest.approx(gbm2.evals_result_["valid_0"]["l2"][0])
Collaborator


(changed by me in d7e0fff)

Added this check on the validation results. Just checking the predicted values is not sufficient to test that the passed validation sets were actually used.

X_test1, X_test2 = X_test[: n // 2], X_test[n // 2 :]
y_test1, y_test2 = y_test[: n // 2], y_test[n // 2 :]
gbm1 = lgb.LGBMRegressor(**params)
with pytest.warns(LGBMDeprecationWarning, match="The argument 'eval_set' is deprecated.*"):
Collaborator


(changed by me in d7e0fff)

Added these with pytest.warns() to suppress these warnings in test logs.

It's still valuable, I think, to have the standalone test_eval_set_deprecation test.

Comment on lines 2142 to 2149
assert set(gbm2.evals_result_.keys()) == {"valid_0", "valid_1"}, (
    f"expected 2 validation sets in evals_result_, got {gbm2.evals_result_.keys()}"
)
assert gbm1.evals_result_["valid_0"]["l2"][0] == pytest.approx(gbm2.evals_result_["valid_0"]["l2"][0])
assert gbm1.evals_result_["valid_1"]["l2"][0] == pytest.approx(gbm2.evals_result_["valid_1"]["l2"][0])
assert gbm2.evals_result_["valid_0"]["l2"] != gbm2.evals_result_["valid_1"]["l2"], (
    "Evaluation results for the 2 validation sets are not different. This might mean they weren't both used."
)
Collaborator


(changed by me in d7e0fff)

Expanded these tests so they could catch more possible bugs, like:

  • one of the validation sets was ignored
  • the same validation set was referenced multiple times (instead of them being considered separately)

gbm.fit(X_train, y_train, eval_X=(X_test,) * 3, eval_y=(y_test,) * 2)


def test_ranker_eval_set_raises():
Collaborator


(changed by me in d7e0fff)

This test case checks all of the eval_group validation that I shuffled around (see https://github.com/microsoft/LightGBM/pull/6857/files#r2650232387)
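A self-contained sketch of that kind of test (synthetic data; the actual test in this PR uses different fixtures and error messages):

import numpy as np
import pytest

import lightgbm as lgb

def test_ranker_requires_eval_group():
    # small synthetic ranking problem: 20 rows split into two query groups
    rng = np.random.default_rng(42)
    X = rng.normal(size=(20, 3))
    y = rng.integers(0, 3, size=20)
    ranker = lgb.LGBMRanker(n_estimators=2)
    # passing validation data without eval_group must raise for rankers
    with pytest.raises(ValueError, match="eval_group"):
        ranker.fit(X, y, group=[10, 10], eval_X=(X,), eval_y=(y,))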

@jameslamb
Collaborator

OK, done adding inline comment threads. One other note... as you probably expect, the R-package / rchk job failures are not related to the changes here. That's tracked in #7113. If it isn't resolved soon, we'll skip that CI check so this can be merged.

@jameslamb
Collaborator

I've updated this to latest master to pull in CI fixes. I'm still hoping another maintainer will find time to review this, sorry for the delay and thanks for your patience.

Collaborator

@StrikerRUS StrikerRUS left a comment


@lorentzenchr Thank you so much for your hard work here! Generally LGTM! But please consider checking some of my minor comments below.

Comment on lines 343 to 345
eval_set : list or None, optional (default=None) (deprecated)
    A list of (X, y) tuple pairs to use as validation sets.
    This is deprecated, use `eval_X` and `eval_y` instead.

Collaborator


I did this in feafe22

And it looks like it's now .. version-deprecated::, not .. deprecated::; the Sphinx changelog says:

Changed in version 9.0: The deprecated directive was renamed to version-deprecated. The previous name is retained as an alias

Collaborator


Oh wait, re-reading that... Sphinx 9.0 is very new (November 2025).

I just pushed 2924d68 switching back to .. deprecated::.

I'll test using Sphinx 9.x in a separate PR, on a LightGBM branch where we can test how it's rendered on readthedocs.
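For reference, the directive in a numpydoc-style docstring ends up looking roughly like this (version number illustrative):

def fit(self, X, y, eval_set=None, eval_X=None, eval_y=None):
    """Build a gradient boosting model from the training set (X, y).

    Parameters
    ----------
    eval_set : list or None, optional (default=None)
        A list of (X, y) tuple pairs to use as validation sets.

        .. deprecated:: 4.7.0
            Use ``eval_X`` and ``eval_y`` instead.
    """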

@jameslamb jameslamb mentioned this pull request Jan 17, 2026
@jameslamb jameslamb requested a review from StrikerRUS January 18, 2026 03:49
@jameslamb
Collaborator

Ok @StrikerRUS I think I've addressed all comments, could you take another look and merge this if you approve?

