[python-package] Allow using only the last dataset for early stopping #6360

jdawang · 2024-03-14T17:34:58Z

Summary

I would like to add a parameter, last_dataset_only (or a similar name), to lightgbm.callback.early_stopping so that early stopping uses only the last item of eval_set.

Motivation

There are situations where it's desirable to have multiple evaluation sets. Sometimes we want to record the evaluation results at each iteration for multiple datasets, but only use one for early stopping. The way XGBoost deals with this is by using only the last item of the eval_set to determine early stopping. We could score the model at each iteration to recreate the evaluation history, but this is inefficient.

This also matters to us when developing tools or pipelines that need to be compatible with both LightGBM and XGBoost, such as feature selection or model selection utilities.

Description

In the early stopping callback, LightGBM currently uses all datasets provided for early stopping. This proposal would add a parameter, last_dataset_only (or a similar name), to lightgbm.callback.early_stopping so that only the last item of eval_set determines when to stop early.

I would like the following to create an early stopping callback that would use only the first metric from the last dataset in eval_set to early stop, but would still score on every dataset in eval_set:

from lightgbm.callback import early_stopping

es_cb = early_stopping(5, first_metric_only=True, last_dataset_only=True)

I'm not super familiar with the LGBM codebase and what, if anything, would need to be changed in the codebase besides the early stopping callback, but for what it's worth, I have a working version of a modified early stopping callback that I'm happy to work with you to contribute.
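To make the requested behaviour concrete, here is a minimal sketch of the selection logic such a callback might use. It is illustrative only and is not the modified callback mentioned above. The class name LastDatasetStopper and its update method are hypothetical, and the record shape (dataset_name, metric_name, value, is_higher_better) is an assumption about what LightGBM passes to callbacks in env.evaluation_result_list. It also assumes the records arrive in eval_set order, so the last record belongs to the last dataset.

```python
from typing import Optional, Sequence, Tuple

# Assumed shape of one evaluation record:
# (dataset_name, metric_name, value, is_higher_better).
EvalRecord = Tuple[str, str, float, bool]


class LastDatasetStopper:
    """Hypothetical helper: watches only the first metric of the last dataset."""

    def __init__(self, stopping_rounds: int, min_delta: float = 0.0) -> None:
        if stopping_rounds <= 0:
            raise ValueError("stopping_rounds must be positive")
        self.stopping_rounds = stopping_rounds
        self.min_delta = min_delta
        self.best_score: Optional[float] = None
        self.best_iteration: int = -1

    def update(self, iteration: int, results: Sequence[EvalRecord]) -> bool:
        """Record one iteration; return True when training should stop."""
        if not results:
            raise ValueError("at least one evaluation record is required")
        # Assumption: records follow eval_set order, so the last record
        # names the last dataset.
        last_dataset = results[-1][0]
        # First metric reported for that dataset; slicing tolerates
        # records that carry extra trailing fields.
        record = next(r for r in results if r[0] == last_dataset)
        _, _, value, higher_better = record[:4]

        if self.best_score is None:
            improved = True
        elif higher_better:
            improved = value > self.best_score + self.min_delta
        else:
            improved = value < self.best_score - self.min_delta

        if improved:
            self.best_score = value
            self.best_iteration = iteration
            return False
        # Stop once stopping_rounds iterations have passed without improvement.
        return iteration - self.best_iteration >= self.stopping_rounds


# Simulated history: the training loss keeps falling, the validation loss does not.
valid_losses = [1.0, 0.9, 0.95, 0.97]
stopper = LastDatasetStopper(stopping_rounds=2)
stopped_at = None
for i, v in enumerate(valid_losses):
    history = [("training", "l2", 1.0 / (i + 1), False), ("valid", "l2", v, False)]
    if stopper.update(i, history):
        stopped_at = i
        break
# stopped_at is 3 and stopper.best_iteration is 1: the training records are
# still available for logging but never influence the decision.
```

To wire this into LightGBM, a callback would presumably call update with env.iteration and env.evaluation_result_list and raise lightgbm.callback.EarlyStopException when it returns True. That wiring is not shown here and would need checking against the installed version.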

jameslamb added the feature request label Apr 4, 2024

jameslamb mentioned this issue Apr 26, 2024

[WIP] Add chosen metric argument to clarify early stopping behaviour #6424

Open

jameslamb mentioned this issue Jun 10, 2024

Clarification on Early Stopping Behavior with Multiple eval_set in LightGBM #6475

Closed

StrikerRUS mentioned this issue Sep 5, 2024

FEAT allow metadata to be transformed in a Pipeline scikit-learn/scikit-learn#28901

Open
