[python-package] Allow using only the last dataset for early stopping #6360

Open
jdawang opened this issue Mar 14, 2024 · 0 comments

Summary

I would like to add a parameter, last_dataset_only (or a similar name), to lightgbm.callback.early_stopping that makes early stopping use only the last item of eval_set.

Motivation

There are situations where it's desirable to have multiple evaluation sets. Sometimes we want to record the evaluation results at each iteration for multiple datasets, but only use one for early stopping. The way XGBoost deals with this is by using only the last item of the eval_set to determine early stopping. We could score the model at each iteration to recreate the evaluation history, but this is inefficient.

This also matters to us when developing tools or pipelines that should be compatible with both LightGBM and XGBoost, such as feature selection or model selection algorithms and utilities.

Description

In the early stopping callback, LightGBM uses all provided datasets for early stopping. This would add a parameter, last_dataset_only or similar naming, to lightgbm.callback.early_stopping that would make early stopping use only the last item of eval_set to determine when to stop.

I would like the following to create an early stopping callback that would use only the first metric from the last dataset in eval_set to early stop, but would still score on every dataset in eval_set:

from lightgbm.callback import early_stopping

es_cb = early_stopping(5, first_metric_only=True, last_dataset_only=True)
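To make the intended behavior concrete, here is a minimal sketch of the stopping logic in plain Python, independent of LightGBM itself. It assumes evaluation results arrive each iteration as (dataset_name, metric_name, value, is_higher_better) tuples, ordered so that the last dataset in eval_set comes last. The names last_dataset_results and LastDatasetStopper are hypothetical and are not part of the LightGBM API.

```python
from typing import List, Optional, Tuple

# Assumed tuple layout: (dataset_name, metric_name, value, is_higher_better)
EvalResult = Tuple[str, str, float, bool]


def last_dataset_results(results: List[EvalResult]) -> List[EvalResult]:
    """Keep only the entries that belong to the last dataset in the list."""
    if not results:
        return []
    last_name = results[-1][0]
    return [r for r in results if r[0] == last_name]


class LastDatasetStopper:
    """Hypothetical helper: early-stops on the first metric of the last dataset."""

    def __init__(self, stopping_rounds: int) -> None:
        self.stopping_rounds = stopping_rounds
        self.best_value: Optional[float] = None
        self.best_iteration = -1

    def update(self, iteration: int, results: List[EvalResult]) -> bool:
        """Record this iteration's score and return True when training should stop."""
        monitored = last_dataset_results(results)
        if not monitored:
            return False
        # Only the first metric of the last dataset drives the decision.
        _, _, value, higher_better = monitored[0]
        if self.best_value is None:
            improved = True
        elif higher_better:
            improved = value > self.best_value
        else:
            improved = value < self.best_value
        if improved:
            self.best_value = value
            self.best_iteration = iteration
        # Stop once no improvement has been seen for stopping_rounds iterations.
        return iteration - self.best_iteration >= self.stopping_rounds
```

All datasets can still be scored and logged every iteration; only the filtered entries feed the stopping decision. In a real callback, the stop signal would be raised the way the existing early stopping callback does it, rather than returned as a boolean.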

I'm not super familiar with the LGBM codebase and what, if anything, would need to be changed in the codebase besides the early stopping callback, but for what it's worth, I have a working version of a modified early stopping callback that I'm happy to work with you to contribute.
