Custom loss with dependent samples #6145

Closed
voilasept opened this issue Oct 14, 2023 · 2 comments

@voilasept

All the examples I've seen are custom loss functions where the samples are independent, such as MSE. Sometimes a custom loss has dependencies between samples.
Here, the dependency is within batches. Say we have one year of data: within each day the samples are dependent, but not across different days.

The loss for each day looks like this:
Loss(a day) = (y1-y0)^2 + (y2-y1)^2 + ...
Total loss:
Loss = Loss(day 0) + Loss(day 1) + ...

Is this implementable with the current Python API? Thank you!

@jameslamb
Collaborator

Thanks for using LightGBM.

Nothing about the way the Python package supports custom objective functions assumes independence between observations in the training data, nor does it require you to implement a pointwise loss.

As described at https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.train.html#lightgbm.train, when using lightgbm.train() it's possible to provide a Python function which accepts the predictions as of the current iteration and the full training Dataset, and which returns the gradient and Hessian of the loss.

Since that function can access the entire training Dataset, there's nothing stopping you from implementing a loss function which considers multiple observations in the training data. It's just up to you to figure out how to do that in such a way that the gradients are informative for the purpose of training the model.

Here's an example of a custom loss function using LightGBM's Python package:

# self-defined objective function
# f(preds: array, train_data: Dataset) -> grad: array, hess: array
# log likelihood loss
import numpy as np

def loglikelihood(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    grad = preds - labels
    hess = preds * (1. - preds)
    return grad, hess
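
For the day-batched loss you described, a rough sketch could look like the one below. This is only an illustration under some assumptions, not an API LightGBM provides: it assumes the training rows are ordered so that each day's samples are contiguous, and that a hypothetical day_index array (one entry per row, identifying the day that row belongs to) is captured by the objective through a closure. The gradients and Hessian values are just the derivatives of (y1-y0)^2 + (y2-y1)^2 + ... computed separately within each day.

import numpy as np

def make_sequential_objective(day_index):
    # day_index: hypothetical array with one entry per training row,
    # identifying the day each sample belongs to (an assumption of this sketch)
    day_index = np.asarray(day_index)

    def sequential_loss(preds, train_data):
        grad = np.zeros_like(preds, dtype=float)
        hess = np.zeros_like(preds, dtype=float)
        for day in np.unique(day_index):
            idx = np.where(day_index == day)[0]
            y = preds[idx]
            if y.size < 2:
                continue  # a single sample contributes no squared-difference term
            diff = np.diff(y)             # y_{i+1} - y_i within the day
            g = np.zeros_like(y)
            g[:-1] -= 2.0 * diff          # derivative w.r.t. y_i of (y_{i+1} - y_i)^2
            g[1:] += 2.0 * diff           # derivative w.r.t. y_{i+1} of (y_{i+1} - y_i)^2
            h = np.full_like(y, 4.0)      # interior samples appear in two squared terms
            h[0] = h[-1] = 2.0            # boundary samples appear in only one
            grad[idx] = g
            hess[idx] = h
        return grad, hess

    return sequential_loss

Depending on your LightGBM version, the resulting callable is passed either through the fobj argument of lightgbm.train() or as the "objective" entry in params. Whether gradients of this form are actually informative enough to train a useful model for your problem is something you'd need to evaluate yourself.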


This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
