Reading model from txt file in Sklearn API #5552

gverbock · 2022-10-20T13:25:45Z

Description

Starting with version 3.3.1, the code I use to read a booster from a txt file and plug it into an LGBMClassifier no longer works. I am wondering whether this is a side effect of another change or something done on purpose.

Reproducible example

import pandas as pd
import lightgbm as lgb

from sklearn.datasets import make_classification

X, y = make_classification()
X = pd.DataFrame(X, columns=[f"f_{col}" for col in range(X.shape[1])])

clf = lgb.LGBMClassifier(n_estimators=20, max_depth=6, random_state=0)
clf.fit(X, y)
clf.booster_.save_model("clf.txt")

# read the classifier back from the text file
booster = lgb.Booster(model_file="clf.txt")
features = booster.feature_name()

# attach the booster to an unfitted classifier through private attributes
read_clf = lgb.LGBMClassifier()
read_clf._Booster = booster
read_clf._n_classes = 2  # binary problem: two classes
read_clf._n_features = len(features)

# Predictions

read_clf.predict_proba(X)[:, 1]

It works with 3.3.0, but not with 3.3.1, 3.3.2 or 3.3.3.

Additional Comments

I know some will ask why I don't use joblib to save (and read) the model as a pickle file, but I have good reasons for using the txt format.
Maybe there is an easier way to do this, so I am open to suggestions (other than pickle files).
I had to copy the code by hand, so apologies if that introduced a bug.

jmoralez · 2022-10-20T16:40:27Z

Hi @gverbock, thanks for using LightGBM. Can you share more details about why you need the scikit-learn interface for inference?

gverbock · 2022-10-21T07:29:45Z

Sure. The modelling pipeline I use is built on the scikit-learn interface, and I plug in different types of model (logistic regression, random forest, ...) depending on needs and regulatory requirements. The scikit-learn interface lets me use LightGBM without changing the current pipeline. I noticed a significant improvement when using LightGBM on some parts of the pipeline, so I really want to use it. So far, my options for using LightGBM are:

Keep using version 3.3.0. This is only a short-term solution, since that version will become outdated.
Adjust the modelling pipeline to allow the use of the native LightGBM API. This is cumbersome, as it also involves changes in IT, tooling, ...
Write my own LightGBM wrapper... but LGBMClassifier already is one. I just need a way to "read" a model saved in txt format.

jmoralez · 2022-10-21T14:26:24Z

I see. Assigning those internal attributes may break again in the future, but if you're that restricted and it has been working for you so far, the change needed for your example to work is setting read_clf.fitted_ = True.

@jameslamb what do you think of this as a feature request? xgboost reference.

jameslamb · 2022-12-24T05:20:53Z

Sorry for taking so long to get back to this @jmoralez !

I'd support adding the ability to initialize the scikit-learn classes from a model in text format. Following the API that xgboost set up (thanks for the link!) looks ok to me.

If you or anyone else reading this opens a pull request, I'd also ask that you consider:

a classmethod like .from_model_file() or something
passing init_model or model_file or something through the constructor (like Booster)

jameslamb · 2022-12-24T05:27:23Z

I've added this to #2302, where we track all feature requests.

Given that, I'm going to close this for now. Anyone is welcome to contribute this feature! And it can be re-opened for discussion if there are design decisions to be made before opening a PR or if a contributor needs help with the contribution process.

@gverbock thanks for describing this feature and for using LightGBM! Just to be sure we set the right expectation... this project is suffering from a serious lack of maintainer availability and attention these days. The fastest way to get LightGBM to adopt this behavior is to implement it yourself and propose a pull request. Otherwise, stay subscribed to this and you'll be notified if someone starts working on it.

jameslamb added the question label Oct 20, 2022

jameslamb added the feature request label Dec 24, 2022

jameslamb closed this as completed Dec 24, 2022

This comment was marked as outdated.


github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023

microsoft unlocked this conversation Aug 17, 2023
