Reading model from txt file in Sklearn API #5552

Closed
gverbock opened this issue Oct 20, 2022 · 6 comments
Comments

@gverbock

gverbock commented Oct 20, 2022

Description

Starting with version 3.3.1, the code I use to read a booster from a txt file and wrap it in an LGBMClassifier no longer works. I am wondering whether this is a side effect of another change or whether it was intentional.

Reproducible example

import pandas as pd
import lightgbm as lgb

from sklearn.datasets import make_classification

X, y = make_classification()
X = pd.DataFrame(X, columns = [f"f_{col}" for col in range(0, 20)])

clf = lgb.LGBMClassifier(n_estimators=20, max_depth=6, random_state=0)
clf.fit(X,y)
clf.booster_.save_model("clf.txt")

# read the classifier
booster = lgb.Booster(model_file="clf.txt")
features = booster.feature_name()

read_clf = lgb.LGBMClassifier()
read_clf._Booster = booster
read_clf._n_classes = 1
read_clf._n_features = len(features)

# Predictions

read_clf.predict_proba(X)[:, 1]

It works with 3.3.0

But not with 3.3.1, 3.3.2 and 3.3.3

Additional Comments

  • I know some may ask why I don't use joblib to save (and load) the model as a pickle file, but I have good reasons for using the txt format.
  • Maybe there is an easier way to do this, so I am open to suggestions (except using pickle files).
  • I had to manually copy the code, so apologies if this introduces a bug.
@jmoralez
Collaborator

Hi @gverbock, thanks for using LightGBM. Can you share more details about why you need the scikit-learn interface for inference?

@gverbock
Author

Sure, the modelling pipeline I am using is based on the scikit-learn interface, and I plug in different types of models depending on the needs and regulatory requirements (logistic regression, random forest, ...). Using the scikit-learn interface allows me to use LightGBM without changing the current pipeline. I noticed a significant improvement when using LightGBM in some parts of the modelling pipeline, so I really want to use it. So far my options for using LightGBM are:

  • Keep using version 3.3.0. This is a short-term solution, as it will become outdated.
  • Adjust the modelling pipeline to allow the use of the native LightGBM API. This is cumbersome, as it also involves changes in IT, tooling, ...
  • Write my own LightGBM wrapper... well, this already exists as LGBMClassifier. I just need a way to "read" a model saved in txt format.

@jmoralez
Collaborator

jmoralez commented Oct 21, 2022

I see. Assigning those internal attributes may break again in the future, but if you're that restricted and it has been working for you so far, the change needed for your example to work is setting read_clf.fitted_ = True.

@jameslamb what do you think of this as a feature request? xgboost reference.

@jameslamb
Collaborator

Sorry for taking so long to get back to this, @jmoralez!

I'd support adding the ability to initialize the scikit-learn classes from a model in text format. Following the API that xgboost set up (thanks for the link!) looks ok to me.

If you or anyone else reading this opens a pull request, I'd also ask that you consider:

  • a classmethod like .from_model_file() or something
  • passing init_model or model_file or something through the constructor (like Booster)

@jameslamb
Collaborator

I've added this to #2302, where we track all feature requests.

Given that, I'm going to close this for now. Anyone is welcome to contribute this feature! And it can be re-opened for discussion if there are design decisions to be made before opening a PR or if a contributor needs help with the contribution process.

@gverbock thanks for describing this feature and for using LightGBM! Just to be sure we set the right expectation... this project is suffering from a serious lack of maintainer availability and attention these days. The fastest way to get LightGBM to adopt this behavior is to implement it yourself and propose a pull request. Otherwise, stay subscribed to this and you'll be notified if someone starts working on it.

@github-actions

This comment was marked as outdated.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023
@microsoft microsoft unlocked this conversation Aug 17, 2023

3 participants