[python] refine pandas support #960

Closed
6 tasks
wxchan opened this issue Oct 2, 2017 · 8 comments

Comments

@wxchan
Contributor

wxchan commented Oct 2, 2017

to do list:

@StrikerRUS
Collaborator

I think we can explicitly extract the pandas features (e.g. column names, categorical columns, etc.) at the sklearn wrapper level and then pass the data on as a numpy array. In this case all sklearn checks will be performed for pandas input too.

@wxchan
Contributor Author

wxchan commented Oct 3, 2017

@StrikerRUS I am not sure I get it. The pandas support should be independent of the sklearn support. Feel free to give it a try if you have a good idea.

@StrikerRUS
Collaborator

StrikerRUS commented Oct 4, 2017

@wxchan I'm not sure it's a "good idea"...
What I mean is that pandas support should be implemented at a deeper level than the sklearn wrapper. But the data checks in the sklearn wrapper cannot handle pandas (right now pandas objects are just passed as is, without any checks).
https://github.com/Microsoft/LightGBM/blob/master/python-package/lightgbm/sklearn.py#L415

So, I propose extracting all pandas features (column names, categorical columns, etc.) before the sklearn checks. That would allow these checks to be performed without losing the pandas features.
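A minimal sketch of that ordering, assuming only pandas and numpy. The names `split_pandas` and `check_array_2d` are hypothetical and are not part of LightGBM or scikit-learn; `check_array_2d` is only a stand-in for whatever validation the sklearn wrapper would actually run.

```python
import numpy as np
import pandas as pd


def split_pandas(X):
    """Hypothetical helper: return (ndarray, feature_names, categorical_names)."""
    if not isinstance(X, pd.DataFrame):
        # Non-pandas input carries no metadata to preserve.
        return np.asarray(X), None, None
    names = [str(c) for c in X.columns]
    cat_cols = list(X.select_dtypes(include="category").columns)
    X = X.copy()  # leave the caller's frame untouched
    for col in cat_cols:
        # Replace each category with its integer code (missing values become -1).
        X[col] = X[col].cat.codes
    return X.to_numpy(dtype=np.float64), names, [str(c) for c in cat_cols]


def check_array_2d(arr):
    """Stand-in for the wrapper's array checks; not the real sklearn validation."""
    if arr.ndim != 2:
        raise ValueError("expected a 2D array")
    if not np.isfinite(arr).all():
        raise ValueError("array contains NaN or infinity")
    return arr


df = pd.DataFrame({
    "age": [31.0, 45.0, 27.0],
    "city": pd.Categorical(["NY", "SF", "NY"]),
})

# 1) pull the pandas metadata out first, 2) validate the plain array afterwards
arr, names, cats = split_pandas(df)
arr = check_array_2d(arr)
```

The column names and the list of categorical columns survive in `names` and `cats`, so they could still be handed to the lower layers after the array itself has been validated.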

What do you think?

@wxchan
Contributor Author

wxchan commented Oct 4, 2017

@StrikerRUS I edited in some tasks I could think of. You can add more if you think they are necessary.
I think it's better for each of them to be implemented separately, rather than having one function do multiple things as it does now.
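As a rough illustration of that split, assuming only pandas, each concern could live in its own small function. All three function names below are hypothetical and do not come from the LightGBM code base.

```python
import pandas as pd


def get_feature_names(df):
    """Return the column labels as strings."""
    return [str(c) for c in df.columns]


def get_categorical_columns(df):
    """Return the names of columns whose dtype is 'category'."""
    return [str(c) for c in df.select_dtypes(include="category").columns]


def encode_categoricals(df):
    """Return a copy in which category columns hold their integer codes."""
    out = df.copy()
    for col in out.select_dtypes(include="category").columns:
        out[col] = out[col].cat.codes
    return out


frame = pd.DataFrame({
    "x": [1, 2],
    "color": pd.Categorical(["red", "blue"]),
})

feature_names = get_feature_names(frame)
categorical = get_categorical_columns(frame)
encoded = encode_categoricals(frame)
```

Each function can then be tested and reused on its own, and a caller composes only the pieces it needs.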

@StrikerRUS
Collaborator

@wxchan Yeah, great idea about small separate functions!

@StrikerRUS
Collaborator

I haven't taken a detailed look at this yet, but it could be useful.

@muleina

muleina commented Mar 7, 2019

Why am I still receiving a warning from lightgbm ver 2.2.2 with Python 3.6.6 on Ubuntu?

lightgbm/basic.py:752: UserWarning: categorical_feature in param dict is overridden.
  warnings.warn('categorical_feature in param dict is overridden.')

It appears whether the category dtypes are set in the pandas DataFrame or passed as a categorical_feature list to the lightgbm Dataset or at training.

Why does the warning still appear after the fix of 792?

@StrikerRUS
Collaborator

Closed in favor of tracking this in #2302. We decided to keep all feature requests in one place.

Contributions of this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing this feature.
