Not being able to load data that has been previously saved #6052
Comments
Also, I've noticed that loading the data and using it for something is not tested in https://github.com/microsoft/LightGBM/blob/v4.0.0/tests/python_package_test/test_basic.py#L247
Saving the binary doesn't save the raw data, I believe. In this case the "binary" data refers to the binned data, and the "raw" data refers to the data prior to binning (floats, for instance). What if you just don't call
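To make the binned-versus-raw distinction concrete, here is a minimal sketch in plain Python. It is not LightGBM's actual bin-finding code; the function name `bin_feature` and the bin boundaries are made up for illustration. It only shows why raw floats cannot be reconstructed once only the bin indices are stored.

```python
from bisect import bisect_left

def bin_feature(values, upper_bounds):
    """Map each raw value to the index of the first bin whose upper bound is >= the value.

    `upper_bounds` must be sorted ascending; both names are illustrative only
    and do not correspond to any LightGBM API.
    """
    return [bisect_left(upper_bounds, v) for v in values]

raw = [0.3, 0.9, 1.5, 2.7]
upper_bounds = [1.0, 2.0, float("inf")]  # hypothetical boundaries for three bins
binned = bin_feature(raw, upper_bounds)

# 0.3 and 0.9 land in the same bin, so the bin index alone cannot tell them apart.
print(binned)
```

If a saved file holds only something like `binned`, there is nothing to read the original floats back from, which would be consistent with the behaviour reported in this issue.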
This returns just the path to the data, fwiw.
Running this gets me:
>>> lgb.train({}, ds2, num_boost_round=3)
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000556 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 98
[LightGBM] [Info] Number of data points in the train set: 150, number of used features: 4
[LightGBM] [Info] Start training from score 1.000000
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
<lightgbm.basic.Booster object at 0x100a32160>
I don't think any training happened 🤔
Though, if I do it with the original dataset, I get the same output:
>>> lgb.train({}, ds, num_boost_round=3)
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000464 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 98
[LightGBM] [Info] Number of data points in the train set: 150, number of used features: 4
[LightGBM] [Info] Start training from score 1.000000
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
<lightgbm.basic.Booster object at 0x100a18490>
I would throw a warning when trying to get data and it is not possible because the Dataset has been loaded from a
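As a rough sketch of what such a warning could look like from user code: the helper below is hypothetical (`get_data_or_warn` is not part of LightGBM). It assumes only that the object passed in exposes a `get_data()` method and that, as noted earlier in the thread, the method may hand back the file path instead of the raw values.

```python
import warnings
from pathlib import Path

def get_data_or_warn(dataset):
    """Call dataset.get_data() and warn if the result is only a file path.

    Hypothetical helper for illustration; `dataset` is any object with a
    get_data() method, not necessarily a lightgbm.Dataset.
    """
    data = dataset.get_data()
    if isinstance(data, (str, Path)):
        warnings.warn(
            f"get_data() returned a file path ({data}) rather than raw values; "
            "the raw data is probably not available from this Dataset.",
            UserWarning,
            stacklevel=2,
        )
    return data
```

The helper still returns whatever `get_data()` produced, so existing callers keep working; the warning just makes the "path instead of data" case visible.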
Description
I am trying to load data in a Dataset after I have previously saved it, but I cannot get the data itself. I know that the data is being read because the feature names are being loaded, but the data is not.
Reproducible example
The following loads some data into a Dataset:
Now that I have my data saved, I want to load it:
As expected, I cannot get_data, so I construct the Dataset lazily:
which is rather unexpected, as I don't know why the raw data was freed in the first place. Let us follow the message's advice, though:
Note that this is not what I expected: I expected to actually be able to access the data. How can I access it?
Environment info
LightGBM version or commit hash: v4.0.0
Command(s) you used to install LightGBM
in a virtual environment.