
[CLI, c/c++] Question about save_binary, train and improving performance for huge dataset #6190

wil70 opened this issue Nov 14, 2023 · 1 comment

wil70 commented Nov 14, 2023

Hello

I have a few questions about save_binary, i.e. "task = save_binary".
I have huge CSV files, and it takes days to convert them to .bin files; I would love to speed this up by more than 10x.

  1. Any idea how I can speed up this process? (See the config sketch after this list for the kind of setup I have in mind.)
  2. Would the cli-socket speed up save_binary?
  3. Can a GPU help?
  4. Can multiple machines' CPUs (via something like cli-socket), each with several GPUs, work together to speed up save_binary and train?
  5. Is there a way to add extra columns to an existing bin file produced by save_binary?
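For reference, this is roughly the configuration I am running; the file names, the "target" label column and the parameter values below are placeholders, not my real setup:

```
# minimal sketch of the save_binary setup (file names, the "target" label column
# and the parameter values are placeholders)
task = save_binary        # convert the text data to a LightGBM binary dataset
data = train.csv          # input CSV (TB-scale in my case)
header = true             # the CSV has a header row
label_column = name:target
max_bin = 255             # fewer bins -> faster binning and a smaller .bin file
num_threads = 16          # roughly the number of physical cores on the machine
two_round = true          # don't map the whole file into memory; helps when data >> RAM
```

Run with `lightgbm config=save_binary.conf`; as far as I can tell, the result is written next to the input as train.csv.bin.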

Thanks for your help!

Wil

Briefly explain your feature proposal.

Speed up save_binary and train for huge files (terabytes of data). It works as of today, but it takes days to weeks.

Why is it useful to have this feature in the LightGBM project?

Many problems have huge datasets, even after reduction techniques.

Detailed description of the new feature.

Being able to handle huge datasets faster than today with the CLI and C/C++ API.
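For the C/C++ side, this is the path I am using, sketched here with placeholder file names and parameters:

```c
#include <stdio.h>
#include <LightGBM/c_api.h>

/* Sketch of the C API equivalent of "task = save_binary": build a Dataset
 * from a CSV and save it as a .bin file. File names and parameter values
 * are placeholders. */
int main(void) {
  DatasetHandle dataset = NULL;
  const char* params = "header=true label_column=name:target max_bin=255 two_round=true";

  if (LGBM_DatasetCreateFromFile("train.csv", params, NULL, &dataset) != 0) {
    fprintf(stderr, "dataset creation failed: %s\n", LGBM_GetLastError());
    return 1;
  }
  if (LGBM_DatasetSaveBinary(dataset, "train.bin") != 0) {
    fprintf(stderr, "saving binary failed: %s\n", LGBM_GetLastError());
    LGBM_DatasetFree(dataset);
    return 1;
  }
  LGBM_DatasetFree(dataset);
  return 0;
}
```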

Environment information

I'm using Windows 10 and Windows Server with the latest LightGBM code (CLI, and the C/C++ API called from C#).

Thanks

Wil

ref: https://lightgbm.readthedocs.io/en/latest/Features.html#optimization-in-distributed-learning
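Regarding question 4, the linked page describes data-parallel distributed learning; my understanding is that the setup looks roughly like this (machine addresses and port are placeholders, and every machine runs the same config on its own partition of the data):

```
# sketch of a data-parallel distributed training setup (placeholders only)
tree_learner = data
num_machines = 2
machine_list_filename = mlist.txt   # one "ip port" line per machine
local_listen_port = 12400
```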
