
extend GPU memory to run cuDF for medium and big data #97

Closed
jangorecki opened this issue Aug 21, 2019 · 9 comments · Fixed by #219

Comments

@jangorecki
Contributor

Even if it is possible to fix #94 without extending GPU memory, we still need more GPU memory to handle the 1e9 data (a 45 GB csv).
We need more GPU cards, better GPU cards, or a better machine in general.
For now we have cuDF results only for the 1e7 data.

@datametrician

I would recommend 2x RTX 8000s. In addition, dask-cuDF would allow you to use both of them, versus just the single 1080 Ti you are using now.

@jangorecki
Contributor Author

Before trying to move to new hardware I would like to resolve #94, so I can be sure that the present hardware, and later the new hardware, is properly utilized.

@jangorecki
Contributor Author

jangorecki commented Dec 11, 2019

Assuming required memory scales linearly with data size (and it looks like it does), 2x RTX 8000s will not allow us to compute the 1e9 groupby task, as we would need around 220 GB for that. But even a single RTX 8000 would allow us to compute the 1e8 groupby, so we wouldn't need dask-cudf for that. As of now, using dask-cudf might not even help to resolve 1e8 on the current 2 GPUs, as explained in #94 (comment)

The join task is another thing we should not forget about; it is more memory-demanding, so eventually 2x RTX 8000s might be useful to compute the 1e8 join.
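For the record, the back-of-the-envelope extrapolation above can be sketched in a few lines. The 2.2 GB peak-memory figure for 1e7 rows is an assumption for illustration (chosen so that 1e9 rows extrapolates to the ~220 GB mentioned above); the 48 GB per-card figure is the published memory size of a Quadro RTX 8000.

```python
# Back-of-the-envelope check of the linear-scaling assumption.
GB_PER_1E7_ROWS = 2.2  # hypothetical measured peak GPU memory at 1e7 rows
RTX_8000_GB = 48       # a single Quadro RTX 8000 has 48 GB of GPU memory

def required_memory_gb(rows: float) -> float:
    """Extrapolate peak memory linearly from the 1e7-row measurement."""
    return GB_PER_1E7_ROWS * rows / 1e7

for rows in (1e7, 1e8, 1e9):
    need = required_memory_gb(rows)
    print(f"{rows:.0e} rows: ~{need:.0f} GB needed, "
          f"fits on 2x RTX 8000: {need <= 2 * RTX_8000_GB}")
```

Under this assumption the 1e8 case (~22 GB) fits on a single RTX 8000, while the 1e9 case (~220 GB) exceeds even two cards combined, which is why spilling beyond GPU memory is needed.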

@datametrician

I highly recommend moving to RTX 8000 regardless, but Dask-cuDF (as I said in the other issue) allows spilling to system memory.

@jangorecki
Contributor Author

Running the medium data size was resolved by spilling data from GPU memory to main memory. Yet that was not enough for the big data case (50 GB), so I filed a new FR for spilling data from main memory to disk: rapidsai/cudf#3740
Ultimately we should upgrade the GPU cards, so I am leaving this issue open.
Additionally, moving to dask-cudf is still on the roadmap, for now postponed until the cudf documentation is improved; the status of that can be tracked in #116
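For reference, a minimal sketch of how device-to-host spilling can be enabled through dask-cuda (this requires a CUDA-capable machine with the dask-cuda, dask and cudf packages installed; the memory limits and the csv file name are illustrative, not the exact values used in this benchmark):

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

# device_memory_limit is the per-GPU threshold at which dask-cuda starts
# spilling device buffers to host (main) memory; memory_limit bounds the
# host memory per worker. Both values here are illustrative.
cluster = LocalCUDACluster(
    device_memory_limit="10GB",  # spill GPU -> host above this
    memory_limit="64GB",         # host memory per worker
)
client = Client(cluster)

# Reading the csv as a partitioned dask_cudf DataFrame lets a groupby
# proceed even when the working set exceeds GPU memory, at the cost of
# host<->device transfers. The file name below is a hypothetical example.
df = dask_cudf.read_csv("G1_1e8_1e2_0_0.csv")
res = df.groupby("id1").agg({"v1": "sum"}).compute()
```

Spilling trades speed for capacity: the computation completes, but PCIe transfers dominate once the data no longer fits on the device, which is why a disk-spilling tier is the natural next request for the 50 GB case.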

@jangorecki
Contributor Author

Unfortunately we need to fall back to running only the 1e7 data size until rapidsai/cudf#2277 is resolved. This is because of the GPU memory corruption problem described in #129, which currently prevents us from running the cudf benchmarks. As a result, the cuDF timings are already 1.5 months old.

@jangorecki
Contributor Author

I re-requested spilling to disk, this time using dask_cudf, in rapidsai/cudf#3740

@jangorecki
Contributor Author

rapidsai/dask-cuda#37

@jangorecki
Contributor Author

Resolved by #219.
