Document Lance compatibility with python multiprocessing #2405

ipoflex · 2024-05-29T08:35:10Z

When using a PyTorch DataLoader with many workers, access to a Lance dataset hangs forever. This is because torch.utils.data.DataLoader uses the "fork" start method for multiprocessing by default (on Linux). The need to use the "spawn" method with Lance is mentioned in the LanceDB FAQ: https://lancedb.github.io/lancedb/faq/#does-lancedb-support-concurrent-operations but nowhere in the Lance docs, examples, guides, or repo. There is also issue #2204, which someone from the Lance team sent me on Discord. As I was asked to, I'm creating this issue so that this information can be added to the docs or guides.
One good place might be https://github.com/lancedb/lance-deeplearning-recipes, since deep learning usually involves multiple GPUs.
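A minimal sketch of the workaround described above, assuming PyTorch's torch.utils.data.DataLoader, whose multiprocessing_context parameter selects the start method for its worker processes. loader_worker_kwargs is a hypothetical helper for illustration, not part of Lance or PyTorch.

```python
import multiprocessing


def loader_worker_kwargs(num_workers: int) -> dict:
    """Hypothetical helper: worker settings for torch.utils.data.DataLoader.

    Per this issue, Lance reads hang in forked DataLoader workers, so worker
    processes are started with the "spawn" method instead of "fork".
    """
    if num_workers < 0:
        raise ValueError("num_workers must be >= 0")
    if num_workers == 0:
        # Data is loaded in the main process, so no start method is needed;
        # DataLoader rejects multiprocessing_context when num_workers == 0.
        return {"num_workers": 0}
    # Sanity check: "spawn" is available on Linux, macOS and Windows.
    if "spawn" not in multiprocessing.get_all_start_methods():
        raise RuntimeError("the 'spawn' start method is unavailable")
    return {
        "num_workers": num_workers,
        # DataLoader accepts a start-method name here as well as a context object.
        "multiprocessing_context": "spawn",
    }
```

Usage would then look like DataLoader(dataset, batch_size=32, **loader_worker_kwargs(8)), where dataset is whatever PyTorch Dataset wraps the Lance dataset. Note that with "spawn" each worker re-imports the main module and receives the dataset by pickling, so the training script's entry point must sit under if __name__ == "__main__":.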

wjones127 added the documentation and good first issue labels May 29, 2024

wjones127 changed the title ~~Lance compatibility with python multiprocessing~~ Document Lance compatibility with python multiprocessing May 29, 2024

wjones127 self-assigned this May 31, 2024

wjones127 mentioned this issue Jun 21, 2024

docs(python): note multiprocessing incompatibility #2506

Merged

wjones127 closed this as completed in #2506 Jun 21, 2024

wjones127 closed this as completed in 735f5b2 Jun 21, 2024
