Is your feature request related to a problem? Please describe.
This is intended to be a lot like #11344, but for parquet, orc, avro, csv, json, etc. files.
This is intended to be a follow on to #1815
Essentially, it would be really nice to release the semaphore less often by buffering more file data ahead of time. We can use a model similar to the one described for the shuffle readers in #11344. But things get complicated because the parquet and orc readers already have ways of partially doing this: not read-ahead, but using multiple threads to read the data. It is also not that common that we end up needing to pull in two batches of input data, so this is probably lower priority than the shuffle work. Matching what the parquet and orc readers do today is also a lot of work, but it should be doable and would give us prioritization and flow control for these readers as well.
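To make the idea concrete, here is a minimal sketch of the buffering model being proposed: a background thread reads ahead into a bounded queue (the bound gives flow control), and the semaphore is acquired only while a batch is actually being processed, never while waiting on I/O. All names here (`ReadAheadReader`, the `depth` parameter, using a plain `threading.Semaphore` to stand in for the GPU semaphore) are hypothetical illustrations, not the project's actual API.

```python
import threading
import queue

class ReadAheadReader:
    """Hypothetical sketch of read-ahead buffering with flow control.

    A background thread prefetches raw batches into a bounded queue so the
    (stand-in) semaphore is only held while a batch is processed, not while
    blocked on file I/O.
    """

    _DONE = object()  # sentinel marking end of input

    def __init__(self, batches, semaphore, depth=2):
        self._batches = batches      # iterable standing in for file reads
        self._sem = semaphore        # stand-in for the GPU semaphore
        # Bounded queue: the prefetcher blocks once `depth` batches are
        # buffered, which is the flow-control part of the proposal.
        self._queue = queue.Queue(maxsize=depth)
        threading.Thread(target=self._prefetch, daemon=True).start()

    def _prefetch(self):
        # Runs WITHOUT holding the semaphore: reads ahead up to `depth`
        # batches while the consumer is busy processing earlier ones.
        for batch in self._batches:
            self._queue.put(batch)   # blocks when the buffer is full
        self._queue.put(self._DONE)

    def __iter__(self):
        while True:
            batch = self._queue.get()  # wait for buffered data, semaphore free
            if batch is self._DONE:
                return
            with self._sem:            # acquire only to process the batch
                yield batch
```

In the real readers the "batches" would come from multi-threaded parquet/orc decoding, and a priority scheme would decide which task's prefetcher gets I/O threads first; the sketch only shows the buffering and flow-control shape.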