When training an index, the shuffling stage runs too slowly. This seems to be because we store one batch per partition (even empty partitions need a batch) and there is a lot of concatenation. By storing a list array instead, we can avoid both problems, and the shuffle can actually complete quite quickly (in theory as fast as 6 minutes for 1B rows, though it might need many CPUs and a lot of RAM to achieve that). At the very least, we should be able to finish in 1-2 hours.
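
To illustrate the list-array idea, here is a minimal sketch in plain Python. It is not the actual implementation: `shuffle_to_list_layout` and `rows_in_partition` are hypothetical names, and the partition ids are assumed to be already computed as small integers. It groups row indices with a counting sort into one flat `values` buffer plus an `offsets` buffer, which is roughly how a list array is laid out. An empty partition then costs a single repeated offset rather than a batch, and nothing is concatenated.

```python
from typing import List, Tuple


def shuffle_to_list_layout(
    part_ids: List[int], num_partitions: int
) -> Tuple[List[int], List[int]]:
    """Group row indices by partition id (hypothetical helper).

    Returns (offsets, values), where the rows assigned to partition p
    are values[offsets[p]:offsets[p + 1]].
    """
    # Pass 1: count how many rows land in each partition.
    counts = [0] * num_partitions
    for p in part_ids:
        counts[p] += 1

    # Prefix-sum the counts into offsets. An empty partition simply
    # repeats the previous offset; it needs no storage of its own.
    offsets = [0] * (num_partitions + 1)
    for p in range(num_partitions):
        offsets[p + 1] = offsets[p] + counts[p]

    # Pass 2: scatter each row index directly into its final slot,
    # so no per-partition batches are built or concatenated.
    cursor = list(offsets[:-1])
    values = [0] * len(part_ids)
    for row, p in enumerate(part_ids):
        values[cursor[p]] = row
        cursor[p] += 1

    return offsets, values


def rows_in_partition(offsets: List[int], values: List[int], p: int) -> List[int]:
    """Return the row indices stored for partition p."""
    return values[offsets[p]:offsets[p + 1]]
```

For example, `shuffle_to_list_layout([2, 0, 2, 3, 0], 4)` produces offsets `[0, 2, 2, 4, 5]`, where the repeated `2` marks partition 1 as empty. Both passes are linear in the number of rows; a real implementation would presumably run them over chunks in parallel, which is not shown here.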