[BUG] Explicit-comms shuffle does not obey "_partitions" column
#1239
Labels: bug
While debugging a data-curation workflow, I discovered that the explicit-comms shuffle has a subtle bug in the logic used to assign data to the final partitions when `"_partitions"` is specified. For example:

Since all rows should be `0` in the `"_partitions"` column, all data should be moved to partition `0` after the shuffle. However, I get an empty `DataFrame` when I execute this:

As far as I can tell, this problem is caused by the fact that `shuffle_result[rank]` is not in the same order as `rank_to_out_part_ids[rank]` in this `client.submit` loop (the order is reversed).
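
A toy illustration of this kind of ordering mismatch, in plain Python. This is only a sketch of the suspected failure mode under simplifying assumptions, not the dask-cuda implementation: `rows`, `group_rows`, and the list contents are hypothetical, and only the names `shuffle_result` and `rank_to_out_part_ids` echo the identifiers mentioned above.

```python
# Toy model of a single rank; nothing here calls dask or dask-cuda.
rank = 0
rank_to_out_part_ids = {rank: [0, 1]}  # output partitions owned by this rank

# Hypothetical rows as (payload, target partition); every row targets 0.
rows = [("a", 0), ("b", 0), ("c", 0)]


def group_rows(rows, part_ids):
    """Group row payloads by their target partition ID."""
    groups = {pid: [] for pid in part_ids}
    for payload, pid in rows:
        groups[pid].append(payload)
    return groups


groups = group_rows(rows, rank_to_out_part_ids[rank])

# Assumption: the per-rank result list comes back in reversed ID order.
shuffle_result = {
    rank: [groups[pid] for pid in reversed(rank_to_out_part_ids[rank])]
}

# Positional pairing silently attaches each result to the wrong partition ID.
buggy = dict(zip(rank_to_out_part_ids[rank], shuffle_result[rank]))

# Carrying the partition ID with each result makes the pairing order-independent.
labeled_result = [
    (pid, groups[pid]) for pid in reversed(rank_to_out_part_ids[rank])
]
fixed = dict(labeled_result)

print("positional pairing:", buggy)
print("ID-based pairing:  ", fixed)
```

With positional pairing, partition `0` ends up empty even though every row targets it, which matches the symptom described above. Pairing each result with its partition ID explicitly (or otherwise guaranteeing that both sequences share one order) avoids depending on the order of the result list.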