Copying description written by Jordan from #24582:
GRACE hash join is a slightly more sophisticated hash join algorithm, usable for equality joins only, that gives us better runtime properties when a hash join needs to spill to disk. It kicks in once the join runs out of memory and must spill, and operates in two phases. The high-level view is that it partitions the left and right sides into large buckets by a hash function A, writes those buckets to disk, then iterates through pairs of those buckets and does a normal hash join with a different hash function B.
Phase 1: partitioning
In this phase, we iterate through both sides of the join, hashing every row using a hash function A that produces n partitions. This will produce n partitions for each side of the join, which will be persisted to disk separately. As memory fills up, each of these partitions is flushed to disk repeatedly until the inputs are exhausted.
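The partitioning phase can be sketched roughly as follows. This is an illustrative in-memory sketch, not CockroachDB's implementation: `hash_a`, `partition_side`, and the row shape are all assumptions, and a real version would flush each partition to disk as memory fills rather than hold lists in memory.

```python
import hashlib

def hash_a(key: str, n: int) -> int:
    """Hash function A: maps a join key to one of n partitions.
    (Illustrative choice; any well-distributed hash works.)"""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n

def partition_side(rows, key_of, n):
    """Phase 1: split one input of the join into n partitions by hash A.
    Rows here are plain tuples; a real implementation would spill each
    partition to its own file on disk as memory fills up."""
    partitions = [[] for _ in range(n)]
    for row in rows:
        partitions[hash_a(key_of(row), n)].append(row)
    return partitions
```

Both sides of the join are run through the same function with the same n, so matching keys are guaranteed to land in partitions with the same index.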
Phase 2: join
Now, we retrieve pairs of partitions from disk and join each pair using the ordinary hash join algorithm (and a different hash function B). Since we're performing an equality join, we can guarantee that each row on the left side of the join, if it has a match, will be in the same partition on the right side of the join. So, it's safe to do the join in pieces, partition by partition.
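One pair of partitions can then be joined with the ordinary build-and-probe algorithm. Again a hedged sketch with assumed names and `(key, payload)` tuple rows; the dict stands in for the hash table built with hash function B.

```python
def join_partition_pair(left_part, right_part, key):
    """Phase 2: ordinary in-memory hash join of one matching partition
    pair. Build a hash table on the left partition (conceptually hash
    function B; Python's dict hashing here), then probe it with every
    row of the right partition."""
    table = {}
    for l in left_part:
        table.setdefault(key(l), []).append(l)
    return [(l, r) for r in right_part for l in table.get(key(r), [])]
```

The full join is then just this function applied to partition pair 0, pair 1, and so on, concatenating the outputs.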
If one of the partition hash joins itself runs out of memory, we can recursively apply this algorithm. The partition will be divided into sub-partitions by a new hash function, spilled to disk, and so on.
Because of the danger of having to recursively spill to disk, it's important to pick a hash function that partitions the input streams into reasonable sizes. I think this is probably only possible given table statistics, because you need to know the approximate size of the table to determine how many partitions are needed so that each partition fits into the available SQL memory.
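The arithmetic for choosing the partition count is simple once a size estimate is available. A minimal sketch, where the function name and the `fudge` factor (padding for hash-table overhead and estimation error) are assumptions, not anything CockroachDB actually does:

```python
import math

def choose_num_partitions(estimated_table_bytes, memory_budget_bytes, fudge=1.5):
    """Pick n so that each partition of the estimated larger input
    fits in the available SQL memory budget, with some headroom.
    If the estimate is wrong and a partition still overflows, the
    algorithm recursively repartitions that partition."""
    return max(1, math.ceil(estimated_table_bytes * fudge / memory_budget_bytes))
```

For example, a 10 GiB input with a 1 GiB memory budget and a 1.5x fudge factor yields 15 partitions of roughly 683 MiB each.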