
[BUG] left_semi_join operation is abnormal and serious time-consuming #569

Closed
chenrui17 opened this issue Aug 15, 2020 · 12 comments · Fixed by #594 or #754
Assignees
Labels
bug Something isn't working P0 Must have for release performance A performance related task/issue

Comments

@chenrui17

chenrui17 commented Aug 15, 2020

Describe the bug
I tested TPC-DS query 10 with a recent rapids-0.2 commit, and I noticed that during the reduce phase after the shuffle, several tasks are extremely time-consuming. Using nsys I found the time is spent in left_semi_join, so I suspect this is a bug.
Normally this query completes in tens of seconds, but now these tasks take 5-8 minutes. At the same time, GPU utilization is very high during this period.
[screenshots attached]

@chenrui17 chenrui17 added ? - Needs Triage Need team to review and classify bug Something isn't working labels Aug 15, 2020
@chenrui17 chenrui17 changed the title left_semi_join operation isAbnormal operation and serious time-consuming → left_semi_join operation is abnormal and serious time-consuming[BUG] Aug 15, 2020
@chenrui17
Author

[screenshot attached]
I tested in local mode; it looks like a GpuCoalesceBatches execution exception occurs first.

@jlowe jlowe added P0 Must have for release performance A performance related task/issue and removed ? - Needs Triage Need team to review and classify labels Aug 17, 2020
@jlowe
Member

jlowe commented Aug 17, 2020

This is definitely a bug. No GPU operation should take thousands of seconds as shown in the nsys trace.

@chenrui17 does this happen every time you run this query or only sometimes? Does the scale factor of the tpc-ds data matter or will this occur even with smaller scale factors?

@chenrui17
Author

> This is definitely a bug. No GPU operation should take thousands of seconds as shown in the nsys trace.
>
> @chenrui17 does this happen every time you run this query or only sometimes? Does the scale factor of the tpc-ds data matter or will this occur even with smaller scale factors?

It happens every time, and my TPC-DS scale factor is 1000.
I will make the scale factor smaller and try again.

@chenrui17
Author

> This is definitely a bug. No GPU operation should take thousands of seconds as shown in the nsys trace.
>
> @chenrui17 does this happen every time you run this query or only sometimes? Does the scale factor of the tpc-ds data matter or will this occur even with smaller scale factors?

I also tried two other scale factors. With scale factor 100, TPC-DS query 10 seems to execute normally: the total query time is 14s, with the reduce phase taking 8s. However, while most tasks take 1s, some still take 7s, as shown in the figure below.

With scale factor 10000, query 10 fails to execute successfully and gets stuck in the reduce phase mentioned above.

So even with a small amount of data, the problem still exists.
[screenshot attached]

@revans2
Collaborator

revans2 commented Aug 17, 2020

I think I have been able to reproduce this issue. I ran query 10 with a SF 200 data set encoded in parquet.

spark.sql.shuffle.partitions=1
spark.sql.files.maxPartitionBytes=1g
spark.rapids.memory.pinnedPool.size=16g
spark.rapids.sql.concurrentGpuTasks=2

8 CPU cores per GPU.

With the plugin it can take over 50 seconds to run the semi-join. If I disable just LeftSemi as a join option the query takes 6 seconds. If I run on the CPU for everything it completes in about 15 seconds.

It is producing the correct result in all cases, so it looks to be a performance related issue of some kind.

It actually looks scaling-related. When I increase the number of workers I see a very large decrease in the total runtime. It is almost as if the join is somehow being done by brute force. I'll dig into the source for it and see if I can come up with something.

@revans2
Collaborator

revans2 commented Aug 17, 2020

So I read through the code and got a profile too. It looks like building the hash table, just for the right table keys is taking about 87% of the time, and then probing the table is taking the other 13% of the time. I really don't know why. I'll try to get a repro case and file something in cudf about this.

@jlowe this looks to be bad enough that I think we want to disable both semi and anti joins (they share the same implementation), but I would like to hear your opinion on this.

@jlowe
Member

jlowe commented Aug 17, 2020

> this looks to be bad enough that I think we want to disable both semi and anti joins

I think we need to get a bit more details on how it's slow before disabling outright. IIUC semi/anti join should always be as fast or faster than a normal join in cudf. The build phase constructs a hash table just like a regular join, but instead of a multimap it's just a map. The lookup phase should also be as fast or faster since it only cares about a key hit in the table and doesn't care about any value associated with it.

It would be interesting to know if a regular join on the same input tables and same join keys is significantly faster.
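To make that point concrete, here is a hypothetical plain-Python sketch (not cudf code, and not how cudf stores its table): a left semi join only needs to test key membership on the build side, so a plain set suffices where a regular hash join would need a multimap from key to matching row indices.

```python
# Hypothetical sketch (not cudf code): a left semi join only tests whether a
# build-side key exists, so a set works where a regular hash join needs a
# multimap of key -> matching row indices.

def left_semi_join(left_rows, right_rows, key):
    build = {key(r) for r in right_rows}              # build phase: distinct keys only
    return [r for r in left_rows if key(r) in build]  # probe phase: keep rows with a hit

left = [(1, "a"), (2, "b"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z")]
print(left_semi_join(left, right, key=lambda r: r[0]))  # [(2, 'b')]
```

Note that duplicate build-side keys (the two rows with key 2) collapse into a single set entry, which is exactly why the build phase should be no slower than a regular join's.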

@JustPlay

> If I disable just LeftSemi as a join option the query takes 6 seconds

So, how can I disable the semi (and anti) join? @revans2

@revans2
Collaborator

revans2 commented Aug 18, 2020

@JustPlay I had to go in and change the code to disable just those joins. We don't have a config for that yet.

@jlowe I was able to make some progress on this. I dumped the inputs to the join and was able to reproduce the issue in c++. It has to do with null values and null equality.

TEST_F(JoinTest, LeftSemiJoinPerf)
{
  // Read the dumped join inputs back in from parquet.
  cudf::io::read_parquet_args left_args{cudf::io::source_info("/tmp/input/left_0")};
  cudf::io::table_with_metadata left = cudf::io::read_parquet(left_args);

  std::cerr << "LEFT: " << left.tbl->num_columns() << " X " << left.tbl->num_rows() << std::endl;

  cudf::io::read_parquet_args right_args{cudf::io::source_info("/tmp/input/right_0")};
  cudf::io::table_with_metadata right = cudf::io::read_parquet(right_args);

  std::cerr << "RIGHT: " << right.tbl->num_columns() << " X " << right.tbl->num_rows() << std::endl;

  // Join on the first column of each table, returning columns 0-3 of the left
  // table; nulls compare unequal, which is what Spark requires.
  auto join_table = cudf::left_semi_join(*left.tbl, *right.tbl, {0}, {0}, {0, 1, 2, 3},
                                         cudf::null_equality::UNEQUAL);

  std::cerr << "RESULT: " << join_table->num_columns() << " X " << join_table->num_rows() << std::endl;
}

On the 200GB data set, by the time we reach the left semi join there are 513,298 null key values in the right table. If I say nulls are equal (cudf::null_equality::EQUAL, which is the default), it takes about 63ms to read in both the left and right tables and do the join. If I say they are not equal, which is what Spark needs, the same thing takes about 50 seconds.

When I do the join manually in Spark I am not able to reproduce it, because Spark automatically filters all nulls from both the left and right tables before doing the join. My guess as to why it is taking so long is that cudf sees nulls as not equal, so each time it sees one it inserts it into the table as a new entry; but because they all hash to the same value, it has to walk the table looking for a free location. This degrades the hash table enough that probing it takes a lot longer too.

The fastest fix for this would be to filter out any nulls before we do the join. We should also file a follow-on issue with cudf to see if there is a better fix on their side. I am off today, so if this can wait until Thursday I can work on it; if not, feel free to put in a fix yourself and we can work with cudf on this later.
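The guessed failure mode above can be modeled in a few lines of plain Python (hypothetical, not the cudf implementation): if every null key is inserted as a distinct entry but all nulls hash to the same bucket, linear probing collapses into one ever-growing cluster and insertion cost grows quadratically with the null count.

```python
# Hypothetical model of the suspected slowdown (plain Python, not cudf):
# with null_equality::UNEQUAL every null key becomes a distinct entry, yet
# all nulls hash identically, so open addressing degenerates into a crawl
# along one giant cluster.

def insert_all(keys, capacity):
    table = [None] * capacity
    probes = 0
    for k in keys:
        i = hash(k) % capacity
        while table[i] is not None:   # linear probing for a free slot
            probes += 1
            i = (i + 1) % capacity
        table[i] = ("entry", k)
    return probes

NULL = 0  # stand-in: all nulls share one hash but each is kept as a new entry
print(insert_all([NULL] * 100, capacity=1 << 12))   # 4950 probes
print(insert_all([NULL] * 2000, capacity=1 << 12))  # 1999000 probes: ~quadratic
```

Twenty times more null keys costs roughly four hundred times more probes, which is consistent with a 63ms join blowing up to tens of seconds once half a million null keys pile into one cluster.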

@jlowe jlowe changed the title left_semi_join operation is abnormal and serious time-consuming[BUG] [BUG] left_semi_join operation is abnormal and serious time-consuming Aug 18, 2020
@jlowe
Member

jlowe commented Aug 18, 2020

@revans2 Thursday should be fine. I'm chasing some other things today, and you're already setup to reproduce the issue easily and therefore can verify the fix.

> The fastest fix for this would be to filter out any nulls before we do the join.

Agreed. I assume we'd only need to pre-filter the nulls in the build table for the hash table, thinking it would be faster for the streamed table rows to fail a hit in the hash table, that now no longer contains null entries, than it would be to manifest the non-null version of the streamed table prior to the join.
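In set terms, the asymmetry described above looks like this (hypothetical Python sketch, not the plugin code): dropping null keys from the build side alone already gives NULL != NULL semantics, because a streamed null key simply misses in a table that contains no null entries.

```python
# Hypothetical sketch: pre-filter nulls only on the build (right) side.
# A streamed None key then naturally misses, matching Spark's NULL != NULL
# join semantics without materializing a filtered copy of the left side.

def left_semi_join_null_unequal(left_keys, right_keys):
    build = {k for k in right_keys if k is not None}  # build side filtered
    return [k for k in left_keys if k in build]       # None probes just miss

print(left_semi_join_null_unequal([1, None, 2, 3], [2, None, None, 5]))  # [2]
```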

@sameerz sameerz added this to the Aug 17 - Aug 28 milestone Aug 18, 2020
@JustPlay

At least q5, q10, q35, and q69 in TPC-DS have this problem.

@revans2
Collaborator

revans2 commented Sep 11, 2020

Reopening because the fix broke correctness in some cases and had to be reverted for the 0.2 release. I'll try to get it fixed properly, but the 0.2 release is too close to do it in that time frame.

@revans2 revans2 reopened this Sep 11, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
…IDIA#569)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>