This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

Running all TPC-DS queries with native-sql-engine for 10 rounds shows performance degradation in the last few rounds #358

Closed
haojinIntel opened this issue Jun 8, 2021 · 5 comments
Labels
bug Something isn't working

Comments

@haojinIntel
Collaborator

The execution time of each round is shown below:

round           1st   2nd   3rd   4th   5th   6th   7th   8th   9th   10th
total time (s)  4694  4154  4118  4105  4139  4174  4192  4352  5802  14344

The performance data can be reproduced on my cluster.
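To put the degradation in numbers, the short sketch below divides each round's total time by the fastest round's time. The figures are copied from the table above; the names `round_times` and `slowdowns` are illustrative only and do not come from the benchmark tooling.

```python
# Total time per round in seconds, copied from the table in this issue.
round_times = [4694, 4154, 4118, 4105, 4139, 4174, 4192, 4352, 5802, 14344]


def slowdowns(times):
    """Return each round's time divided by the fastest round's time."""
    best = min(times)
    return [t / best for t in times]


ratios = slowdowns(round_times)
for i, (t, r) in enumerate(zip(round_times, ratios), start=1):
    print(f"round {i:2d}: {t:6d} s  ({r:.2f}x the fastest round)")
```

By this measure, rounds 2 through 8 stay close to the fastest round (round 4), round 9 takes about 1.4 times as long, and round 10 takes about 3.5 times as long.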

@haojinIntel
Collaborator Author

@weiting-chen @zhouyuan @zhixingheyi-tian Please help to track the issue. Thanks!

@weiting-chen
Collaborator

Per discussion, the root cause may be a memory leak in native SQL processing that leaves less memory available to Java in the 9th and 10th rounds.
Since all rounds passed in v1.1.1 testing, we will mark the stability testing as passed for v1.1.1 and address this issue in v1.2.
We will re-run the test and add more criteria in v1.2, since v1.2 may include some changes to memory allocation.

@weiting-chen added the bug label Jun 9, 2021
@zhixingheyi-tian
Collaborator

zhixingheyi-tian commented Aug 17, 2021

@weiting-chen @zhouyuan

Recently, @haojinIntel encountered this issue again.
Do you want to fix it in the OAP 1.2.0 release?

Thanks

@zhouyuan
Collaborator

Based on my knowledge, the issue is most likely due to two causes:

  • JVM GC-induced pauses
    Native SQL still uses some heap space (for shuffle reading), and we restrict the heap to a small size to leave more room for native operators, so GC-related tuning may help. Here are the GC options I'm using:
    spark.executor.extraJavaOptions -XX:+UseParallelOldGC -XX:ParallelGCThreads=5 -XX:NewRatio=1 -XX:SurvivorRatio=1 -XX:+UseCompressedOops -verbose:gc

  • Memory fragmentation
    Most memory allocations go through the Arrow allocator (jemalloc), but there are still several small allocations made through glibc. As the run goes on, glibc may run into memory fragmentation. It should be possible to work around this for now by using LD_PRELOAD to load libjemalloc.so. We have plans to support a built-in jemalloc: Support pre-built Jemalloc #433
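The two suggestions above can be combined into one set of `spark-submit` arguments, as in the minimal sketch below. The GC options are the ones quoted in the comment. The rest is assumption: the jemalloc path is a hypothetical location, `build_conf_args` is an illustrative helper, and setting `LD_PRELOAD` through Spark's `spark.executorEnv.*` setting is only one possible way to apply the workaround, since the thread does not say how the variable was set.

```python
import shlex

# GC options quoted in the comment above.
GC_OPTIONS = (
    "-XX:+UseParallelOldGC -XX:ParallelGCThreads=5 -XX:NewRatio=1 "
    "-XX:SurvivorRatio=1 -XX:+UseCompressedOops -verbose:gc"
)

# ASSUMPTION: hypothetical install path. Replace it with the actual
# location of libjemalloc.so on the executor nodes.
JEMALLOC_PATH = "/usr/lib64/libjemalloc.so"


def build_conf_args(gc_options, preload_lib):
    """Build `--conf key=value` arguments for spark-submit."""
    conf = {
        # JVM flags passed to every executor.
        "spark.executor.extraJavaOptions": gc_options,
        # Environment variable set in every executor process.
        "spark.executorEnv.LD_PRELOAD": preload_lib,
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


args = build_conf_args(GC_OPTIONS, JEMALLOC_PATH)
# shlex.quote keeps the space-separated JVM flags together as one argument.
print("spark-submit " + " ".join(shlex.quote(a) for a in args))
```

The printed command still needs the application JAR and any other job-specific options appended before it can be run.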

@haojinIntel
Collaborator Author

Fixed
