
ngraph : fix for multithreading test_analyzer_image_classification #18265

Conversation

pawelpiotrowicz
Contributor

@pawelpiotrowicz pawelpiotrowicz commented Jun 21, 2019

ngraph : fix for multithreading test_analyzer_image_classification test=develop

@baojun-nervana
@kbinias
@LeoZhao-Intel

kbinias
kbinias previously approved these changes Jun 27, 2019
Contributor

@kbinias kbinias left a comment


LGTM

@luotao1 luotao1 closed this Jul 3, 2019
@luotao1 luotao1 reopened this Jul 3, 2019
@luotao1
Contributor

luotao1 commented Jul 3, 2019

Could you refine the title and description? I could not see the full command from your title.

@LeoZhao-Intel
Contributor

@pawelpiotrowicz Can you describe your idea or design in a comment? That would help us understand your general approach.

baojun-nervana
baojun-nervana previously approved these changes Jul 9, 2019
Contributor

@baojun-nervana baojun-nervana left a comment


LGTM, thanks!

@pawelpiotrowicz
Contributor Author

@LeoZhao-Intel The idea behind this solution is to allow each thread to have an internal cache. Only a pointer to the cache is stored in thread-local storage; the real cache lives on the heap.
When a thread dies, the cache is deleted and its heap memory is released.
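
The pattern described above can be sketched as follows. This is a minimal, hypothetical illustration (the `EngineCache` type and function names here are assumptions, not the actual PR code): only a `thread_local` smart pointer lives in thread-local storage, the cache itself is heap-allocated, and the pointer's destructor frees the cache when its owning thread exits.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the per-thread cache (not the real PR type).
struct EngineCache {
  std::unordered_map<std::string, std::vector<float>> entries;
};

// Only this pointer lives in thread-local storage; the cache itself is on
// the heap. The unique_ptr's destructor runs when the owning thread exits,
// releasing the heap allocation.
EngineCache& GetThreadLocalCache() {
  thread_local std::unique_ptr<EngineCache> cache(new EngineCache);
  return *cache;
}

// Spawns num_threads workers; each observes its own independent cache.
int RunWorkers(int num_threads) {
  std::mutex mu;
  int caches_seen = 0;
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([&mu, &caches_seen, i] {
      EngineCache& cache = GetThreadLocalCache();
      cache.entries["input"] = {static_cast<float>(i)};
      std::lock_guard<std::mutex> lock(mu);
      caches_seen += static_cast<int>(cache.entries.size());
    });
  }
  for (auto& w : workers) w.join();  // each worker's cache is freed on exit
  return caches_seen;
}
```

Because no two threads share a cache object, concurrent runs cannot double-free a shared entry; cleanup is tied to thread exit via the `thread_local` destructor.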

@pawelpiotrowicz pawelpiotrowicz force-pushed the pawepiot/ngraph_multithread_tls branch from afc2408 to fdc1350 Compare July 25, 2019 10:20
@pawelpiotrowicz pawelpiotrowicz changed the title ngraph : fix for multithreading test_analyzer_image_classification --… ngraph : fix for multithreading test_analyzer_image_classification Jul 25, 2019
@LeoZhao-Intel
Contributor

@LeoZhao-Intel The idea behind this solution is to allow each thread to have an internal cache. Only a pointer to the cache is stored in thread-local storage; the real cache lives on the heap.
When a thread dies, the cache is deleted and its heap memory is released.

Got your idea: you want to use a thread_local variable to tie the cache to the thread's lifetime.

But be careful with that: we met some issues in mkldnn with a similar kind of cache reuse for Baidu's online service deployment. In their usage, there is a thread pool used for inference execution, but those threads never exit and are always reused across different iterations.
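
The caveat above can be sketched as follows (a hypothetical illustration, not Baidu's actual deployment code): a long-lived worker thread keeps reusing the same thread_local cache across iterations, and the cache is destroyed only if the thread ever exits, which pooled threads typically do not.

```cpp
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical per-thread cache (illustration only).
struct Cache { int uses = 0; };

Cache* GetThreadLocalCache() {
  thread_local std::unique_ptr<Cache> cache(new Cache);
  return cache.get();
}

// Simulates a pooled worker that processes two iterations without exiting
// in between: both iterations observe the same thread_local cache object.
int UsesAfterTwoTasksOnOneWorker() {
  int observed = 0;
  std::thread worker([&observed] {
    for (int task = 0; task < 2; ++task) {
      Cache* c = GetThreadLocalCache();  // same object on both iterations
      c->uses += 1;
      observed = c->uses;
    }
  });  // the cache is destroyed only here, when the worker finally exits
  worker.join();
  return observed;
}
```

If the worker never exits, as in a long-running service thread pool, the `thread_local` destructor never fires, so any memory held by the cache persists for the life of the process.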

Contributor

@baojun-nervana baojun-nervana left a comment


LGTM

@baojun-nervana
Contributor

@pawelpiotrowicz Can you add a description as requested to move forward with this PR?

@pawelpiotrowicz
Contributor Author

@luotao1, I changed the description as you requested; could you carry on with this PR?

@bingyanghuang
Contributor

@pawelpiotrowicz Hi Pawel, could you refine your description in this way:

  1. What's the problem you are fixing? Could you describe it in more detail, rather than just "the multi-threading problem of nGraph"?
  2. Could you give the command line to reproduce this problem in this PR?
  3. If the issue you want to fix has already been created, could you refer to the issue number in your description? That will make it easier for Baidu to understand why we have this PR.
  4. After fixing this bug, what result will we get? E.g., since this is a multi-threading problem, will it affect performance? If so, please give the benchmark before and after the fix; if not, could you say a few words about what we gain after fixing this problem?

@luotao1
Contributor

luotao1 commented Aug 1, 2019

Thanks for @bingyanghuang's suggestions. @pawelpiotrowicz Could you tell us how to reproduce the bug and verify whether this PR fixes it? Like #18382 (comment)

@pawelpiotrowicz
Contributor Author

pawelpiotrowicz commented Aug 1, 2019

@bingyanghuang @luotao1

  1. What's the problem you are fixing? Could you describe it in more detail, rather than just "the multi-threading problem of nGraph"?

It refers to the test_analyzer_image_classification app with nGraph support.

  2. Could you give the command line to reproduce this problem in this PR?
cmake ..  -DCMAKE_BUILD_TYPE=Debug -DWITH_TESTING=ON -DWITH_INFERENCE_API_TEST=ON -DON_INFER=ON -DWITH_PYTHON=ON -DWITH_NGRAPH=ON -DWITH_GPU=OFF
paddle-public/build/paddle/fluid/inference/tests/api/test_analyzer_image_classification --infer_model=/home/pawepiot/workspace/multi_instance_public/paddle-public/build/third_party/inference_demo/googlenet/model --gtest_filter=Analyzer_resnet50.profile --use_ngraph --use_analysis=false --repeat=100 --paddle_num_threads=4 --num_threads=2
Note: Google Test filter = Analyzer_resnet50.profile
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Analyzer_resnet50
[ RUN      ] Analyzer_resnet50.profile
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0801 14:22:55.798418 23398 tester_helper.h:233] feed target 0: {-1, 3, 227, 227}
I0801 14:22:55.799906 23398 tester_helper.h:90] NativeConfig {
  PaddlePredictor::Config {
    model_dir: 
  }
  use_gpu: 0
  device: 0
  fraction_of_gpu_memory: 0
  specify_input_name: 1
}
I0801 14:22:55.838295 23399 tester_helper.h:332] Thread 0, number of threads 2, run 100 times...
I0801 14:22:55.838299 23400 tester_helper.h:332] Thread 1, number of threads 2, run 100 times...
*** Error in `/home/pawepiot/workspace/multi_instance_public/paddle-public/build/paddle/fluid/inference/tests/api/test_analyzer_image_classification': double free or corruption (out): 0x00007fe4e82fac30 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fe51fbe87e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fe51fbf137a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fe51fbf553c]
/home/pawepiot/workspace/multi_instance_public/paddle-public/build/paddle/fluid/inference/tests/api/test_analyzer_image_classification(_ZSt8_DestroyINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEvPT_+0x18)[0x428eca4]
/home/pawepiot/workspace/multi_instance_public/paddle-public/build/paddle/fluid/inference/tests/api/test_analyzer_image_classification(_ZNSt12_Destroy_auxILb0EE9__destroyIPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvT_S9_+0x2e)[0x428c17b]
/home/pawepiot/workspace/multi_instance_public/paddle-public/build/paddle/fluid/inference/tests/api/test_analyzer_image_classification(_ZSt8_DestroyIPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEvT_S7_+0x23)[0x4287d18]
/home/pawepiot/workspace/multi_instance_public/paddle-public/build/paddle/fluid/inference/tests/api/test_analyzer_image_classification(_ZSt8_DestroyIPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES5_EvT_S7_RSaIT0_E+0x27)[0x4282acb]
/home/pawepiot/workspace/multi_instance_public/paddle-public/build/paddle/fluid/inference/tests/api/test_analyzer_image_classification(_ZNSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EED1Ev+0x35)[0x427d2bd]
/home/pawepiot/workspace/multi_instance_public/paddle-public/build/paddle/fluid/inference/tests/api/test_analyzer_image_classification(_ZN6paddle9operators11EngineCacheD1Ev+0x2c)[0x5496872]
/home/pawepiot/workspace/multi_instance_public/paddle-public/build/paddle/fluid/inference/tests/api/test_analyzer_image_classification(_ZNSt4pairIKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN6paddle9operators11EngineCacheEED1Ev+0x1c)[0x54a658a]
./run.sh: line 35: 23398 Bus error               (core dumped) $cmd_fail

  3. If the issue you want to fix has already been created, could you refer to the issue number in your description?

No issue registered. It was one of the nGraph integration tasks.

After the fix:

Note: Google Test filter = Analyzer_resnet50.profile
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Analyzer_resnet50
[ RUN      ] Analyzer_resnet50.profile
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0801 14:53:10.581333 16227 tester_helper.h:233] feed target 0: {-1, 3, 227, 227}
I0801 14:53:10.584867 16227 tester_helper.h:90] NativeConfig {
  PaddlePredictor::Config {
    model_dir: 
  }
  use_gpu: 0
  device: 0
  fraction_of_gpu_memory: 0
  specify_input_name: 1
}
I0801 14:53:10.634505 16228 tester_helper.h:332] Thread 0, number of threads 2, run 100 times...
I0801 14:53:10.634531 16229 tester_helper.h:332] Thread 1, number of threads 2, run 100 times...
I0801 14:53:46.711762 16229 helper.h:322] ====== threads: 2, thread id: 1 ======
I0801 14:53:46.714756 16229 helper.h:324] ====== batch size: 1, iterations: 1, repetitions: 100 ======
I0801 14:53:46.714833 16229 helper.h:326] ====== batch latency: 360.769ms, number of samples: 1, sample latency: 360.769ms, fps: 2.77186, data type: float ======
I0801 14:53:47.085276 16228 helper.h:322] ====== threads: 2, thread id: 0 ======
I0801 14:53:47.085427 16228 helper.h:324] ====== batch size: 1, iterations: 1, repetitions: 100 ======
I0801 14:53:47.085489 16228 helper.h:326] ====== batch latency: 364.507ms, number of samples: 1, sample latency: 364.507ms, fps: 2.74343, data type: float ======
[       OK ] Analyzer_resnet50.profile (36573 ms)
[----------] 1 test from Analyzer_resnet50 (36573 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (36573 ms total)
[  PASSED  ] 1 test.

@luotao1
Contributor

luotao1 commented Aug 1, 2019

@zhupengyang Could you help reproduce the bug and verify whether this PR fixes the bug with #18265 (comment)?

@zhupengyang
Contributor

  • reproduce the bug: commit id: f745d6d
Note: Google Test filter = Analyzer_resnet50.profile
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Analyzer_resnet50
[ RUN      ] Analyzer_resnet50.profile
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0802 06:21:59.253562 31349 tester_helper.h:233] feed target 0: {-1, 3, 227, 227}
I0802 06:21:59.255908 31349 tester_helper.h:90] NativeConfig {
  PaddlePredictor::Config {
    model_dir: 
  }
  use_gpu: 0
  device: 0
  fraction_of_gpu_memory: 0
  specify_input_name: 1
}
I0802 06:21:59.314203 31350 tester_helper.h:332] Thread 0, number of threads 2, run 100 times...
I0802 06:21:59.314246 31351 tester_helper.h:332] Thread 1, number of threads 2, run 100 times...
Segmentation fault

  • verify the fix with this PR:
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Analyzer_resnet50
[ RUN      ] Analyzer_resnet50.profile
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0802 12:28:25.573624 14799 tester_helper.h:233] feed target 0: {-1, 3, 227, 227}
I0802 12:28:25.575999 14799 tester_helper.h:90] NativeConfig {
  PaddlePredictor::Config {
    model_dir: 
  }
  use_gpu: 0
  device: 0
  fraction_of_gpu_memory: 0
  specify_input_name: 1
}
I0802 12:28:25.650452 14800 tester_helper.h:332] Thread 0, number of threads 2, run 100 times...
I0802 12:28:25.650454 14801 tester_helper.h:332] Thread 1, number of threads 2, run 100 times...
I0802 12:29:10.631986 14801 helper.h:322] ====== threads: 2, thread id: 1 ======
I0802 12:29:10.632396 14801 helper.h:324] ====== batch size: 1, iterations: 1, repetitions: 100 ======
I0802 12:29:10.632429 14801 helper.h:326] ====== batch latency: 449.813ms, number of samples: 1, sample latency: 449.813ms, fps: 2.22315, data type: float ======
I0802 12:29:11.176981 14800 helper.h:322] ====== threads: 2, thread id: 0 ======
I0802 12:29:11.177028 14800 helper.h:324] ====== batch size: 1, iterations: 1, repetitions: 100 ======
I0802 12:29:11.177039 14800 helper.h:326] ====== batch latency: 455.263ms, number of samples: 1, sample latency: 455.263ms, fps: 2.19653, data type: float ======
[       OK ] Analyzer_resnet50.profile (45701 ms)
[----------] 1 test from Analyzer_resnet50 (45701 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (45702 ms total)
[  PASSED  ] 1 test.

@luotao1

@luotao1
Contributor

luotao1 commented Aug 2, 2019

Thanks for @zhupengyang's verification work! @tensor-tang Please take a look and review!

Contributor

@tensor-tang tensor-tang left a comment


LGTM

Thanks to @pawelpiotrowicz for the great work and @zhupengyang for the validation.

@luotao1 luotao1 merged commit e53f517 into PaddlePaddle:develop Aug 5, 2019
@luotao1
Contributor

luotao1 commented Aug 5, 2019

Please add a unit test in the next PR to ensure the multi-threading test_analyzer_image_classification works on nGraph.

suoych pushed a commit to suoych/Paddle that referenced this pull request Aug 5, 2019
@baojun-nervana baojun-nervana deleted the pawepiot/ngraph_multithread_tls branch August 5, 2019 16:51
@bingyanghuang
Contributor

bingyanghuang commented Aug 13, 2019

Please add a unit test in the next PR to ensure the multi-threading test_analyzer_image_classification works on nGraph.

@pawelpiotrowicz When do you plan to add this unit test?

@pawelpiotrowicz
Contributor Author

@bingyanghuang, the task is a bit complex and I need more time. First I have to look at the test-coverage output, and depending on the result I will decide how to proceed.

@luotao1
Contributor

luotao1 commented Aug 14, 2019

The task is a bit complex

You can simply add the following, like:

void compare(bool use_mkldnn = false, bool use_ngraph = false) {
  AnalysisConfig cfg;
  SetConfig(&cfg);
  if (use_mkldnn) {
    cfg.EnableMKLDNN();
    cfg.pass_builder()->AppendPass("fc_mkldnn_pass");
  }
  if (use_ngraph) {
    cfg.EnableNgraph();
  }
  std::vector<std::vector<PaddleTensor>> inputs;
  LoadInputData(&inputs);
  CompareNativeAndAnalysis(
      reinterpret_cast<const PaddlePredictor::Config *>(&cfg), inputs);
}

TEST(Analyzer_bert, compare) { compare(); }
#ifdef PADDLE_WITH_MKLDNN
TEST(Analyzer_bert, compare_mkldnn) {
  compare(true, false /* use_mkldnn, no use_ngraph */);
}
#endif
#ifdef PADDLE_WITH_NGRAPH
TEST(Analyzer_bert, compare_ngraph) {
  compare(false, true /* no use_mkldnn, use_ngraph */);
}
#endif

#ifdef PADDLE_WITH_NGRAPH
TEST(Analyzer_bert, profile_ngraph) { profile(false, true); }
#endif

TestPrediction(reinterpret_cast<const PaddlePredictor::Config *>(&config),
               inputs, &outputs, FLAGS_num_threads /*2 or 4*/);

see test-coverage output

Test coverage is independent of these multi-threading unit tests. Adding a unit test will not decrease the test coverage, so you can add the unit test first.
