Real execution frequency on embedded board is considerably lower than 20Hz and changes between drive and algo modes #690

cloud-rocket · 2020-11-29T04:57:10Z

My own setup runs at 8 Hz on a Jetson Nano in drive mode and drops to 3 Hz in local_angle mode.
- The difference is even bigger with the OLED part disabled (20 Hz in drive mode vs. 3.5 Hz in local_angle mode).
None of the CPU cores (4 on the Nano) was fully loaded during the test.
Algorithm performance probably suffers, because training and runtime happen at different frequencies.
Performance was monitored with the code from Add performance monitor #689.

This is my setup (per-part run times in ms):

+--------------------+--------+--------+--------+--------+--------+--------+--------+
|        part        |  max   |  min   |  avg   |  50%   |  90%   |  99%   | 99.9%  |
+--------------------+--------+--------+--------+--------+--------+--------+--------+
|     CSICamera      |  0.09  |  0.01  |  0.02  |  0.01  |  0.02  |  0.05  |  0.09  |
| MySerialController |  0.09  |  0.02  |  0.02  |  0.02  |  0.03  |  0.06  |  0.08  |
|   ThrottleFilter   |  0.10  |  0.01  |  0.02  |  0.01  |  0.02  |  0.05  |  0.10  |
|   PilotCondition   |  0.08  |  0.01  |  0.01  |  0.01  |  0.02  |  0.04  |  0.07  |
|   RecordTracker    |  0.07  |  0.01  |  0.01  |  0.01  |  0.02  |  0.03  |  0.06  |
|        IMU         |  0.07  |  0.02  |  0.02  |  0.02  |  0.03  |  0.06  |  0.07  |
|       WebFpv       |  0.04  |  0.01  |  0.02  |  0.02  |  0.03  |  0.03  |  0.04  |
|    PerfMonitor     | 13.45  |  0.92  |  2.35  |  1.33  |  5.27  | 11.13  | 13.20  |
|     DriveMode      |  0.08  |  0.02  |  0.03  |  0.03  |  0.04  |  0.07  |  0.08  |
|      AiLaunch      |  0.04  |  0.02  |  0.02  |  0.02  |  0.02  |  0.04  |  0.04  |
|   AiRunCondition   |  0.05  |  0.01  |  0.02  |  0.01  |  0.02  |  0.02  |  0.05  |
|    PWMSteering     |  0.08  |  0.02  |  0.03  |  0.03  |  0.04  |  0.05  |  0.07  |
|    PWMThrottle     |  0.08  |  0.02  |  0.02  |  0.02  |  0.02  |  0.07  |  0.08  |
|      OLEDPart      | 155.06 | 125.55 | 136.43 | 137.56 | 142.47 | 147.86 | 154.13 |
|     Telemetry      | 15.27  |  0.11  |  0.91  |  0.12  |  4.99  | 12.03  | 14.92  |
+--------------------+--------+--------+--------+--------+--------+--------+--------+
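As a rough cross-check of the table above, the per-part averages can be summed to estimate the loop rate. This is a minimal sketch, assuming the timings are in milliseconds and that every part runs sequentially in the main loop (a threaded part would not block the loop this way). `estimate_loop_hz` and its `target_hz` cap are hypothetical names for illustration, not part of the project's code.

```python
def estimate_loop_hz(avg_ms_by_part, target_hz=20.0):
    """Estimate the achievable loop rate from per-part average run times (ms)."""
    total_ms = sum(avg_ms_by_part.values())
    if total_ms <= 0:
        return target_hz
    # The loop cannot run faster than the configured target rate.
    return min(target_hz, 1000.0 / total_ms)


# Averages (ms) of the three slowest parts in the table above; the remaining
# parts add up to roughly 0.24 ms and are lumped into "other".
drive_mode = {"OLEDPart": 136.43, "PerfMonitor": 2.35, "Telemetry": 0.91, "other": 0.24}
print(f"{estimate_loop_hz(drive_mode):.1f} Hz")  # prints "7.1 Hz"
```

The estimate is in the same range as the 8 Hz observed in drive mode. Removing `OLEDPart` from the dict brings it back to the 20 Hz cap, which matches the measurement with the OLED part disabled.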


sctse999 · 2020-11-30T02:56:06Z

It looks like OLEDPart is taking a lot of time to finish. Can you turn off the time-consuming parts and verify the result?
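Spotting the time-consuming parts can be automated by parsing the report and ranking the rows. The following is a minimal sketch, assuming the report keeps the ASCII layout shown in this thread; `parse_perf_table` and `slowest_parts` are hypothetical helpers, and the sample report is a shortened copy of the first table.

```python
def parse_perf_table(text):
    """Parse an ASCII performance report into a list of (part, {column: ms})."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the +----+ separator lines
    ]
    header, body = rows[0], rows[1:]
    # A list (not a dict) so that repeated part names are not overwritten.
    return [(row[0], dict(zip(header[1:], map(float, row[1:])))) for row in body]


def slowest_parts(table, column="avg", top=3):
    """Return the `top` parts with the largest value in `column`."""
    ranked = sorted(table, key=lambda item: item[1][column], reverse=True)
    return [(name, stats[column]) for name, stats in ranked[:top]]


report = """
+-------------+--------+--------+--------+
|    part     |  max   |  min   |  avg   |
+-------------+--------+--------+--------+
|  CSICamera  |  0.09  |  0.01  |  0.02  |
| PerfMonitor | 13.45  |  0.92  |  2.35  |
|  OLEDPart   | 155.06 | 125.55 | 136.43 |
|  Telemetry  | 15.27  |  0.11  |  0.91  |
+-------------+--------+--------+--------+
"""
print(slowest_parts(parse_perf_table(report)))
# prints [('OLEDPart', 136.43), ('PerfMonitor', 2.35), ('Telemetry', 0.91)]
```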

cloud-rocket · 2020-12-01T02:19:29Z

@sctse999 - I disabled the OLED part and reduced the overhead of Telemetry and PerfMonitor. I am still getting around 4 Hz with the model active:

+--------------------+---------+--------+--------+--------+--------+--------+---------+
|        part        |   max   |  min   |  avg   |  50%   |  90%   |  99%   |  99.9%  |
+--------------------+---------+--------+--------+--------+--------+--------+---------+
|     CSICamera      |   0.24  |  0.01  |  0.02  |  0.02  |  0.04  |  0.05  |   0.15  |
| MySerialController |   0.10  |  0.02  |  0.03  |  0.02  |  0.03  |  0.04  |   0.09  |
|   ThrottleFilter   |   0.10  |  0.01  |  0.02  |  0.02  |  0.02  |  0.04  |   0.08  |
|   PilotCondition   |   0.08  |  0.01  |  0.01  |  0.01  |  0.02  |  0.02  |   0.07  |
|   RecordTracker    |   0.08  |  0.01  |  0.01  |  0.01  |  0.02  |  0.02  |   0.07  |
|        IMU         |   2.59  |  0.02  |  0.03  |  0.03  |  0.03  |  0.05  |   0.18  |
|       WebFpv       |   0.84  |  0.01  |  0.02  |  0.02  |  0.03  |  0.03  |   0.05  |
|    PerfMonitor     |   0.24  |  0.02  |  0.03  |  0.03  |  0.04  |  0.06  |   0.11  |
|    FileWatcher     |  18.39  |  0.05  |  0.25  |  0.16  |  0.29  |  3.76  |   7.49  |
|    FileWatcher     |   9.76  |  0.03  |  0.17  |  0.08  |  0.20  |  3.24  |   6.40  |
|   DelayedTrigger   |   0.09  |  0.01  |  0.02  |  0.02  |  0.02  |  0.04  |   0.08  |
| TriggeredCallback  |   0.09  |  0.01  |  0.01  |  0.01  |  0.01  |  0.02  |   0.09  |
|    KerasLinear     | 1515.96 | 289.63 | 329.83 | 321.82 | 344.91 | 386.60 | 1115.74 |
|     DriveMode      |   0.24  |  0.02  |  0.04  |  0.04  |  0.10  |  0.13  |   0.24  |
|      AiLaunch      |   0.08  |  0.01  |  0.02  |  0.02  |  0.02  |  0.04  |   0.08  |
|   AiRunCondition   |   0.09  |  0.01  |  0.01  |  0.01  |  0.02  |  0.03  |   0.07  |
|    PWMSteering     |   0.60  |  0.02  |  0.07  |  0.07  |  0.11  |  0.16  |   0.22  |
|    PWMThrottle     |   0.17  |  0.02  |  0.03  |  0.03  |  0.04  |  0.05  |   0.12  |
|   MqttTelemetry    |   0.94  |  0.09  |  0.13  |  0.13  |  0.14  |  0.19  |   0.38  |
+--------------------+---------+--------+--------+--------+--------+--------+---------+

tikurahul · 2020-12-01T05:00:46Z

You should try using TensorRT on the Nano or TFLite. Your framerate will improve considerably. I usually get 45Hz with my model.
A p90 of 344.91 ms is actually very poor.

DocGarbanzo · 2021-01-16T20:26:34Z

@cloud-rocket - if you still see the problem, please re-open this issue with more info, in particular if running the script profile.py with all 3 models (h5, rt, tflite) does not get you to a resolution. I'll close this for now as I think TensorRT is the right format for the Nano.
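For comparing model formats outside the full vehicle loop, a generic timing harness is enough to get the same kind of statistics as the tables in this thread. This is not the profile.py script mentioned above, only a sketch using the standard library; `profile_call`, `percentile` and `fake_predict` are hypothetical names, and `fake_predict` stands in for a real h5, TensorRT or tflite inference call.

```python
import math
import time


def percentile(sorted_ms, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(len(sorted_ms) * pct / 100.0))
    return sorted_ms[rank - 1]


def profile_call(predict, frame, runs=200, warmup=10):
    """Time `predict(frame)` and report latency statistics in milliseconds."""
    for _ in range(warmup):  # first calls often include one-time setup cost
        predict(frame)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(frame)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    avg = sum(samples) / len(samples)
    return {
        "min": samples[0],
        "avg": avg,
        "50%": percentile(samples, 50),
        "90%": percentile(samples, 90),
        "99%": percentile(samples, 99),
        "max": samples[-1],
        # Rate the model alone could sustain, ignoring all other parts.
        "hz": 1000.0 / avg if avg > 0 else float("inf"),
    }


def fake_predict(frame):
    """Hypothetical stand-in for a model; replace with the real inference call."""
    return sum(frame) / len(frame)


stats = profile_call(fake_predict, frame=[0.5] * 1000, runs=50, warmup=5)
print({name: round(value, 3) for name, value in stats.items()})
```

Running the same harness once per model format, with the same input frame, gives numbers that can be compared directly.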

cloud-rocket · 2021-01-20T05:47:50Z

I am not reopening yet, but the tflite performance I am getting is 19 Hz (just for statistics). I forgot to mention that I am running TF 2.4.

+--------------------+-------+-------+-------+-------+-------+-------+-------+
|        part        |  max  |  min  |  avg  |  50%  |  90%  |  99%  | 99.9% |
+--------------------+-------+-------+-------+-------+-------+-------+-------+
|     CSICamera      |  0.78 |  0.01 |  0.03 |  0.03 |  0.04 |  0.05 |  0.11 |
| MySerialController |  2.16 |  0.02 |  0.03 |  0.02 |  0.03 |  0.05 |  0.10 |
|   ThrottleFilter   |  0.24 |  0.01 |  0.02 |  0.02 |  0.02 |  0.04 |  0.09 |
|   PilotCondition   |  0.94 |  0.01 |  0.01 |  0.01 |  0.02 |  0.03 |  0.08 |
|   RecordTracker    |  0.91 |  0.01 |  0.01 |  0.01 |  0.02 |  0.03 |  0.07 |
|        IMU         |  0.82 |  0.01 |  0.02 |  0.02 |  0.03 |  0.05 |  0.11 |
|       WebFpv       |  0.54 |  0.01 |  0.02 |  0.02 |  0.02 |  0.03 |  0.09 |
|    FileWatcher     | 10.45 |  0.03 |  0.32 |  0.08 |  0.28 |  4.05 |  5.84 |
|    FileWatcher     |  8.91 |  0.02 |  0.13 |  0.07 |  0.19 |  1.51 |  4.29 |
|   DelayedTrigger   |  0.13 |  0.01 |  0.01 |  0.01 |  0.02 |  0.03 |  0.07 |
| TriggeredCallback  |  0.15 |  0.01 |  0.01 |  0.01 |  0.01 |  0.01 |  0.06 |
|    TFLitePilot     | 72.87 | 50.38 | 52.77 | 52.61 | 54.14 | 62.12 | 66.10 |
|     DriveMode      |  0.89 |  0.01 |  0.03 |  0.03 |  0.04 |  0.07 |  0.13 |
|      AiLaunch      |  0.81 |  0.01 |  0.02 |  0.02 |  0.02 |  0.04 |  0.09 |
|   AiRunCondition   |  0.24 |  0.01 |  0.01 |  0.01 |  0.01 |  0.03 |  0.07 |
|    PWMSteering     |  0.92 |  0.02 |  0.05 |  0.03 |  0.10 |  0.14 |  0.21 |
|    PWMThrottle     |  2.71 |  0.01 |  0.02 |  0.02 |  0.03 |  0.05 |  0.10 |
|    PerfMonitor     |  0.58 |  0.02 |  0.03 |  0.03 |  0.04 |  0.07 |  0.12 |
|   MqttTelemetry    |  1.03 |  0.07 |  0.11 |  0.11 |  0.13 |  0.18 |  0.30 |
+--------------------+-------+-------+-------+-------+-------+-------+-------+

sctse999 · 2021-01-20T05:51:17Z

@cloud-rocket This is Nano 2GB?

cloud-rocket · 2021-01-20T06:08:59Z

Nope, 4 GB.

I added an np.float32 conversion to the tflite.py part, because otherwise it threw an exception (inference was being called with float64 image values). Maybe that is because I am using TF 2.4.

    def inference(self, img_arr, other_arr):
        # Cast to float32: the interpreter raised an exception on float64 input.
        input_data = np.float32(img_arr.reshape(self.input_shape))
        self.interpreter.set_tensor(self.input_details[0]['index'], input_data)
        self.interpreter.invoke()

I don't think this conversion adds a significant performance penalty.
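One way to check that assumption off the car is to time the cast on its own. This is a minimal sketch, assuming NumPy is installed and using a 120x160x3 frame, which is a hypothetical size and not taken from this thread.

```python
import time

import numpy as np

# Hypothetical frame size; substitute the model's real input shape.
input_shape = (1, 120, 160, 3)
img_arr = np.random.rand(120, 160, 3)  # float64, like the input that raised the exception

runs = 1000
start = time.perf_counter()
for _ in range(runs):
    # Same reshape-and-cast expression as in the inference method above.
    input_data = np.float32(img_arr.reshape(input_shape))
cast_ms = (time.perf_counter() - start) * 1000.0 / runs

print(input_data.dtype, input_data.shape, f"{cast_ms:.4f} ms per cast")
```

The printed time per cast can then be compared with the TFLitePilot average of about 52 ms in the table above.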

DocGarbanzo · 2021-01-22T20:19:05Z

We saw problems with tf > 2.2.0 in training. But 20 Hz is definitely much too slow. Have you tried the CPU version of TF?

cloud-rocket · 2021-01-22T23:57:50Z

@DocGarbanzo, what do you mean by "CPU version"? With standard TF I am getting 3.5 Hz, which is why I opened this ticket.
Any idea what else to consider and why my performance is so bad?

BTW, I added a PR with the tflite fix: #762

I am also looking into how to run TensorRT with TF2; many changes are required.

Thanks

DocGarbanzo · 2021-01-31T17:16:51Z

@cloud-rocket - I just looked at the issue w/ TF > 2.2.0 on RPi yesterday. Indeed, any newer version causes a significant slowdown on RPi, at least when using the .h5 version of the model. On an RPi 4 I saw around 180 ms using TF 2.3.1 vs 30 ms using TF 2.2.0. So we updated the RPi install instructions on dev. With the tflite model it drops to ~20 ms.

cloud-rocket changed the title from "Real execution cycle on embedded board is considerably lower than 20Hz and changes between drive and algo modes" to "Real execution frequency on embedded board is considerably lower than 20Hz and changes between drive and algo modes" on Nov 29, 2020

DocGarbanzo closed this as completed Jan 16, 2021

fengye mentioned this issue Jan 26, 2021

Fix OLED performance issue #768

Merged

cloud-rocket mentioned this issue Jan 27, 2021

Suggestion: convert Keras/Tflite to be multi-threaded on inference #770

Closed
