Real execution frequency on embedded board is considerably lower than 20Hz and changes between drive and algo modes #690

Closed
cloud-rocket opened this issue Nov 29, 2020 · 10 comments

Comments

@cloud-rocket
Contributor

cloud-rocket commented Nov 29, 2020

  • My own setup runs at 8 Hz on a Jetson Nano in drive mode and drops to 3 Hz in local_angle mode.
    • The difference is even bigger with the OLED part disabled (20 Hz in drive mode vs 3.5 Hz in local_angle mode).
  • None of the CPU cores (4 on the Nano) was fully loaded during the test.
  • Algorithm performance probably suffers, because training and inference run at different frequencies.
  • Performance was monitored with the code from "Add performance monitor" #689.
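The columns in the tables below (max, min, avg and percentiles of each part's run time) can be reproduced with a small timer like the following. This is a minimal sketch assuming run times in milliseconds and nearest-rank percentiles; `PartTimer` is a hypothetical name, not the PerfMonitor code from #689.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager


class PartTimer:
    """Hypothetical per-part timer; not the #689 PerfMonitor implementation."""

    def __init__(self):
        # part name -> list of run durations in milliseconds
        self.samples = defaultdict(list)

    @contextmanager
    def measure(self, part_name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed wall-clock time even if the part raises
            self.samples[part_name].append((time.perf_counter() - start) * 1000.0)

    @staticmethod
    def percentile(sorted_values, pct):
        # Nearest-rank percentile of an already sorted, non-empty list
        rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
        return sorted_values[rank - 1]

    def summary(self, part_name):
        values = sorted(self.samples[part_name])
        stats = {"max": values[-1], "min": values[0], "avg": sum(values) / len(values)}
        for pct in (50, 90, 99, 99.9):
            stats[f"{pct}%"] = self.percentile(values, pct)
        return stats


timer = PartTimer()
for _ in range(3):
    with timer.measure("CSICamera"):
        time.sleep(0.001)  # stand-in for calling the part's run() method
print(timer.summary("CSICamera"))
```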

This is my setup (per-part run times in ms):

+--------------------+--------+--------+--------+--------+--------+--------+--------+
|        part        |  max   |  min   |  avg   |  50%   |  90%   |  99%   | 99.9%  |
+--------------------+--------+--------+--------+--------+--------+--------+--------+
|     CSICamera      |  0.09  |  0.01  |  0.02  |  0.01  |  0.02  |  0.05  |  0.09  |
| MySerialController |  0.09  |  0.02  |  0.02  |  0.02  |  0.03  |  0.06  |  0.08  |
|   ThrottleFilter   |  0.10  |  0.01  |  0.02  |  0.01  |  0.02  |  0.05  |  0.10  |
|   PilotCondition   |  0.08  |  0.01  |  0.01  |  0.01  |  0.02  |  0.04  |  0.07  |
|   RecordTracker    |  0.07  |  0.01  |  0.01  |  0.01  |  0.02  |  0.03  |  0.06  |
|        IMU         |  0.07  |  0.02  |  0.02  |  0.02  |  0.03  |  0.06  |  0.07  |
|       WebFpv       |  0.04  |  0.01  |  0.02  |  0.02  |  0.03  |  0.03  |  0.04  |
|    PerfMonitor     | 13.45  |  0.92  |  2.35  |  1.33  |  5.27  | 11.13  | 13.20  |
|     DriveMode      |  0.08  |  0.02  |  0.03  |  0.03  |  0.04  |  0.07  |  0.08  |
|      AiLaunch      |  0.04  |  0.02  |  0.02  |  0.02  |  0.02  |  0.04  |  0.04  |
|   AiRunCondition   |  0.05  |  0.01  |  0.02  |  0.01  |  0.02  |  0.02  |  0.05  |
|    PWMSteering     |  0.08  |  0.02  |  0.03  |  0.03  |  0.04  |  0.05  |  0.07  |
|    PWMThrottle     |  0.08  |  0.02  |  0.02  |  0.02  |  0.02  |  0.07  |  0.08  |
|      OLEDPart      | 155.06 | 125.55 | 136.43 | 137.56 | 142.47 | 147.86 | 154.13 |
|     Telemetry      | 15.27  |  0.11  |  0.91  |  0.12  |  4.99  | 12.03  | 14.92  |
+--------------------+--------+--------+--------+--------+--------+--------+--------+
@cloud-rocket cloud-rocket changed the title Real execution cycle on embedded board is considerably lower than 20Hz and changes between drive and algo modes Real execution frequency on embedded board is considerably lower than 20Hz and changes between drive and algo modes Nov 29, 2020
@sctse999
Contributor

It looks like OLEDPart takes a long time to finish. Can you turn off the time-consuming parts and verify the result?

@cloud-rocket
Contributor Author

@sctse999 - I disabled the OLED and reduced the overhead of Telemetry and PerfMonitor. I am still getting around 4 Hz with the model active:

+--------------------+---------+--------+--------+--------+--------+--------+---------+
|        part        |   max   |  min   |  avg   |  50%   |  90%   |  99%   |  99.9%  |
+--------------------+---------+--------+--------+--------+--------+--------+---------+
|     CSICamera      |   0.24  |  0.01  |  0.02  |  0.02  |  0.04  |  0.05  |   0.15  |
| MySerialController |   0.10  |  0.02  |  0.03  |  0.02  |  0.03  |  0.04  |   0.09  |
|   ThrottleFilter   |   0.10  |  0.01  |  0.02  |  0.02  |  0.02  |  0.04  |   0.08  |
|   PilotCondition   |   0.08  |  0.01  |  0.01  |  0.01  |  0.02  |  0.02  |   0.07  |
|   RecordTracker    |   0.08  |  0.01  |  0.01  |  0.01  |  0.02  |  0.02  |   0.07  |
|        IMU         |   2.59  |  0.02  |  0.03  |  0.03  |  0.03  |  0.05  |   0.18  |
|       WebFpv       |   0.84  |  0.01  |  0.02  |  0.02  |  0.03  |  0.03  |   0.05  |
|    PerfMonitor     |   0.24  |  0.02  |  0.03  |  0.03  |  0.04  |  0.06  |   0.11  |
|    FileWatcher     |  18.39  |  0.05  |  0.25  |  0.16  |  0.29  |  3.76  |   7.49  |
|    FileWatcher     |   9.76  |  0.03  |  0.17  |  0.08  |  0.20  |  3.24  |   6.40  |
|   DelayedTrigger   |   0.09  |  0.01  |  0.02  |  0.02  |  0.02  |  0.04  |   0.08  |
| TriggeredCallback  |   0.09  |  0.01  |  0.01  |  0.01  |  0.01  |  0.02  |   0.09  |
|    KerasLinear     | 1515.96 | 289.63 | 329.83 | 321.82 | 344.91 | 386.60 | 1115.74 |
|     DriveMode      |   0.24  |  0.02  |  0.04  |  0.04  |  0.10  |  0.13  |   0.24  |
|      AiLaunch      |   0.08  |  0.01  |  0.02  |  0.02  |  0.02  |  0.04  |   0.08  |
|   AiRunCondition   |   0.09  |  0.01  |  0.01  |  0.01  |  0.02  |  0.03  |   0.07  |
|    PWMSteering     |   0.60  |  0.02  |  0.07  |  0.07  |  0.11  |  0.16  |   0.22  |
|    PWMThrottle     |   0.17  |  0.02  |  0.03  |  0.03  |  0.04  |  0.05  |   0.12  |
|   MqttTelemetry    |   0.94  |  0.09  |  0.13  |  0.13  |  0.14  |  0.19  |   0.38  |
+--------------------+---------+--------+--------+--------+--------+--------+---------+

@tikurahul
Collaborator

You should try using TensorRT or TFLite on the Nano; your frame rate will improve considerably. I usually get 45 Hz with my model.
A p90 of 344.91 ms is actually very poor.
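To put that p90 in perspective: at the 20 Hz rate from the issue title, one full loop iteration has a 50 ms budget shared by every part. The sketch below flags any part whose p90 alone exceeds that budget; the helper names are hypothetical and the p90 values are copied from the Keras table above.

```python
def loop_budget_ms(target_hz):
    """Time available for one full loop iteration at target_hz, in ms."""
    return 1000.0 / target_hz


def parts_over_budget(p90_ms_by_part, target_hz):
    """Parts whose p90 run time alone exceeds the whole-loop budget, worst first."""
    budget = loop_budget_ms(target_hz)
    over = [name for name, p90 in p90_ms_by_part.items() if p90 > budget]
    return sorted(over, key=lambda name: p90_ms_by_part[name], reverse=True)


# p90 values (ms) copied from the table with the Keras model active
p90 = {"KerasLinear": 344.91, "PWMSteering": 0.11, "MqttTelemetry": 0.14}
print(loop_budget_ms(20))          # 50.0
print(parts_over_budget(p90, 20))  # ['KerasLinear']
```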

@DocGarbanzo
Copy link
Contributor

@cloud-rocket - if you still see the problem, please reopen this issue with more info, in particular if running the profile.py script with all three model types (h5, rt, tflite) does not resolve it. I'll close this for now, as I think TensorRT is the right format for the Nano.
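A generic way to compare the h5, TensorRT and TFLite variants is to time the same inference call repeatedly after a warm-up. The function below is a minimal sketch under that assumption; `benchmark`, `predict` and `make_input` are placeholders, and it is not the profile.py script mentioned above.

```python
import statistics
import time


def benchmark(predict, make_input, runs=200, warmup=10):
    """Time predict(x) and return (median_ms, p90_ms, median_hz).

    predict and make_input are placeholders, e.g. a wrapper around one model
    format and a function that produces one camera frame.
    """
    for _ in range(warmup):  # keep one-off start-up costs out of the numbers
        predict(make_input())
    times_ms = []
    for _ in range(runs):
        x = make_input()  # input creation is excluded from the timing
        start = time.perf_counter()
        predict(x)
        times_ms.append((time.perf_counter() - start) * 1000.0)
    median_ms = statistics.median(times_ms)
    p90_ms = statistics.quantiles(times_ms, n=10)[-1]  # 90th percentile cut point
    return median_ms, p90_ms, 1000.0 / median_ms
```

Running it once per model format with the same input generator keeps the comparison like-for-like; `statistics.quantiles` needs Python 3.8 or newer.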

@cloud-rocket
Copy link
Contributor Author

cloud-rocket commented Jan 20, 2021

I'm not reopening yet, but for the record, the TFLite performance I am getting is 19 Hz. I forgot to mention that I am running TF 2.4.

+--------------------+-------+-------+-------+-------+-------+-------+-------+
|        part        |  max  |  min  |  avg  |  50%  |  90%  |  99%  | 99.9% |
+--------------------+-------+-------+-------+-------+-------+-------+-------+
|     CSICamera      |  0.78 |  0.01 |  0.03 |  0.03 |  0.04 |  0.05 |  0.11 |
| MySerialController |  2.16 |  0.02 |  0.03 |  0.02 |  0.03 |  0.05 |  0.10 |
|   ThrottleFilter   |  0.24 |  0.01 |  0.02 |  0.02 |  0.02 |  0.04 |  0.09 |
|   PilotCondition   |  0.94 |  0.01 |  0.01 |  0.01 |  0.02 |  0.03 |  0.08 |
|   RecordTracker    |  0.91 |  0.01 |  0.01 |  0.01 |  0.02 |  0.03 |  0.07 |
|        IMU         |  0.82 |  0.01 |  0.02 |  0.02 |  0.03 |  0.05 |  0.11 |
|       WebFpv       |  0.54 |  0.01 |  0.02 |  0.02 |  0.02 |  0.03 |  0.09 |
|    FileWatcher     | 10.45 |  0.03 |  0.32 |  0.08 |  0.28 |  4.05 |  5.84 |
|    FileWatcher     |  8.91 |  0.02 |  0.13 |  0.07 |  0.19 |  1.51 |  4.29 |
|   DelayedTrigger   |  0.13 |  0.01 |  0.01 |  0.01 |  0.02 |  0.03 |  0.07 |
| TriggeredCallback  |  0.15 |  0.01 |  0.01 |  0.01 |  0.01 |  0.01 |  0.06 |
|    TFLitePilot     | 72.87 | 50.38 | 52.77 | 52.61 | 54.14 | 62.12 | 66.10 |
|     DriveMode      |  0.89 |  0.01 |  0.03 |  0.03 |  0.04 |  0.07 |  0.13 |
|      AiLaunch      |  0.81 |  0.01 |  0.02 |  0.02 |  0.02 |  0.04 |  0.09 |
|   AiRunCondition   |  0.24 |  0.01 |  0.01 |  0.01 |  0.01 |  0.03 |  0.07 |
|    PWMSteering     |  0.92 |  0.02 |  0.05 |  0.03 |  0.10 |  0.14 |  0.21 |
|    PWMThrottle     |  2.71 |  0.01 |  0.02 |  0.02 |  0.03 |  0.05 |  0.10 |
|    PerfMonitor     |  0.58 |  0.02 |  0.03 |  0.03 |  0.04 |  0.07 |  0.12 |
|   MqttTelemetry    |  1.03 |  0.07 |  0.11 |  0.11 |  0.13 |  0.18 |  0.30 |
+--------------------+-------+-------+-------+-------+-------+-------+-------+
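If all parts run one after another in a single loop (an assumption; a threaded part would not add its full run time), the loop rate cannot exceed 1000 ms divided by the summed per-part time. In the table above the model part dominates: TFLitePilot averages 52.77 ms, while all other parts together add under 1 ms. A minimal sketch of that bound (`max_loop_hz` is a hypothetical helper):

```python
def max_loop_hz(avg_ms_by_part):
    """Upper bound on loop rate (Hz) if all listed parts run sequentially."""
    total_ms = sum(avg_ms_by_part.values())
    return 1000.0 / total_ms


# Average time (ms) of the model part from the TFLite table above
print(round(max_loop_hz({"TFLitePilot": 52.77}), 1))  # 19.0
```

The model part alone bounds the loop at about 19 Hz, in line with the rate reported above.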

@sctse999
Contributor

@cloud-rocket This is Nano 2GB?

@cloud-rocket
Contributor Author

cloud-rocket commented Jan 20, 2021

Nope, 4GB.

I added an np.float32 conversion to the tflite.py part, because otherwise it threw an exception (the image was passed in as float64 values). Maybe that is because I am using TF 2.4.

    def inference(self, img_arr, other_arr):
        # Cast to float32: the input tensor rejected the float64 image array
        input_data = np.float32(img_arr.reshape(self.input_shape))
        self.interpreter.set_tensor(self.input_details[0]['index'], input_data)
        self.interpreter.invoke()

I don't think this conversion adds a significant performance penalty.
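Rather than hard-coding float32, the cast can follow the dtype the interpreter reports for its input tensor. A minimal sketch, assuming `input_detail` is one entry of `interpreter.get_input_details()` (a dict with `'shape'` and `'dtype'` keys); `prepare_input` is a hypothetical helper, and quantized models would additionally need the input scale and zero point, which it ignores.

```python
import numpy as np


def prepare_input(img_arr, input_detail):
    """Reshape and cast an image to the shape/dtype a TFLite input tensor reports.

    input_detail is assumed to be one entry of interpreter.get_input_details();
    only its 'shape' and 'dtype' keys are used here.
    """
    shape = tuple(input_detail["shape"])
    # copy=False avoids an extra copy when the array already has the right dtype
    return np.asarray(img_arr).reshape(shape).astype(input_detail["dtype"], copy=False)
```

It would be called as `interpreter.set_tensor(detail['index'], prepare_input(img_arr, detail))` before `interpreter.invoke()`.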

@DocGarbanzo
Contributor

We saw problems with TF > 2.2.0 in training. But 20 Hz is definitely much too slow. Have you tried the CPU version of TF?

@cloud-rocket
Contributor Author

cloud-rocket commented Jan 22, 2021

@DocGarbanzo, what do you mean by "CPU version"? With standard TF I am getting 3.5 Hz; this is why I opened this ticket.
Any idea what else to consider, and why my performance is so bad?

BTW, I added a PR for the TFLite fix: #762

I'm also trying to work out how to run TensorRT with TF2; it looks like many changes are required.

Thanks

@DocGarbanzo
Contributor

@cloud-rocket - I looked at the issue with TF > 2.2.0 on the RPi yesterday. Indeed, any newer version causes a significant slowdown on the RPi, at least when using the .h5 version of the model. On an RPi 4 I saw around 180 ms with TF 2.3.1 vs 30 ms with TF 2.2.0, so we updated the RPi install instructions on dev. With the tflite model it drops to ~20 ms.
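Given that gap (about 180 ms vs 30 ms on an RPi 4), a startup check could warn when an .h5 model is about to run on TF newer than 2.2.0. A minimal sketch; `parse_version` and `check_tf_for_h5` are hypothetical helpers, and the version string would come from `tf.__version__`.

```python
import warnings


def parse_version(version):
    """Turn '2.3.1' or '2.4.0-rc1' into a comparable tuple such as (2, 3, 1)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes like '-rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_tf_for_h5(tf_version, model_path):
    """Return False (and warn) if an .h5 model would run on TF newer than 2.2.0."""
    if model_path.endswith(".h5") and parse_version(tf_version) > (2, 2, 0):
        warnings.warn(
            f"TF {tf_version} > 2.2.0: .h5 inference was reported much slower "
            "on RPi 4 (about 180 ms vs 30 ms); consider TF 2.2.0 or a .tflite model."
        )
        return False
    return True


print(check_tf_for_h5("2.2.0", "mypilot.h5"))  # True
```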

4 participants