You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The best models trained by BC、TD3、SAC、E-SAC、HIRL (our method) are stored in the ./results folder. The validation results of the models are as follows (validation results without random initialization and with random initialization are obtained by running 50 episodes with 5 different random seeds, '±' indicates standard deviation; the number of hits and launches are obtained by running 10 episodes; the best results are highlighted in bold).
Validation Results without Random Initialization
Methods
Shoot-down Success Rate
Hit Success Rate
Rewards
HIRL (adaptive)
100.0% ± 0.0%
100.0% ± 0.0%
-680.8 ± 6.7
HIRL (linear)
100.0% ± 0.0%
100.0% ± 0.0%
-953.9 ± 13.8
TD3
0.0% ± 0.0%
0.0% ± 0.0%
-4707.2 ± 0.0
E-SAC
100.0% ± 0.0%
100.0% ± 0.0%
-1431.2 ± 0.2
SAC
100.0% ± 0.0%
0.0% ± 0.0%
-2985.7 ± 0.0
BC
62.8% ± 1.0%
62.8% ± 1.0%
-12228.3 ± 880.2
Validation Results with Random Initialization
Methods
Shoot-down Success Rate
Hit Success Rate
Rewards
HIRL (adaptive)
98.0% ± 1.3%
98.0% ± 1.3%
-1436.0 ± 238.9
HIRL (linear)
86.0% ± 5.4%
86.0% ± 5.4%
-5800.8 ± 1420.3
TD3
0.0% ± 0.0%
0.0% ± 0.0%
-5720.9 ± 715.8
E-SAC
90.0% ± 2.8%
90.0% ± 2.8%
-3722.2 ± 395.5
SAC
44.0% ± 3.3%
0.0% ± 0.0%
-8318.1 ± 822.8
BC
22.4% ± 3.2%
22.4% ± 3.2%
-20504.7 ± 1156.3
Launch Efficiency Results
Methods
Hits / Launches
HIRL (adaptive)
100.0%
HIRL (linear)
100.0%
E-SAC
11.4%
BC
92.3%
Getting Started
Installation Requirements
It is recommended to use a computer with Windows operating system (we have tried using Linux, but it seems that Harfang3D is not compatible).
Install Harfang3D sandbox from the release or source. It is recommended to install from source for more flexibility, such as customizing the network port of the environment.
Install the dependencies required for this code.
conda env create -f environment.yaml
Training
In the Harfang3D sandbox folder, use the following command to open Harfang3D sandbox. You can specify the port number with network_port. After opening, you need to manually enter the network mode.
cdsource
python main.py network_port 12345
In the HIRL4UCAV folder, use the following command to start training (note to modify the IP number in the train_all.py; use --render to enable training rendering, and use --plot to draw visualization results).
In the Harfang3D sandbox folder, use the following command to open Harfang3D sandbox. You can specify the port number with network_port. After opening, you need to manually enter the network mode.
cdsource
python main.py network_port 12345
To test the BC, TD3, and HIRL models, use the following command in the HIRL4UCAV folder (note to modify the IP number and the model name in the train_all.py (only the name before 'xxx_Harfang_GYM' is needed); use --render to enable test rendering).
# Sucess Rate Validation# Add '--test --test_mode n' to the end of the corresponding training command. 'test mode 1' is the random initialization mode, 'test mode 2' is the infinite missiles mode, and 'test mode 3' is the original environment# Here's an example
python validate_all.py --agent HIRL --port 12345 --type soft --model_name s-HIRL --test --test_mode 1 --seed 1
# Reward Validation# Add '--test --test_mode n' to the end of the corresponding training command. 'test mode 4' is the random initialization mode, and 'test mode 5' is the original environment# Here's an example
python validate_all.py --agent HIRL --port 12345 --type soft --model_name s-HIRL --test --test_mode 4 --seed 1
To test the SAC and E-SAC models, use the following command in the HIRL4UCAV folder (types of test mode are as described above).
@misc{li2024imitative,
title={An Imitative Reinforcement Learning Framework for Autonomous Dogfight},
author={Siyuan Li and Rongchang Zuo and Peng Liu and Yingnan Zhao},
year={2024},
eprint={2406.11562},
archivePrefix={arXiv}
}