Highly Imitative Reinforcement Learning for UCAV

Result Charts

Models and Results

The best models trained with BC, TD3, SAC, E-SAC, and HIRL (our method) are stored in the ./results folder. Their validation results are listed below. The success rates and rewards, both with and without random initialization, are obtained by running 50 episodes with 5 different random seeds, and '±' denotes the standard deviation. The numbers of hits and launches are obtained by running 10 episodes. The best results are highlighted in bold.
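As a rough illustration of how a "mean ± standard deviation" entry can be produced, the sketch below assumes that a success rate is computed separately for each seed and that the table reports the mean and the population standard deviation of those per-seed rates. The helper `summarize_success` and the example outcomes are hypothetical; the repository's validation scripts may aggregate differently (for example, with the sample standard deviation).

```python
from statistics import mean, pstdev


def summarize_success(outcomes_per_seed: list[list[bool]]) -> tuple[float, float]:
    """Return (mean, std) of per-seed success rates, in percent.

    Hypothetical helper: each inner list holds one boolean per episode
    (True = success) for a single random seed.
    """
    rates = [100.0 * sum(episodes) / len(episodes) for episodes in outcomes_per_seed]
    # Population standard deviation across seeds (an assumption; the
    # original scripts may use the sample standard deviation instead).
    return mean(rates), pstdev(rates)


if __name__ == "__main__":
    # Made-up outcomes for illustration only: 5 seeds x 50 episodes.
    successes = [49, 50, 48, 49, 49]
    outcomes = [[True] * s + [False] * (50 - s) for s in successes]
    avg, std = summarize_success(outcomes)
    print(f"{avg:.1f}% ± {std:.1f}%")
```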

Validation Results without Random Initialization

| Methods | Shoot-down Success Rate | Hit Success Rate | Rewards |
|---|---|---|---|
| HIRL (adaptive) | **100.0% ± 0.0%** | **100.0% ± 0.0%** | **-680.8 ± 6.7** |
| HIRL (linear) | **100.0% ± 0.0%** | **100.0% ± 0.0%** | -953.9 ± 13.8 |
| TD3 | 0.0% ± 0.0% | 0.0% ± 0.0% | -4707.2 ± 0.0 |
| E-SAC | **100.0% ± 0.0%** | **100.0% ± 0.0%** | -1431.2 ± 0.2 |
| SAC | **100.0% ± 0.0%** | 0.0% ± 0.0% | -2985.7 ± 0.0 |
| BC | 62.8% ± 1.0% | 62.8% ± 1.0% | -12228.3 ± 880.2 |

Validation Results with Random Initialization

| Methods | Shoot-down Success Rate | Hit Success Rate | Rewards |
|---|---|---|---|
| HIRL (adaptive) | **98.0% ± 1.3%** | **98.0% ± 1.3%** | **-1436.0 ± 238.9** |
| HIRL (linear) | 86.0% ± 5.4% | 86.0% ± 5.4% | -5800.8 ± 1420.3 |
| TD3 | 0.0% ± 0.0% | 0.0% ± 0.0% | -5720.9 ± 715.8 |
| E-SAC | 90.0% ± 2.8% | 90.0% ± 2.8% | -3722.2 ± 395.5 |
| SAC | 44.0% ± 3.3% | 0.0% ± 0.0% | -8318.1 ± 822.8 |
| BC | 22.4% ± 3.2% | 22.4% ± 3.2% | -20504.7 ± 1156.3 |

Launch Efficiency Results

| Methods | Hits / Launches |
|---|---|
| HIRL (adaptive) | **100.0%** |
| HIRL (linear) | **100.0%** |
| E-SAC | 11.4% |
| BC | 92.3% |
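Hits / Launches is the fraction of launched missiles that hit the target over the 10 evaluation episodes. A minimal sketch, assuming per-episode hit and launch counts are available and that counts are pooled across episodes before dividing (the `launch_efficiency` helper and its inputs are hypothetical, not part of the repository):

```python
def launch_efficiency(hits: list[int], launches: list[int]) -> float:
    """Pooled hits / launches over all episodes, in percent.

    Hypothetical helper: hits[i] and launches[i] are the counts recorded
    in episode i. Counts are summed before dividing, so episodes with
    many launches weigh more than episodes with few.
    """
    if len(hits) != len(launches):
        raise ValueError("hits and launches must have the same length")
    total_launches = sum(launches)
    if total_launches == 0:
        raise ValueError("no missiles were launched")
    return 100.0 * sum(hits) / total_launches


if __name__ == "__main__":
    # Made-up counts for three episodes: 2 hits out of 4 launches.
    print(f"{launch_efficiency([1, 1, 0], [1, 2, 1]):.1f}%")
```

Averaging per-episode ratios instead would give a different number whenever episodes launch different numbers of missiles, so it is worth checking which convention the validation script uses.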

Getting Started

Installation Requirements

We recommend a computer running Windows. We tried Linux, but Harfang3D does not appear to be compatible with it.
Install the Harfang3D sandbox from a release or from source. Installing from source is recommended because it is more flexible; for example, it lets you customize the environment's network port.
Install the dependencies required by this code:
```
conda env create -f environment.yaml
```

Training

In the Harfang3D sandbox folder, run the following commands to open the Harfang3D sandbox. The network_port argument sets the port number. Once the sandbox is open, you need to enter network mode manually.
```
cd source
python main.py network_port 12345
```

In the HIRL4UCAV folder, run one of the following commands to start training. Remember to set the IP address in train_all.py. Add --render to render the environment during training, and --plot to draw visualization results.

```
# HIRL (adaptive)
python train_all.py --agent HIRL --port 12345 --type soft --model_name s-HIRL

# HIRL (linear)
python train_all.py --agent HIRL --port 12345 --type linear --bc_weight 1 --model_name l-HIRL

# HIRL (fixed)
python train_all.py --agent HIRL --port 12345 --type fixed --bc_weight 0.5 --model_name f-HIRL

# TD3
python train_all.py --agent TD3 --port 12345 --model_name td3

# BC
python train_all.py --agent BC --port 12345 --model_name bc

# SAC
python train_sac.py --type sac --port 12345 --model_name sac

# E-SAC
python train_sac.py --type esac --port 12345 --model_name esac
```
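The training commands differ only in the script and a few flags. For scripting several runs, the sketch below collects them in one table and builds the argument list for a chosen configuration. The `TRAINING_CONFIGS` dictionary and the `training_command` helper are hypothetical conveniences, not part of the repository; the scripts and flags are copied from the commands listed for each method.

```python
import shlex

# Script and flags for each configuration, copied from the training commands.
TRAINING_CONFIGS = {
    "HIRL (adaptive)": ["train_all.py", "--agent", "HIRL", "--type", "soft", "--model_name", "s-HIRL"],
    "HIRL (linear)": ["train_all.py", "--agent", "HIRL", "--type", "linear", "--bc_weight", "1", "--model_name", "l-HIRL"],
    "HIRL (fixed)": ["train_all.py", "--agent", "HIRL", "--type", "fixed", "--bc_weight", "0.5", "--model_name", "f-HIRL"],
    "TD3": ["train_all.py", "--agent", "TD3", "--model_name", "td3"],
    "BC": ["train_all.py", "--agent", "BC", "--model_name", "bc"],
    "SAC": ["train_sac.py", "--type", "sac", "--model_name", "sac"],
    "E-SAC": ["train_sac.py", "--type", "esac", "--model_name", "esac"],
}


def training_command(name: str, port: int = 12345) -> list[str]:
    """Build the argv list for one configuration (hypothetical helper).

    The port must match the network_port passed to the sandbox's main.py.
    """
    script, *flags = TRAINING_CONFIGS[name]
    return ["python", script, *flags, "--port", str(port)]


if __name__ == "__main__":
    # Dry run: print every command instead of executing it.
    for name in TRAINING_CONFIGS:
        print(f"# {name}")
        print(shlex.join(training_command(name)))
```

Printing instead of executing keeps the sketch safe to run; each printed line can be passed to `subprocess.run(shlex.split(line))` once the sandbox is in network mode.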

Validation

In the Harfang3D sandbox folder, run the following commands to open the Harfang3D sandbox. The network_port argument sets the port number. Once the sandbox is open, you need to enter network mode manually.
```
cd source
python main.py network_port 12345
```

To test the BC, TD3, and HIRL models, run the following commands in the HIRL4UCAV folder. Remember to set the IP address and the model name in train_all.py (only the part of the name before 'xxx_Harfang_GYM' is needed). Add --render to render the environment during testing.

```
# Success rate validation
# Append '--test --test_mode n' (and '--seed' to choose the random seed) to the corresponding training command, running validate_all.py instead of train_all.py.
# Test mode 1 is the random initialization mode, test mode 2 is the infinite missiles mode, and test mode 3 is the original environment.
# Example:
python validate_all.py --agent HIRL --port 12345 --type soft --model_name s-HIRL --test --test_mode 1 --seed 1

# Reward validation
# Append '--test --test_mode n' (and '--seed' to choose the random seed) to the corresponding training command, running validate_all.py instead of train_all.py.
# Test mode 4 is the random initialization mode, and test mode 5 is the original environment.
# Example:
python validate_all.py --agent HIRL --port 12345 --type soft --model_name s-HIRL --test --test_mode 4 --seed 1
```
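Each validation run uses a single seed, so reproducing a table entry means repeating the run for several seeds. The sketch below derives validation commands from a training command by following the rule above: swap train_all.py for validate_all.py and append --test --test_mode n --seed s. The helper names are hypothetical, and using seeds 1 to 5 is an assumption, since this README does not list the seeds behind the reported results.

```python
import shlex

# Test modes for validate_all.py, as described in this README.
SUCCESS_RATE_MODES = {1: "random initialization", 2: "infinite missiles", 3: "original environment"}
REWARD_MODES = {4: "random initialization", 5: "original environment"}


def validation_command(train_cmd: str, test_mode: int, seed: int) -> list[str]:
    """Turn a train_all.py command into the matching validate_all.py command."""
    if test_mode not in SUCCESS_RATE_MODES and test_mode not in REWARD_MODES:
        raise ValueError(f"unknown test mode: {test_mode}")
    argv = shlex.split(train_cmd)
    # Replace only the script name; all training flags are kept unchanged.
    argv = ["validate_all.py" if arg == "train_all.py" else arg for arg in argv]
    return argv + ["--test", "--test_mode", str(test_mode), "--seed", str(seed)]


if __name__ == "__main__":
    train = "python train_all.py --agent HIRL --port 12345 --type soft --model_name s-HIRL"
    # Seeds 1-5 are an assumption; substitute the seeds you want to evaluate.
    for seed in range(1, 6):
        print(shlex.join(validation_command(train, test_mode=1, seed=seed)))
```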

To test the SAC and E-SAC models, run the following command in the HIRL4UCAV folder (the test modes are the same as described above).
```
python validate_sac.py --test_mode 1 --port 12345 --seed 1 
```

Citation

```
@misc{li2024imitative,
    title={An Imitative Reinforcement Learning Framework for Autonomous Dogfight},
    author={Siyuan Li and Rongchang Zuo and Peng Liu and Yingnan Zhao},
    year={2024},
    eprint={2406.11562},
    archivePrefix={arXiv}
}
```
