THIS IS AN EXPERIMENTAL VERSION!!
Traffic Model
- All vehicles obey the traffic rules
- Vehicles can accelerate and decelerate as needed
- Each vehicle entering the network has an assigned path to its destination (not necessarily the shortest one)
- The number of vehicles entering the network is sampled from a binomial distribution whose mean is the expected arrival rate (an important parameter, which represents the amount of traffic)
- Three signals in series are considered
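The arrival process described above can be sketched as follows. This is only an illustration, not the code used in this repository: the function name `sample_arrivals` and the parameter `max_vehicles` (the number of binomial trials per time step) are assumptions, since the model only states that the mean of the distribution equals the expected arrival rate.

```python
import random


def sample_arrivals(expected_rate, max_vehicles=10, rng=None):
    """Sample how many vehicles enter the network in one time step.

    Draws from Binomial(n=max_vehicles, p=expected_rate / max_vehicles),
    so the mean n * p equals expected_rate.
    NOTE: max_vehicles is an assumed parameter, not taken from the model.
    """
    if max_vehicles <= 0 or not 0 <= expected_rate <= max_vehicles:
        raise ValueError("need 0 <= expected_rate <= max_vehicles, max_vehicles > 0")
    if rng is None:
        rng = random
    p = expected_rate / max_vehicles
    # A binomial draw is the number of successes in n independent Bernoulli trials.
    return sum(1 for _ in range(max_vehicles) if rng.random() < p)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_arrivals(3.0, max_vehicles=10, rng=rng) for _ in range(10)]
    print(draws)
```

Raising `expected_rate` while keeping `max_vehicles` fixed increases the average number of vehicles per step, which is how a heavier traffic load would be represented in this sketch.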
Methods Tested
Here's a summary of all the significant methods (see branches), tested for a 15-minute rush hour (vehicles enter the network continuously for 15 minutes only) at three different arrival rates.
We can observe that centralised DQRL performs well for all three arrival rates. Even though multi-agent DQRL outperforms centralised DQRL by a slight margin, it doesn't converge well most of the time.
Testing Methodology
Given the traffic model and the traffic light control logic (based on the pressure at each lane), the sole evaluation criterion is the episode duration, i.e. the time until all vehicles have left the intersection. This may not be practical, since in reality vehicles enter the network continuously, but for testing purposes it is a valid criterion for assessing the algorithm's performance.
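To make the criterion concrete, here is a toy sketch of how an episode duration could be measured. It is not the simulator from this repository: the function `clearance_time` and its `capacity` parameter (vehicles that can leave per step, standing in for the combined effect of the simulator and the signal controller) are hypothetical.

```python
def clearance_time(arrivals, capacity, max_steps=100_000):
    """Return the number of steps until the network is empty.

    arrivals: vehicles entering at each step of the rush period.
    capacity: vehicles that can leave per step (assumed toy stand-in
              for the simulator plus the traffic light controller).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    in_network = 0
    step = 0
    # Run until the rush period is over and every vehicle has left.
    while step < len(arrivals) or in_network > 0:
        if step >= max_steps:
            raise RuntimeError("network did not clear within max_steps")
        if step < len(arrivals):
            in_network += arrivals[step]
        in_network -= min(capacity, in_network)
        step += 1
    return step


if __name__ == "__main__":
    rush = [3, 3, 3]
    print(clearance_time(rush, capacity=2))  # slower controller
    print(clearance_time(rush, capacity=3))  # faster controller
```

Under this metric, a controller that lets vehicles through faster yields a shorter episode for the same arrivals, which is the comparison the testing methodology relies on.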
Conclusion
DQRL algorithms significantly outperform fixed-time control in the experimental setting. However, since the model is not very realistic, there's a lot of room for improvement.