Code for compiling a planner's policy into a model-free value function that, through constrained exploration, eventually outperforms the planner.
From the paper "Reducing the Planning Horizon through Reinforcement Learning" appearing in ECML PKDD 2022.
Please cite the paper if you find this code useful :)
@inproceedings{dunbar2022reducing,
  title={Reducing the Planning Horizon through Reinforcement Learning},
  author={Dunbar, Logan and Rosman, Benjamin and Cohn, Anthony and Leonetti, Matteo},
  booktitle={Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
  year={2022}
}