This project contains the code used for the simulations in the paper "Trend Detection based Regret Minimization for Bandit Problems" by Nakhe and Reiffenhäuser.
The code implements four algorithms:
- Standard Exp3
- Exp3.S
- Exp3.R
- Exp3D (the algorithm proposed in the paper)
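For readers unfamiliar with the baseline, the following is a minimal sketch of the textbook Exp3 update: exponential weights mixed with uniform exploration, plus an importance-weighted reward estimate for the pulled arm. It is not taken from this repository, and it does not cover Exp3.S, Exp3.R or Exp3D. The function name `exp3`, the `reward_fn(t, arm)` environment hook and the Bernoulli test environment are hypothetical, and rewards are assumed to lie in [0, 1]. Weights are stored in log form so that long horizons do not overflow.

```python
import math
import random
from typing import Callable, List, Tuple


def exp3(
    n_arms: int,
    reward_fn: Callable[[int, int], float],  # hypothetical hook: (round, arm) -> reward in [0, 1]
    horizon: int,
    gamma: float,
    seed: int = 0,
) -> Tuple[List[int], List[float]]:
    """Sketch of standard Exp3; returns the chosen arms and observed rewards."""
    if not 0 < gamma <= 1:
        raise ValueError("gamma must be in (0, 1]")
    rng = random.Random(seed)
    log_w = [0.0] * n_arms  # log-weights, shifted by their max before exponentiating
    arms: List[int] = []
    rewards: List[float] = []
    for t in range(horizon):
        # Exponential-weights distribution mixed with uniform exploration.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        probs = [(1 - gamma) * wi / total + gamma / n_arms for wi in w]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        x = reward_fn(t, arm)
        # Importance-weighted estimate: only the pulled arm is updated.
        x_hat = x / probs[arm]
        log_w[arm] += gamma * x_hat / n_arms
        arms.append(arm)
        rewards.append(x)
    return arms, rewards


if __name__ == "__main__":
    # Hypothetical stationary Bernoulli environment; arm 2 has the highest mean.
    means = [0.2, 0.5, 0.8]
    env_rng = random.Random(1)
    arms, _ = exp3(3, lambda t, a: float(env_rng.random() < means[a]), horizon=5000, gamma=0.1)
    print("share of arm 2 in last 1000 rounds:", arms[-1000:].count(2) / 1000)
```

The exploration rate `gamma` is left as a free parameter here. The variants listed above change how the weights react when the reward distribution shifts over time.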
The performance of these algorithms is compared under two reward models:
- a dynamic stochastic regime
- an adversarial regime with a gap
These models are generalizations of the conventional reward models.