add LinUCB algorithm

tojo-soraai · May 10, 2020 · 9a03902
1 parent 09f189e
Showing 3 changed files with 15 additions and 7 deletions.
diff --git a/numpy_ml/README.md b/numpy_ml/README.md
@@ -106,6 +106,7 @@ This repo includes code for the following models:
     - Epsilon-greedy
     - Thompson sampling w/ conjugate priors
         - Beta-Bernoulli sampler
+    - LinUCB
 
 8. **Reinforcement learning models**
     - Cross-entropy method agent
@@ -120,7 +121,11 @@ This repo includes code for the following models:
     - k-Nearest neighbors classification and regression
     - Gaussian process regression
 
-10. **Preprocessing**
+10. **Matrix factorization**
+    - Regularized alternating least-squares
+    - Non-negative matrix factorization
+
+11. **Preprocessing**
     - Discrete Fourier transform (1D signals)
     - Discrete cosine transform (type-II) (1D signals)
     - Bilinear interpolation (2D signals)
@@ -135,7 +140,7 @@ This repo includes code for the following models:
     - Term frequency-inverse document frequency (TF-IDF) encoding
     - MFCC encoding
 
-11. **Utilities**
+12. **Utilities**
     - Similarity kernels
     - Distance metrics
     - Priority queue

diff --git a/numpy_ml/bandits/README.md b/numpy_ml/bandits/README.md
@@ -7,11 +7,13 @@ policies.
 
 1. **Bandits**
     - MAB: Bernoulli, Multinomial, and Gaussian payout distributions
+    - Contextual MAB: Linear contextual bandits
 
 2. **Policies**
     - Epsilon-greedy
     - UCB1 ([Auer, Cesa-Bianchi, & Fischer, 2002](https://link.springer.com/content/pdf/10.1023/A:1013689704352.pdf))
     - Conjugate Thompson sampler for Bernoulli bandits ([Thompson, 1933](https://www.gwern.net/docs/statistics/decision/1933-thompson.pdf); [Chapelle & Li, 2011](https://papers.nips.cc/paper/4321-an-empirical-evaluation-of-thompson-sampling.pdf))
+    - LinUCB ([Li, Chu, Langford, & Schapire, 2010](http://rob.schapire.net/papers/www10.pdf))
 
 ## Plots
 <p align="center">

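Note: for readers unfamiliar with the policy listed above, the following is a minimal sketch of disjoint LinUCB as described in Li, Chu, Langford, & Schapire (2010), run against a toy linear contextual bandit. It is not the code added in this commit. The names `LinearBanditSketch`, `LinUCBSketch`, `run`, and the `alpha` and `seed` parameters are hypothetical and chosen only for illustration. The toy environment assumes the setup described in the `bandits.py` docstring: per-arm coefficient vectors drawn uniformly from [-1, 1], standard normal contexts, and normally distributed rewards with mean `x_{t,a}^T theta_a`.

```python
import numpy as np


class LinearBanditSketch:
    """Hypothetical K-armed linear contextual bandit (not the numpy_ml class)."""

    def __init__(self, K, D, payoff_variance=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.K, self.D = K, D
        self.payoff_std = np.sqrt(payoff_variance)
        # One coefficient vector per arm (stored as columns), Uniform(-1, 1).
        self.thetas = self.rng.uniform(-1, 1, size=(D, K))

    def get_context(self):
        # One standard normal context vector per arm, stored as columns.
        return self.rng.standard_normal(size=(self.D, self.K))

    def pull(self, arm, context):
        # Reward is normal with mean x_{t,a}^T theta_a.
        mean = context[:, arm] @ self.thetas[:, arm]
        return self.rng.normal(mean, self.payoff_std)


class LinUCBSketch:
    """Disjoint LinUCB: an independent ridge-regression model per arm."""

    def __init__(self, K, D, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed value)
        self.A = np.stack([np.eye(D) for _ in range(K)])  # shape (K, D, D)
        self.b = np.zeros((K, D))

    def select_arm(self, context):
        scores = np.empty(len(self.A))
        for a in range(len(self.A)):
            x = context[:, a]
            A_inv = np.linalg.inv(self.A[a])
            theta_hat = A_inv @ self.b[a]
            # Point estimate plus an upper-confidence bonus.
            scores[a] = theta_hat @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        # Only the pulled arm's statistics change.
        x = context[:, arm]
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


def run(n_steps=500, K=3, D=4, seed=0):
    bandit = LinearBanditSketch(K, D, seed=seed)
    policy = LinUCBSketch(K, D)
    total = 0.0
    for _ in range(n_steps):
        ctx = bandit.get_context()
        arm = policy.select_arm(ctx)
        reward = bandit.pull(arm, ctx)
        policy.update(arm, ctx, reward)
        total += reward
    return policy, bandit, total
```

Calling `run()` returns the fitted policy, the environment, and the cumulative reward. Inverting each `A[a]` on every step keeps the sketch short; a practical implementation would more likely maintain the inverse incrementally.
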
diff --git a/numpy_ml/bandits/bandits.py b/numpy_ml/bandits/bandits.py
@@ -419,9 +419,10 @@ def __init__(self, K, D, payoff_variance=1):
 
             \mathbb{E}[r_{t, a} \mid \mathbf{x}_{t, a}] = \mathbf{x}_{t,a}^\top \theta_a
 
-        In this implementation, the arm coefficient vectors are sampled
-        independently from a Uniform distribution on the interval between -1
-        and 1, and the specific reward at timestep `t` is normally distributed:
+        In this implementation, the arm coefficient vectors :math:`\theta` are
+        initialized independently from a uniform distribution on the interval
+        [-1, 1], and the specific reward at timestep `t` is normally
+        distributed:
 
         .. math::
 
@@ -473,8 +474,8 @@ def parameters(self):
 
     def get_context(self):
         """
-        Sample a context vector from a standard normal distribution for each of
-        the arms.
+        Sample the context vectors for each arm from a multivariate standard
+        normal distribution.
 
         Returns
         -------