add LinUCB algorithm
ddbourgin committed May 10, 2020
1 parent 09f189e commit 9a03902
Showing 3 changed files with 15 additions and 7 deletions.
9 changes: 7 additions & 2 deletions numpy_ml/README.md
@@ -106,6 +106,7 @@ This repo includes code for the following models:
 - Epsilon-greedy
 - Thompson sampling w/ conjugate priors
 - Beta-Bernoulli sampler
+- LinUCB

 8. **Reinforcement learning models**
 - Cross-entropy method agent
@@ -120,7 +121,11 @@ This repo includes code for the following models:
 - k-Nearest neighbors classification and regression
 - Gaussian process regression

-10. **Preprocessing**
+10. **Matrix factorization**
+- Regularized alternating least-squares
+- Non-negative matrix factorization
+
+11. **Preprocessing**
 - Discrete Fourier transform (1D signals)
 - Discrete cosine transform (type-II) (1D signals)
 - Bilinear interpolation (2D signals)
@@ -135,7 +140,7 @@ This repo includes code for the following models:
 - Term frequency-inverse document frequency (TF-IDF) encoding
 - MFCC encoding

-11. **Utilities**
+12. **Utilities**
 - Similarity kernels
 - Distance metrics
 - Priority queue
2 changes: 2 additions & 0 deletions numpy_ml/bandits/README.md
@@ -7,11 +7,13 @@ policies.

 1. **Bandits**
 - MAB: Bernoulli, Multinomial, and Gaussian payout distributions
+- Contextual MAB: Linear contextual bandits

 2. **Policies**
 - Epsilon-greedy
 - UCB1 ([Auer, Cesa-Bianchi, & Fischer, 2002](https://link.springer.com/content/pdf/10.1023/A:1013689704352.pdf))
 - Conjugate Thompson sampler for Bernoulli bandits ([Thompson, 1933](https://www.gwern.net/docs/statistics/decision/1933-thompson.pdf); [Chapelle & Li, 2011](https://papers.nips.cc/paper/4321-an-empirical-evaluation-of-thompson-sampling.pdf))
+- LinUCB ([Li, Chu, Langford, & Schapire, 2010](http://rob.schapire.net/papers/www10.pdf))

 ## Plots
 <p align="center">
11 changes: 6 additions & 5 deletions numpy_ml/bandits/bandits.py
@@ -419,9 +419,10 @@ def __init__(self, K, D, payoff_variance=1):
 \mathbb{E}[r_{t, a} \mid \mathbf{x}_{t, a}] = \mathbf{x}_{t,a}^\top \theta_a
-In this implementation, the arm coefficient vectors are sampled
-independently from a Uniform distribution on the interval between -1
-and 1, and the specific reward at timestep `t` is normally distributed:
+In this implementation, the arm coefficient vectors :math:`\theta` are
+initialized independently from a uniform distribution on the interval
+[-1, 1], and the specific reward at timestep `t` is normally
+distributed:
 .. math::
@@ -473,8 +474,8 @@ def parameters(self):

 def get_context(self):
 """
-Sample a context vector from a standard normal distribution for each of
-the arms.
+Sample the context vectors for each arm from a multivariate standard
+normal distribution.
 Returns
 -------
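The docstrings changed in `numpy_ml/bandits/bandits.py` describe a linear contextual bandit: per-arm coefficients drawn from a uniform distribution on [-1, 1], per-arm contexts drawn from a standard normal, and Gaussian rewards centered on `x^T theta_a`. The following is a minimal sketch of that description, not the code from this commit. The class name, the `seed` argument, the `expected_payoffs` and `pull` method names, the `(D, K)` array layout, and the scalar treatment of `payoff_variance` are all assumptions made for illustration.

```python
import numpy as np


class LinearContextualBanditSketch:
    """Hypothetical stand-in for the environment the docstrings describe."""

    def __init__(self, K, D, payoff_variance=1.0, seed=None):
        self.K, self.D = K, D
        # Assumption: a single scalar variance shared by all arms.
        self.payoff_variance = payoff_variance
        self.rng = np.random.default_rng(seed)
        # One coefficient vector theta_a per arm (stored as columns),
        # each entry drawn independently from Uniform[-1, 1].
        self.thetas = self.rng.uniform(-1.0, 1.0, size=(D, K))

    def get_context(self):
        # One D-dimensional context per arm; entries are i.i.d. N(0, 1).
        return self.rng.standard_normal(size=(self.D, self.K))

    def expected_payoffs(self, context):
        # E[r_{t,a} | x_{t,a}] = x_{t,a}^T theta_a, evaluated for every arm.
        return np.sum(context * self.thetas, axis=0)

    def pull(self, arm, context):
        # Observed reward: Gaussian noise around the arm's expected payoff.
        mean = context[:, arm] @ self.thetas[:, arm]
        return self.rng.normal(mean, np.sqrt(self.payoff_variance))
```

A typical round under these assumptions is `X = env.get_context()` followed by `env.pull(arm, X)` for whichever arm the policy selects.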

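For readers unfamiliar with the policy this commit adds, here is a rough sketch of LinUCB with disjoint linear models, following Algorithm 1 of the Li, Chu, Langford, & Schapire (2010) paper cited in the bandits README. It is not the implementation from this commit: the class name `LinUCBSketch`, its method names, the `alpha` default, and the `(dim, n_arms)` context layout are assumptions chosen for illustration.

```python
import numpy as np


class LinUCBSketch:
    """Illustrative disjoint-model LinUCB (hypothetical names throughout)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight on the confidence width
        # Per-arm ridge-regression statistics: A_a = I_d, b_a = 0.
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select_arm(self, context):
        # `context` has shape (dim, n_arms): one feature column per arm.
        scores = []
        for a, (A, b) in enumerate(zip(self.A, self.b)):
            x = context[:, a]
            A_inv = np.linalg.inv(A)
            theta_hat = A_inv @ b  # ridge estimate of the arm's coefficients
            width = np.sqrt(x @ A_inv @ x)  # confidence width for this context
            scores.append(theta_hat @ x + self.alpha * width)
        # Ties resolve to the lowest arm index.
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        # Only the pulled arm's statistics change.
        x = context[:, arm]
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

Inverting each `A_a` on every call keeps the sketch short; a practical implementation would more likely cache the inverse or update it incrementally.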