layout | title | displayTitle |
---|---|---|
global | SystemML Algorithms Reference - Factorization Machines | <a href="algorithms-reference.html">SystemML Algorithms Reference</a> |
The Factorization Machine (FM) is a general predictor like SVMs, but it can also estimate reliable parameters under very high sparsity. It models all nested variable interactions (comparable to a polynomial kernel in an SVM), but uses a factorized parameterization instead of the dense parameterization used in SVMs.
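For reference, the second-order FM model equation in its usual form (reconstructed here from the standard formulation and the parameter definitions that follow, with $$ x \in R^n $$ assumed to be the input vector) is:

$$
\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \left <v_i, v_j \right > x_i x_j
$$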
where the model parameters that have to be estimated are: $$ w_0 \in R, W \in R^n, V \in R^{n \times k} $$
and $$ \left <\cdot, \cdot \right > $$ is the dot product of two vectors of size $$ k $$.
A row $$ v_i $$ within $$ V $$ describes the $$i$$th variable with $$ k $$ factors.
- $$ w_0 $$ : global bias
- $$ w_i $$ : models the strength of the $$i$$th variable
- $$ w_{i,j} = \left <v_i, v_j \right> $$ : models the interaction between the $$i$$th and $$j$$th variables.
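To make the roles of the three parameter groups concrete, here is a minimal Python sketch of the second-order FM prediction for a single example. It assumes the standard FM form (bias, plus linear terms, plus pairwise terms weighted by dot products of rows of V). The function names and toy values are illustrative and are not taken from the SystemML scripts.

```python
from itertools import combinations


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def fm_predict(x, w0, w, V):
    """Second-order FM prediction for one example.

    x  : list of n feature values
    w0 : global bias
    w  : list of n linear weights
    V  : n rows with k factors each
    """
    linear = sum(w_i * x_i for w_i, x_i in zip(w, x))
    # Pairwise term: <v_i, v_j> * x_i * x_j over all pairs i < j.
    pairwise = sum(
        dot(V[i], V[j]) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )
    return w0 + linear + pairwise


# Toy values for illustration only (n = 3 variables, k = 2 factors).
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
y_hat = fm_predict(x, w0, w, V)
```

Only the pair of the first and third variables contributes to the interaction term here, because the second feature is zero. This is the typical situation with sparse inputs.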
Instead of using a separate model parameter $$ w_{i,j} \in R $$ for each interaction, the FM models the interaction by factorizing it.
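It is well known that for any positive definite interaction matrix with entries $$ w_{i,j} $$, there exists a matrix $$ V $$ such that the interaction matrix equals $$ V \cdot V^t $$, provided that $$ k $$ is sufficiently large.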
In sparse settings, there is usually not enough data to estimate interactions between variables directly and independently. FMs can estimate interactions well even in these settings because they break the independence of the interaction parameters by factorizing them.
Due to the factorization of pairwise interactions, there is no model parameter that directly depends on two variables (e.g., a parameter with an index $$ (i,j) $$).
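The gradient of the model prediction $$ \hat{y}(x) $$ with respect to each model parameter $$ \theta $$ is
$$
\frac{\partial}{\partial \theta} \hat{y}(x) =
\begin{cases}
1 & \text{if } \theta \text{ is } w_0 \\
x_i & \text{if } \theta \text{ is } w_i \\
x_i \sum_{j=1}^{n} v_{j,f} \cdot x_j - v_{i,f} \cdot x_i^2 & \text{if } \theta \text{ is } v_{i,f}
\end{cases}
$$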
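One way to sanity-check the gradient with respect to the factor matrix is to compare the standard closed form, x_i * sum_j(v_jf * x_j) - v_if * x_i^2, against a central finite difference of the prediction. The Python sketch below does this on toy data. All names and values are illustrative assumptions and are not part of the SystemML scripts.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM prediction with an explicit loop over pairs i < j."""
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = sum(
        sum(V[i][f] * V[j][f] for f in range(k)) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return w0 + linear + pairwise


def fm_grad_V(x, V):
    """Closed-form gradient of the prediction with respect to each v_{i,f}."""
    n, k = len(x), len(V[0])
    # s[f] = sum_j v_{j,f} * x_j is shared by every row i.
    s = [sum(V[j][f] * x[j] for j in range(n)) for f in range(k)]
    return [[x[i] * s[f] - V[i][f] * x[i] ** 2 for f in range(k)]
            for i in range(n)]


def numeric_grad_V(x, w0, w, V, h=1e-6):
    """Central finite-difference gradient with respect to each v_{i,f}."""
    n, k = len(x), len(V[0])
    grad = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for f in range(k):
            V_plus = [row[:] for row in V]
            V_minus = [row[:] for row in V]
            V_plus[i][f] += h
            V_minus[i][f] -= h
            diff = fm_predict(x, w0, w, V_plus) - fm_predict(x, w0, w, V_minus)
            grad[i][f] = diff / (2 * h)
    return grad


# Toy values for illustration only.
x = [1.0, 0.0, 2.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

analytic = fm_grad_V(x, V)
numeric = numeric_grad_V(x, w0, w, V)
max_err = max(abs(a - b)
              for row_a, row_n in zip(analytic, numeric)
              for a, b in zip(row_a, row_n))
```

The gradients for w0 and w need no check of this kind, since they are simply 1 and x_i.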
The train() function in the fm-regression.dml script takes the input feature matrix and the corresponding target vector, along with a validation set held out for use during training.
train = function(matrix[double] X, matrix[double] y, matrix[double] X_val, matrix[double] y_val)
    return (matrix[double] w0, matrix[double] W, matrix[double] V) {
  /*
   * Trains the FM model.
   *
   * Inputs:
   *  - X     : n examples with d features, of shape (n, d)
   *  - y     : Target matrix, of shape (n, 1)
   *  - X_val : Input validation data matrix, of shape (n, d)
   *  - y_val : Target validation matrix, of shape (n, 1)
   *
   * Outputs:
   *  - w0, W, V : updated model parameters.
   *
   * Network Architecture:
   *
   * X --> [model] --> out --> l2_loss::backward(out, y) --> dout
   *
   */
  ...
  # 7. Call adam::update for all parameters
  [w0, mw0, vw0] = adam::update(w0, dw0, lr, beta1, beta2, epsilon, t, mw0, vw0);
  [W, mW, vW] = adam::update(W, dW, lr, beta1, beta2, epsilon, t, mW, vW);
  [V, mV, vV] = adam::update(V, dV, lr, beta1, beta2, epsilon, t, mV, vV);
}
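The body of adam::update is not shown in this section. As a rough guide to what each of the three calls does, here is a minimal Python sketch of one textbook Adam step, using the same argument order as the calls above. It assumes that t is the 1-based step count and that parameters are flat lists. The library routine may handle t or the bias correction differently.

```python
import math


def adam_update(x, dx, lr, beta1, beta2, epsilon, t, m, v):
    """One textbook Adam step on a flat list of parameters.

    x, dx : parameters and their gradients
    m, v  : running first and second moment estimates
    t     : 1-based step count (assumption), used for bias correction
    Returns the updated (x, m, v).
    """
    m = [beta1 * m_i + (1 - beta1) * g for m_i, g in zip(m, dx)]
    v = [beta2 * v_i + (1 - beta2) * g * g for v_i, g in zip(v, dx)]
    # Bias-corrected moment estimates.
    m_hat = [m_i / (1 - beta1 ** t) for m_i in m]
    v_hat = [v_i / (1 - beta2 ** t) for v_i in v]
    x = [x_i - lr * mh / (math.sqrt(vh) + epsilon)
         for x_i, mh, vh in zip(x, m_hat, v_hat)]
    return x, m, v


# Illustrative values only.
params = [1.0, -1.0]
grads = [0.5, -2.0]
m0 = [0.0, 0.0]
v0 = [0.0, 0.0]
new_params, m1, v1 = adam_update(params, grads, 0.01, 0.9, 0.999, 1e-8, 1, m0, v0)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly lr regardless of the gradient's magnitude.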
Once the train function returns the weights for the fm model, these are passed to the predict function.
predict = function(matrix[double] X, matrix[double] w0, matrix[double] W, matrix[double] V)
    return (matrix[double] out) {
  /*
   * Computes the predictions for the given inputs.
   *
   * Inputs:
   *  - X : n examples with d features, of shape (n, d).
   *  - w0, W, V : trained model parameters.
   *
   * Outputs:
   *  - out : target vector, y.
   */
  out = fm::forward(X, w0, W, V);
}
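The body of fm::forward is not shown in this section. A minimal Python sketch of what a batched forward pass with the same argument order might compute is given below. It assumes the standard FM equation and uses the known identity that the sum over pairs i < j of <v_i, v_j> x_i x_j equals half the sum over factors f of (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2. This avoids the explicit double loop over variable pairs. The function names are illustrative.

```python
def fm_forward(X, w0, W, V):
    """Batched FM prediction using the linear-time form of the pairwise term.

    X : list of rows, each with d feature values
    W : list of d linear weights
    V : d rows with k factors each
    """
    k = len(V[0])
    out = []
    for x in X:
        d = len(x)
        linear = sum(W[i] * x[i] for i in range(d))
        pairwise = 0.0
        for f in range(k):
            s = sum(V[i][f] * x[i] for i in range(d))
            s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(d))
            pairwise += 0.5 * (s * s - s_sq)
        out.append(w0 + linear + pairwise)
    return out


def fm_forward_naive(X, w0, W, V):
    """Reference version with an explicit loop over pairs i < j."""
    k = len(V[0])
    out = []
    for x in X:
        d = len(x)
        linear = sum(W[i] * x[i] for i in range(d))
        pairwise = sum(
            sum(V[i][f] * V[j][f] for f in range(k)) * x[i] * x[j]
            for i in range(d)
            for j in range(i + 1, d)
        )
        out.append(w0 + linear + pairwise)
    return out


# Toy values for illustration only.
X = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]
w0 = 0.5
W = [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fast = fm_forward(X, w0, W, V)
slow = fm_forward_naive(X, w0, W, V)
```

Both versions give the same predictions. The first costs on the order of k * d operations per example rather than k * d^2.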
The fm-regression-dummy-data.dml file can serve as a template to extend.
For binary classification, the sign of $$ \hat{y}(x) $$ is used as the predicted class.
The train function in the fm-binclass.dml script takes the input feature matrix and the corresponding target vector, along with a validation set held out for use during training. This script also contains train() and predict() functions, as in the regression case.
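This section does not show how fm-binclass.dml turns the raw FM output into a class. One common choice, assumed here purely for illustration, is to take the sign of the raw score as the label and to pass the score through a logistic function when a probability or a log loss is needed. The helper names below are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_class(score):
    """Map a raw FM score to a label in {-1, +1} using its sign."""
    return 1 if score >= 0 else -1


def log_loss(score, label01):
    """Logistic loss for a raw score and a label in {0, 1}."""
    p = sigmoid(score)
    # Clamp to avoid log(0) for extreme scores.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(label01 * math.log(p) + (1 - label01) * math.log(1.0 - p))


# Raw scores as they might come out of an FM forward pass (illustrative).
scores = [2.2, -0.3, 0.0]
labels = [predict_class(s) for s in scores]
probs = [sigmoid(s) for s in scores]
```

A score of zero sits exactly on the decision boundary. This sketch assigns it to the positive class, which is an arbitrary convention.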
The fm-regression-dummy-data.dml file can serve as a template to extend.
Regularization terms like