- input_shape: shape of the input data, for example, (C, H, W) or (features, ).
- data: the value of this layer's input and output tensor.
from xshinnosuke.models import Model
from xshinnosuke.layers import Input
X = Input(input_shape=(10, 5, 5))
model = Model(inputs=X, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
- out_features: number of output features.
- activation: activation function. see details in Activations
- use_bias: whether to use a bias.
- kernel_initializer: kernel initialization method. see details in Initializers
- bias_initializer: bias initialization method. see details in Initializers
- kernel_regularizer: not implemented.
from xshinnosuke.models import Sequential
from xshinnosuke.layers import Dense
model = Sequential()
model.add(Dense(out_features=100, input_shape=(500, ), activation='relu'))
model.add(Dense(out_features=10))
model.compile(loss='mse', optimizer='adam')
print(model)
- start: axis from which flattening starts. For example, for a tensor with shape (N, C, H, W): if start = 1, the flattened tensor has shape (N, C * H * W); if start = 2, it has shape (N, C, H * W).
from xshinnosuke.models import Model
from xshinnosuke.layers import Flatten, Input
X_input = Input(input_shape=(10, 5, 8))
X = Flatten(start=1)(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
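For reference, the same shape logic in plain numpy (a sketch of Flatten's semantics, not xshinnosuke's implementation):

import numpy as np
x = np.zeros((4, 10, 5, 8))                  # (N, C, H, W)
print(x.reshape(x.shape[0], -1).shape)       # start=1 -> (4, 400)
print(x.reshape(*x.shape[:2], -1).shape)     # start=2 -> (4, 10, 40)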
- pad_size: padding size, for example, (1, 1), which pads an input of shape (N, C, H, W) to (N, C, H+2, W+2).
from xshinnosuke.models import Model
from xshinnosuke.layers import ZeroPadding2D, Input
X_input = Input(input_shape=(10, 5, 5))
X = ZeroPadding2D(pad_size=(2, 2))(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
- out_channels: number of filters.
- kernel_size: filter size, for example, (3, 3) or 3.
- use_bias: whether to use a bias.
- stride: convolution stride.
- padding: 'SAME' or 'VALID'. 'VALID' means no padding; 'SAME' pads the input so that the output has the same size as the input.
- activation: activation function. see details in Activations
- kernel_initializer: kernel initialization method. see details in Initializers
- bias_initializer: bias initialization method. see details in Initializers
from xshinnosuke.models import Model
from xshinnosuke.layers import Conv2D, Input
X_input = Input(input_shape=(3, 24, 24))
X = Conv2D(out_channels=16, kernel_size=(3, 3), stride=1, padding='VALID', activation='relu')(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='bce')
print(model)
- pool_size: pooling kernel size; for example, (2, 2) applies max pooling over every 2 x 2 area.
- stride: pooling stride.
- pool_size: pooling kernel size; for example, (2, 2) applies mean pooling over every 2 x 2 area.
- stride: pooling stride.
from xshinnosuke.models import Model
from xshinnosuke.layers import MaxPooling2D, AvgPooling2D, Input
X_input = Input(input_shape=(3, 24, 24))
X = MaxPooling2D(kernel_size=2)(X_input)
X = AvgPooling2D(kernel_size=2)(X)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='bce')
print(model)
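What the two pooling layers compute, shown on a single 4 x 4 channel in plain numpy (a sketch for a 2 x 2 kernel with stride 2, not xshinnosuke's implementation):

import numpy as np
x = np.arange(16).reshape(4, 4)
# split into non-overlapping 2 x 2 blocks, then reduce each block
blocks = x.reshape(2, 2, 2, 2)
print(blocks.max(axis=(1, 3)))     # max pooling  -> [[5, 7], [13, 15]]
print(blocks.mean(axis=(1, 3)))    # mean pooling -> [[2.5, 4.5], [10.5, 12.5]]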
- act_name: activation function name; ReLU, Sigmoid, etc. are supported. see details in Activations
from xshinnosuke.models import Model
from xshinnosuke.layers import Activation, Input
X_input = Input(input_shape=(3, 24, 24))
X = Activation('relu')(X_input)
X = Activation('sigmoid')(X)
X = Activation('softmax')(X)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='cross_entropy')
print(model)
- shape: target shape after the reshape operation.
- inplace: if True, apply the reshape directly on the original data.
from xshinnosuke.models import Model
from xshinnosuke.layers import Reshape, Input
X_input = Input(input_shape=(3, 24, 24))
X = Reshape((3, 12, 12, 4))(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='cross_entropy')
print(model)
- keep_prob: probability of keeping a unit active.
from xshinnosuke.models import Model
from xshinnosuke.layers import Dropout, Input
X_input = Input(input_shape=(500, ))
X = Dropout(keep_prob=0.5)(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
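The keep_prob semantics in plain numpy (an inverted-dropout sketch; whether xshinnosuke rescales by 1 / keep_prob internally is an assumption):

import numpy as np
keep_prob = 0.5
x = np.random.randn(4, 500)
mask = np.random.rand(*x.shape) < keep_prob   # keep each unit with probability keep_prob
out = x * mask / keep_prob                    # rescale so the expected activation is unchanged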
- epsilon: $\epsilon$ value.
- momentum: at training time, moving averages are used to update $\mu_B$ and $\sigma_B$: $moving\_\mu = momentum * moving\_\mu + (1 - momentum) * \mu_B$ and $moving\_\sigma = momentum * moving\_\sigma + (1 - momentum) * \sigma_B$.
- axis: the axis to normalize over; for a Dense layer it should be 1 or -1, for a Convolution layer it should be 1.
- gamma_initializer: $\gamma$ initialization method. see details in Initializers
- beta_initializer: $\beta$ initialization method. see details in Initializers
- moving_mean_initializer: $moving\_\mu$ initialization method. see details in Initializers
- moving_variance_initializer: $moving\_\sigma$ initialization method. see details in Initializers
- epsilon: $\epsilon$ value.
- gamma_initializer: $\gamma$ initialization method. see details in Initializers
- beta_initializer: $\beta$ initialization method. see details in Initializers
- epsilon: $\epsilon$ value.
- G: number of groups.
- gamma_initializer: $\gamma$ initialization method. see details in Initializers
- beta_initializer: $\beta$ initialization method. see details in Initializers
from xshinnosuke.models import Model
from xshinnosuke.layers import BatchNormalization, LayerNormalization, GroupNormalization, Input
X_input = Input(input_shape=(16, 5, 5))
X = BatchNormalization()(X_input)
X = LayerNormalization()(X)
X = GroupNormalization()(X)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
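The momentum update from the BatchNormalization parameters above, written out in plain numpy (a bookkeeping sketch only, not xshinnosuke's internal code):

import numpy as np
momentum = 0.9
moving_mu, moving_sigma = 0.0, 1.0
for _ in range(100):                  # one update per training batch
    batch = np.random.randn(64)
    mu_B, sigma_B = batch.mean(), batch.std()
    moving_mu = momentum * moving_mu + (1 - momentum) * mu_B
    moving_sigma = momentum * moving_sigma + (1 - momentum) * sigma_B
# at inference time, moving_mu and moving_sigma are used instead of the batch statistics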
- input_dim: the maximum size of the vocabulary.
- output_dim: the embedding dimension; for example, with output_dim = E, input data of shape (N, T) has shape (N, T, E) after embedding.
- embeddings_initializer: embedding kernel initialization method. see details in Initializers
- mask_zero: whether to use masks.
from xshinnosuke.models import Sequential
from xshinnosuke.layers import Embedding
model = Sequential()
model.add(Embedding(input_dim=5000, output_dim=200, input_shape=(30, )))
model.compile(optimizer='sgd', loss='mse')
print(model)
- units: number of RNN hidden units; for example, with units = a, input data of shape (N, T, L) becomes (N, T, a) after the RNN.
- activation: activation method. see details in Activations
- initializer: $W_{xa}$ initialization method. see details in Initializers
- recurrent_initializer: $W_{aa}$ initialization method. see details in Initializers
- return_sequences: if True, return the hidden states of all timesteps, $[a^1, a^2, ..., a^t]$; if False, return only the last timestep's $a^t$.
- return_state: if True, return return_sequences' result together with the hidden states of all timesteps.
- stateful: if True, initialize this batch's $a^1$ with the previous batch's last $a^t$; if False, initialize $a^1$ with zeros.
from xshinnosuke.models import Model
from xshinnosuke.layers import SimpleRNN, Input
X_input = Input(input_shape=(30, 200))
X = SimpleRNN(units=50)(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
- units: number of LSTM hidden units.
- activation: activation method. see details in Activations
- recurrent_activation: activation method. see details in Activations
- initializer: $W_{xa}$ initialization method. see details in Initializers
- recurrent_initializer: $W_{aa}$ initialization method. see details in Initializers
- unit_forget_bias: if True, initialize the forget gate bias $b_f$ to 1, otherwise to 0.
- return_sequences: if True, return the hidden states of all timesteps, $[a^1, a^2, ..., a^t]$; if False, return only the last timestep's $a^t$.
- return_state: if True, return return_sequences' result together with the hidden states of all timesteps.
- stateful: if True, initialize this batch's $a^1$ with the previous batch's last $a^t$; if False, initialize $a^1$ with zeros.
from xshinnosuke.models import Model
from xshinnosuke.layers import LSTM, Input
X_input = Input(input_shape=(30, 200))
X = LSTM(units=50, return_sequences=True)(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
- layer: the layer to apply at every timestep.
from xshinnosuke.layers import Input, Dense, LSTM, TimeDistributed
from xshinnosuke.models import Model
X_input = Input(input_shape=(25, 97))
X = LSTM(units=100, return_sequences=True, stateful=True)(X_input)
X = TimeDistributed(Dense(50))(X)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
- data: initial value of this variable.
- in_bounds: in-bound layer(s).
- out_bounds: out-bound layer(s).
- name: name of this variable.
- requires_grad: whether this variable requires gradients.
- dtype: data type.
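A minimal autograd sketch; that Variable overloads arithmetic operators and exposes backward() and grad is an assumption here, only the import path mirrors the Constant example below:

from xshinnosuke.nn import Variable
a = Variable(2, requires_grad=True)
b = Variable(3, requires_grad=True)
c = a * b        # c joins the computation graph built from a and b
c.backward()     # backpropagate from c
print(a.grad)    # expected: dc/da = b = 3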
- data: initial value of this variable.
- in_bounds: in-bound layer(s).
- out_bounds: out-bound layer(s).
- name: name of this variable.
- requires_grad: whether this variable requires gradients.
- dtype: data type.
from xshinnosuke.nn import Constant
a = Constant(5)
print('before: ', a)
a.data = 4
print('after: ', a)
# result in console
'''
before:  Constant(5.0, requires_grad=False)
after:  Constant(5.0, requires_grad=False)
UserWarning: Can not change the value of a Constant!
'''
- SGD(lr=0.01, decay=0.0, *args, **kwargs)

$$ w = w - lr * dw \\ b = b - lr * db $$

- lr: learning rate.
- decay: learning rate decay.
- Momentum(lr=0.01, decay=0.0, rho=0.9, *args, **kwargs)

$$ V_{dw} = rho * V_{dw} + (1 - rho) * dw \\ V_{db} = rho * V_{db} + (1 - rho) * db \\ w = w - lr * V_{dw} \\ b = b - lr * V_{db} $$

- lr: learning rate.
- decay: learning rate decay.
- rho: moving averages parameter.
- RMSprop(lr=0.001, decay=0.0, rho=0.9, epsilon=1e-7, *args, **kwargs)

$$ S_{dw} = rho * S_{dw} + (1 - rho) * dw^2 \\ S_{db} = rho * S_{db} + (1 - rho) * db^2 \\ w = w - lr * \frac{dw}{\sqrt{S_{dw} + \epsilon}} \\ b = b - lr * \frac{db}{\sqrt{S_{db} + \epsilon}} $$

- lr: learning rate.
- decay: learning rate decay.
- rho: moving averages parameter.
- epsilon: $\epsilon$ value.
- Adam(lr=0.001, decay=0.0, beta1=0.9, beta2=0.999, epsilon=1e-7, *args, **kwargs)

$$ V_{dw} = \beta_1 * V_{dw} + (1 - \beta_1) * dw \\ V_{db} = \beta_1 * V_{db} + (1 - \beta_1) * db \\ S_{dw} = \beta_2 * S_{dw} + (1 - \beta_2) * dw^2 \\ S_{db} = \beta_2 * S_{db} + (1 - \beta_2) * db^2 \\ V_{dw}^{corrected} = \frac{V_{dw}}{1 - \beta_1^t} \\ V_{db}^{corrected} = \frac{V_{db}}{1 - \beta_1^t} \\ S_{dw}^{corrected} = \frac{S_{dw}}{1 - \beta_2^t} \\ S_{db}^{corrected} = \frac{S_{db}}{1 - \beta_2^t} \\ w = w - lr * \frac{V_{dw}^{corrected}}{\sqrt{S_{dw}^{corrected}} + \epsilon} \\ b = b - lr * \frac{V_{db}^{corrected}}{\sqrt{S_{db}^{corrected}} + \epsilon} $$

- lr: learning rate.
- decay: learning rate decay.
- beta1: $\beta_1$ value.
- beta2: $\beta_2$ value.
- epsilon: $\epsilon$ value.
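One Adam step for a single weight in plain numpy, mirroring the formulas above (a sketch of the math, not xshinnosuke's optimizer code):

import numpy as np
lr, beta1, beta2, epsilon = 0.001, 0.9, 0.999, 1e-7
w, V_dw, S_dw, t = 0.5, 0.0, 0.0, 1
dw = 0.2                                     # gradient from backprop
V_dw = beta1 * V_dw + (1 - beta1) * dw
S_dw = beta2 * S_dw + (1 - beta2) * dw ** 2
V_corr = V_dw / (1 - beta1 ** t)             # bias correction
S_corr = S_dw / (1 - beta2 ** t)
w = w - lr * V_corr / (np.sqrt(S_corr) + epsilon)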
- MeanSquaredError
  - loss = $\frac{1}{2}(y - \hat y)^2$
- MeanAbsoluteError
  - loss = $|y - \hat y|$
- BinaryCrossEntropy
  - loss = $-y\log\hat y - (1 - y)\log(1 - \hat y)$
- SparseCrossEntropy
  - loss = $-\sum\limits_{c=1}^C y_c\log\hat y_c$
  - $y_c$ should be a one-hot vector.
- CrossEntropy
  - loss = $-\sum\limits_{c=1}^C y_c\log\hat y_c$
  - $y_c$ should not be a one-hot vector.
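The loss formulas evaluated in plain numpy (a sketch; the string names shown in the compile() calls above are 'mse', 'bce', and 'cross_entropy'):

import numpy as np
y, y_hat = 1.0, 0.8
print(0.5 * (y - y_hat) ** 2)                               # MeanSquaredError
print(abs(y - y_hat))                                       # MeanAbsoluteError
print(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))     # BinaryCrossEntropy
y_onehot = np.array([0., 1., 0.])                           # one-hot target for SparseCrossEntropy
probs = np.array([0.2, 0.7, 0.1])
print(-np.sum(y_onehot * np.log(probs)))                    # SparseCrossEntropy
print(-np.log(probs[1]))                                    # CrossEntropy with class-index target 1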