Contents

Layer

- Input(input_shape: Tuple, data: ndarray = None, **kwargs)
  • input_shape: shape of the input data, for example, (C, H, W) or (features, ).
  • data: the value of this layer's input and output tensor.
from xshinnosuke.models import Model
from xshinnosuke.layers import Input

X = Input(input_shape=(10, 5, 5))
model = Model(inputs=X, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
- Dense(out_features: int, activation=None, use_bias=True, kernel_initializer='normal', bias_initializer='zeros', kernel_regularizer=None, **kwargs)
  • out_features: number of output features.
  • activation: activation function. see details in Activations
  • use_bias: whether to use a bias.
  • kernel_initializer: kernel initialization method. see details in Initializers
  • bias_initializer: bias initialization method. see details in Initializers
  • kernel_regularizer: not implemented.
from xshinnosuke.models import Sequential
from xshinnosuke.layers import Dense

model = Sequential()
model.add(Dense(out_features=100, input_shape=(500, ), activation='relu'))
model.add(Dense(out_features=10))
model.compile(loss='mse', optimizer='adam')
print(model)
- Flatten(start: int = 1, **kwargs)
  • start: axis from which flattening starts. For example, for a tensor with shape (N, C, H, W): if start = 1, the flattened tensor is (N, C * H * W); if start = 2, it is (N, C, H * W).
from xshinnosuke.models import Model
from xshinnosuke.layers import Flatten, Input

X_input = Input(input_shape=(10, 5, 8))
X = Flatten(start=1)(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
- ZeroPadding2D(pad_size: Tuple, **kwargs)
  • pad_size: padding size on (H, W). For example, (1, 1) pads an input of shape (N, C, H, W) to (N, C, H+2, W+2).
from xshinnosuke.models import Model
from xshinnosuke.layers import ZeroPadding2D, Input

X_input = Input(input_shape=(10, 5, 5))
X = ZeroPadding2D(pad_size=(2, 2))(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
- Conv2D(out_channels: int, kernel_size: Tuple, use_bias: bool = False, stride: int = 1, padding: str = 'VALID', activation=None, kernel_initializer='Normal', bias_initializer='zeros', **kwargs)
  • out_channels: number of filters.
  • kernel_size: filter size, for example, (3, 3) or 3.
  • use_bias: whether to use a bias.
  • stride: convolution stride.
  • padding: 'SAME' or 'VALID'. 'VALID' means no padding; 'SAME' pads the input so that (with stride 1) the output has the same spatial size as the input.
  • activation: activation function. see details in Activations
  • kernel_initializer: kernel initialization method. see details in Initializers
  • bias_initializer: bias initialization method. see details in Initializers

    from xshinnosuke.models import Model
    from xshinnosuke.layers import Conv2D, Input
    
    X_input = Input(input_shape=(3, 24, 24))
    X = Conv2D(out_channels=16, kernel_size=(3, 3), stride=1, padding='VALID', activation='relu')(X_input)
    model = Model(inputs=X_input, outputs=X)
    model.compile(optimizer='sgd', loss='bce')
    print(model)
- MaxPooling2D(pool_size: Tuple, stride: int = None, **kwargs)
  • pool_size: pooling kernel size. For example, (2, 2) applies max pooling over every 2 x 2 area.
  • stride: pooling stride.
- AvgPooling2D(pool_size: Tuple, stride: int = None, **kwargs)
  • pool_size: pooling kernel size. For example, (2, 2) applies mean pooling over every 2 x 2 area.
  • stride: pooling stride.
from xshinnosuke.models import Model
from xshinnosuke.layers import MaxPooling2D, AvgPooling2D, Input

X_input = Input(input_shape=(3, 24, 24))
X = MaxPooling2D(pool_size=2)(X_input)
X = AvgPooling2D(pool_size=2)(X)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='bce')
print(model)
- Activation(act_name='relu')
  • act_name: activation function name; ReLU, Sigmoid, etc. are supported. see details in Activations
from xshinnosuke.models import Model
from xshinnosuke.layers import Activation, Input

X_input = Input(input_shape=(3, 24, 24))
X = Activation('relu')(X_input)
X = Activation('sigmoid')(X)
X = Activation('softmax')(X)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='cross_entropy')
print(model)
- Reshape(shape: Tuple, inplace: bool = True, **kwargs)
  • shape: target shape after the reshape operation.
  • inplace: whether to apply the reshape directly to the original data.
from xshinnosuke.models import Model
from xshinnosuke.layers import Reshape, Input

X_input = Input(input_shape=(3, 24, 24))
X = Reshape((3, 12, 12, 4))(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='cross_entropy')
print(model)
- Dropout(keep_prob: float)
  • keep_prob: probability of keeping a unit active.
from xshinnosuke.models import Model
from xshinnosuke.layers import Dropout, Input

X_input = Input(input_shape=(500, ))
X = Dropout(keep_prob=0.5)(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
- BatchNormalization(epsilon=1e-6, momentum=0.9, axis=1, gamma_initializer='ones', beta_initializer='zeros', moving_mean_initializer='zeros', moving_variance_initializer='ones')
$$ u_B = \frac{1}{m} \sum \limits_{i=1}^m x_i \quad \quad \text{mini-batch mean} \\ \sigma_B^2 = \frac{1}{m} \sum \limits_{i=1}^m (x_i - u_B)^2 \quad \quad \text{mini-batch variance} \\ \hat x_i = \frac{x_i - u_B}{\sqrt{\sigma_B^2 + \epsilon}} \quad \quad \text{normalize} \\ y_i = \gamma \hat x_i + \beta \quad \quad \text{scale and shift} $$ A NumPy sketch of this forward pass appears after the normalization example below.
  • epsilon: $\epsilon$ value.
  • momentum: at training time, moving averages are updated as $moving\_u = momentum \cdot moving\_u + (1 - momentum) \cdot u_B$ and $moving\_\sigma = momentum \cdot moving\_\sigma + (1 - momentum) \cdot \sigma_B^2$.
  • axis: the axis to normalize over; for a Dense layer it should be 1 or -1, for a Convolution layer it should be 1.
  • gamma_initializer: $\gamma$ initialization method. see details in Initializers
  • beta_initializer: $\beta$ initialization method. see details in Initializers
  • moving_mean_initializer: $moving\_u$ initialization method. see details in Initializers
  • moving_variance_initializer: $moving\_\sigma$ initialization method. see details in Initializers
- LayerNormalization(epsilon=1e-10, gamma_initializer='ones', beta_initializer='zeros')
$$ u = \frac{1}{CHW} \sum \limits_{i=1}^C \sum \limits_{j=1}^H \sum \limits_{k=1}^W x_{ijk} \quad \quad \text{sample mean} \\ \sigma^2 = \frac{1}{CHW} \sum \limits_{i=1}^C \sum \limits_{j=1}^H \sum \limits_{k=1}^W (x_{ijk} - u)^2 \quad \quad \text{sample variance} \\ \hat x = \frac{x - u}{\sqrt{\sigma^2 + \epsilon}} \quad \quad \text{normalize} \\ y = \gamma \hat x + \beta \quad \quad \text{scale and shift} $$
  • epsilon: $\epsilon$ value.
  • gamma_initializer: $\gamma$ initialization method. see details in Initializers
  • beta_initializer: $\beta$ initialization method. see details in Initializers
- GroupNormalization(epsilon=1e-5, G=16, gamma_initializer='ones', beta_initializer='zeros')
Split the channels into G groups and apply layer normalization to each group separately (within each group, C below denotes that group's channels). $$ u = \frac{1}{CHW} \sum \limits_{i=1}^C \sum \limits_{j=1}^H \sum \limits_{k=1}^W x_{ijk} \quad \quad \text{sample mean} \\ \sigma^2 = \frac{1}{CHW} \sum \limits_{i=1}^C \sum \limits_{j=1}^H \sum \limits_{k=1}^W (x_{ijk} - u)^2 \quad \quad \text{sample variance} \\ \hat x = \frac{x - u}{\sqrt{\sigma^2 + \epsilon}} \quad \quad \text{normalize} \\ y = \gamma \hat x + \beta \quad \quad \text{scale and shift} $$
  • epsilon: $\epsilon$ value.
  • G: number of groups.
  • gamma_initializer: $\gamma$ initialization method. see details in Initializers
  • beta_initializer: $\beta$ initialization method. see details in Initializers
from xshinnosuke.models import Model
from xshinnosuke.layers import BatchNormalization, LayerNormalization, GroupNormalization, Input

X_input = Input(input_shape=(16, 5, 5))
X = BatchNormalization()(X_input)
X = LayerNormalization()(X)
X = GroupNormalization()(X)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
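
The normalization formulas above can be checked with a small, self-contained NumPy sketch of the batch-normalization forward pass. This is only an illustration of the math (not xshinnosuke's implementation); it assumes an (N, C, H, W) input and axis=1, so statistics are computed per channel.

import numpy as np

def batch_norm_forward(x, gamma, beta, moving_mean, moving_var,
                       momentum=0.9, epsilon=1e-6, training=True):
    axes = (0, 2, 3)                                 # reduce over batch and spatial dims, keep channels
    if training:
        u_B = x.mean(axis=axes, keepdims=True)       # mini-batch mean
        var_B = x.var(axis=axes, keepdims=True)      # mini-batch variance
        # update the moving averages as in the momentum formula above
        moving_mean = momentum * moving_mean + (1 - momentum) * u_B
        moving_var = momentum * moving_var + (1 - momentum) * var_B
    else:                                            # at inference time, use the moving averages
        u_B, var_B = moving_mean, moving_var
    x_hat = (x - u_B) / np.sqrt(var_B + epsilon)     # normalize
    y = gamma * x_hat + beta                         # scale and shift
    return y, moving_mean, moving_var

x = np.random.randn(8, 16, 5, 5)
shape = (1, 16, 1, 1)
y, mm, mv = batch_norm_forward(x, np.ones(shape), np.zeros(shape),
                               np.zeros(shape), np.ones(shape))
print(y.shape)   # (8, 16, 5, 5)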
- Embedding(input_dim, output_dim, embeddings_initializer='uniform', mask_zero=False, **kwargs)
  • input_dim: the maximum size of the vocabulary.
  • output_dim: the embedding dimension. For example, with output_dim = E, input data of shape (N, T) has shape (N, T, E) after embedding.
  • embeddings_initializer: embedding kernel initialization method. see details in Initializers
  • mask_zero: whether to use masks.
from xshinnosuke.models import Sequential
from xshinnosuke.layers import Embedding

model = Sequential()
model.add(Embedding(input_dim=5000, output_dim=200, input_shape=(30, )))
model.compile(optimizer='sgd', loss='mse')
print(model)
- SimpleRNN(units, activation='tanh', initializer='glorotuniform', recurrent_initializer='orthogonal', return_sequences=False, return_state=False, stateful=False, **kwargs)

$$ z^t = W_{aa}\cdot a^{t-1} + W_{xa}\cdot x^t + b_a \\ a^t = \text{activation}(z^t) $$ A NumPy sketch of this recurrence follows the example below.

  • units: number of RNN hidden units. For example, with units = a, input data of shape (N, T, L) produces output of shape (N, T, a).
  • activation: activation method. see details in Activations
  • initializer: $W_{xa}$ initialization method. see details in Initializers
  • recurrent_initializer: $W_{aa}$ initialization method. see details in Initializers
  • return_sequences: if True, return all timesteps $[a^1, a^2, ..., a^t]$; if False, return only the last timestep $a^t$.
  • return_state: if True, return the return_sequences result together with all timesteps a.
  • stateful: if True, use the last $a^t$ to initialize the next batch's $a^1$; if False, initialize $a^1$ with 0.
from xshinnosuke.models import Model
from xshinnosuke.layers import SimpleRNN, Input

X_input = Input(input_shape=(30, 200))
X = SimpleRNN(units=50)(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
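
Here is a minimal NumPy sketch of the SimpleRNN recurrence above with the default tanh activation. It only illustrates the math, not xshinnosuke's implementation.

import numpy as np

# One recurrence step for a single sample: z^t = W_aa·a^{t-1} + W_xa·x^t + b_a, a^t = tanh(z^t)
def rnn_step(x_t, a_prev, W_aa, W_xa, b_a):
    z_t = W_aa @ a_prev + W_xa @ x_t + b_a
    return np.tanh(z_t)

units, features, timesteps = 50, 200, 30        # matches the (30, 200) input in the example above
rng = np.random.default_rng(0)
W_aa = rng.standard_normal((units, units)) * 0.01
W_xa = rng.standard_normal((units, features)) * 0.01
b_a = np.zeros(units)
a = np.zeros(units)                             # stateful=False: start from zeros
for x_t in rng.standard_normal((timesteps, features)):
    a = rnn_step(x_t, a, W_aa, W_xa, b_a)       # weights are shared across timesteps
print(a.shape)   # (50,) -- the last timestep's a^t, as returned when return_sequences=False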
- LSTM(units, activation='tanh', recurrent_activation='sigmoid', initializer='glorotuniform', recurrent_initializer='orthogonal', unit_forget_bias=True, return_sequences=False, return_state=False, stateful=False, **kwargs)

At every timestep (a NumPy sketch of this cell update follows the example below):

$$ i^t = \text{recurrent\_activation}(W_i[a^{t-1}, x^t] + b_i) \\ f^t = \text{recurrent\_activation}(W_f[a^{t-1}, x^t] + b_f) \\ \tilde c^t = \text{activation}(W_c[a^{t-1}, x^t] + b_c) \\ c^t = f^t \cdot c^{t-1} + i^t \cdot \tilde c^t \\ o^t = \text{recurrent\_activation}(W_o[a^{t-1}, x^t] + b_o) \\ a^t = o^t \cdot \tanh(c^t) $$

  • units: number of LSTM hidden units.
  • activation: activation method. see details in Activations
  • recurrent_activation: activation method applied to the gates. see details in Activations
  • initializer: $W_{xa}$ initialization method. see details in Initializers
  • recurrent_initializer: $W_{aa}$ initialization method. see details in Initializers
  • unit_forget_bias: if True, initialize the forget gate bias $b_f$ to 1, otherwise to 0.
  • return_sequences: if True, return all timesteps $[a^1, a^2, ..., a^t]$; if False, return only the last timestep $a^t$.
  • return_state: if True, return the return_sequences result together with all timesteps a.
  • stateful: if True, use the last $a^t$ to initialize the next batch's $a^1$; if False, initialize $a^1$ with 0.
from xshinnosuke.models import Model
from xshinnosuke.layers import LSTM, Input

X_input = Input(input_shape=(30, 200))
X = LSTM(units=50, return_sequences=True)(X_input)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)
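
Here is a minimal NumPy sketch of one LSTM cell step, following the gate equations above with the default activation='tanh' and recurrent_activation='sigmoid'. It only illustrates the math, not xshinnosuke's implementation; each weight matrix acts on the concatenation [a^{t-1}, x^t].

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One LSTM step for a single sample: a_prev, c_prev have shape (units,),
# x_t has shape (features,), each W_* has shape (units, units + features).
def lstm_step(x_t, a_prev, c_prev, W_i, b_i, W_f, b_f, W_c, b_c, W_o, b_o):
    concat = np.concatenate([a_prev, x_t])      # [a^{t-1}, x^t]
    i_t = sigmoid(W_i @ concat + b_i)           # input gate
    f_t = sigmoid(W_f @ concat + b_f)           # forget gate
    c_tilde = np.tanh(W_c @ concat + b_c)       # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde          # new cell state
    o_t = sigmoid(W_o @ concat + b_o)           # output gate
    a_t = o_t * np.tanh(c_t)                    # new hidden state
    return a_t, c_t

units, features = 50, 200
rng = np.random.default_rng(0)
W = lambda: rng.standard_normal((units, units + features)) * 0.01
a, c = np.zeros(units), np.zeros(units)
a, c = lstm_step(rng.standard_normal(features), a, c,
                 W(), np.zeros(units),          # input gate
                 W(), np.ones(units),           # forget gate, unit_forget_bias -> b_f = 1
                 W(), np.zeros(units),          # candidate cell state
                 W(), np.zeros(units))          # output gate
print(a.shape, c.shape)   # (50,) (50,)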
- TimeDistributed(layer, **kwargs)
  • layer: the layer to apply at every timestep.
from xshinnosuke.layers import Input, Dense, LSTM, TimeDistributed
from xshinnosuke.models import Model

X_input = Input(input_shape=(25, 97))
X = LSTM(units=100, return_sequences=True, stateful=True)(X_input)
X = TimeDistributed(Dense(50))(X)
model = Model(inputs=X_input, outputs=X)
model.compile(optimizer='sgd', loss='mse')
print(model)

Node

- Variable(data, in_bounds: List = None, out_bounds: List = None, name: str = None, requires_grad: bool = True, dtype: str = 'float64')
  • data: initial value of this variable.
  • in_bounds: in_bound layer(s).
  • out_bounds: out_bound layer(s).
  • name: name of this variable.
  • requires_grad: whether this variable requires gradients.
  • dtype: data type.
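  A minimal usage sketch (the xshinnosuke.nn import path is assumed from the Constant example below):

    from xshinnosuke.nn import Variable
    
    a = Variable(5)
    print(a)                    # e.g. Variable(5.0, requires_grad=True)
    b = Variable(3, requires_grad=False)
    print(b.requires_grad)      # False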
- Constant(data, in_bounds: List = None, out_bounds: List = None, name: str = None, requires_grad: bool = True, dtype: str = 'float64')
  • data: initial value of this constant.
  • in_bounds: in_bound layer(s).
  • out_bounds: out_bound layer(s).
  • name: name of this constant.
  • requires_grad: whether this constant requires gradients.
  • dtype: data type.

    from xshinnosuke.nn import Constant
    
    a = Constant(5)
    print('before: ', a)  
    a.data = 4
    print('after: ', a)
    # result in console
    '''
    before:  Constant(5.0, requires_grad=False)
    after:  Constant(5.0, requires_grad=False)
    UserWarning: Can not change the value of a Constant!
    '''

Optimizers

  • SGD(lr=0.01, decay=0.0, *args, **kwargs) $$ w = w - lr * dw \\ b = b - lr * db $$

    • lr: learning rate.
    • decay: learning rate decay.
  • Momentum(lr=0.01, decay=0.0, rho=0.9, *args, **kwargs) $$ V_{dw} = \rho * V_{dw} + (1 - \rho) * dw \\ V_{db} = \rho * V_{db} + (1 - \rho) * db \\ w = w - lr * V_{dw} \\ b = b - lr * V_{db} $$

    • lr: learning rate.
    • decay: learning rate decay.
    • rho: moving averages parameter.
  • RMSprop(lr=0.001, decay=0.0, rho=0.9, epsilon=1e-7, *args, **kwargs) $$ S_{dw} = \rho * S_{dw} + (1 - \rho) * dw^2 \\ S_{db} = \rho * S_{db} + (1 - \rho) * db^2 \\ w = w - lr * \frac{dw}{\sqrt{S_{dw} + \epsilon}} \\ b = b - lr * \frac{db}{\sqrt{S_{db} + \epsilon}} $$

    • lr: learning rate.
    • decay: learning rate decay.
    • rho: moving averages parameter.
    • epsilon: $\epsilon$ value.
  • Adam(lr=0.001, decay=0.0, beta1=0.9, beta2=0.999, epsilon=1e-7, *args, **kwargs) $$ V_{dw} = \beta_1 * V_{dw} + (1 - \beta_1) * dw \\ V_{db} = \beta_1 * V_{db} + (1 - \beta_1) * db \\ S_{dw} = \beta_2 * S_{dw} + (1 - \beta_2) * dw^2 \\ S_{db} = \beta_2 * S_{db} + (1 - \beta_2) * db^2 \\ V_{dw}^{corrected} = \frac{V_{dw}}{1 - \beta_1^t} \\ V_{db}^{corrected} = \frac{V_{db}}{1 - \beta_1^t} \\ S_{dw}^{corrected} = \frac{S_{dw}}{1 - \beta_2^t} \\ S_{db}^{corrected} = \frac{S_{db}}{1 - \beta_2^t} \\ w = w - lr * \frac{V_{dw}^{corrected}}{\sqrt{S_{dw}^{corrected}} + \epsilon} \\ b = b - lr * \frac{V_{db}^{corrected}}{\sqrt{S_{db}^{corrected}} + \epsilon} $$ A NumPy sketch of this update appears after this list.

    • lr: learning rate.
    • decay: learning rate decay.
    • beta1: $\beta_1$ value.
    • beta2: $\beta_2$ value.
    • epsilon: $\epsilon$ value.
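
To make the Adam equations above concrete, here is a minimal NumPy sketch of a single parameter update. It only illustrates the math, not xshinnosuke's optimizer classes; in the examples throughout these docs, an optimizer is selected by name in model.compile (e.g. optimizer='sgd' or optimizer='adam').

import numpy as np

# One Adam update for a parameter tensor w. The state (V, S, t) must persist across calls.
def adam_step(w, dw, V, S, t, lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-7):
    V = beta1 * V + (1 - beta1) * dw            # first-moment moving average
    S = beta2 * S + (1 - beta2) * dw ** 2       # second-moment moving average
    V_corr = V / (1 - beta1 ** t)               # bias correction
    S_corr = S / (1 - beta2 ** t)
    w = w - lr * V_corr / (np.sqrt(S_corr) + epsilon)
    return w, V, S

w = np.ones(3)
V, S = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 4):                           # three updates with a toy gradient
    dw = 2 * w                                  # gradient of sum(w**2)
    w, V, S = adam_step(w, dw, V, S, t)
print(w)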

Objectives

  • MeanSquaredError
    • loss = $\frac{1}{2}(y - \hat y)^2$ (see the NumPy sketch after this list)
  • MeanAbsoluteError
    • loss = $|y - \hat y|$
  • BinaryCrossEntropy
    • loss = $-y\log\hat y - (1-y)\log(1 - \hat y)$
  • SparseCrossEntropy
    • loss = $-\sum \limits_{c=1}^C y_c\log\hat y_c$
    • $y_c$ should be a one-hot vector.
  • CrossEntropy
    • loss = $-\sum \limits_{c=1}^C y_c\log\hat y_c$
    • $y_c$ should not be a one-hot vector.
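
For illustration, here is a minimal NumPy sketch of the MeanSquaredError and BinaryCrossEntropy definitions above. It is not xshinnosuke's loss classes; in the examples throughout these docs a loss is selected by name in model.compile (e.g. loss='mse', 'bce', or 'cross_entropy'). The clipping is added here only to avoid log(0).

import numpy as np

# Per-sample losses following the definitions above, averaged over a batch.
def mean_squared_error(y, y_hat):
    return np.mean(0.5 * (y - y_hat) ** 2)

def binary_cross_entropy(y, y_hat, epsilon=1e-7):
    y_hat = np.clip(y_hat, epsilon, 1 - epsilon)   # keep log() finite
    return np.mean(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.7])
print(mean_squared_error(y, y_hat))       # ~0.0233
print(binary_cross_entropy(y, y_hat))     # ~0.2284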

Activations

Initializers

Utils