GRUCell

Versioned name : GRUCell-3

Category : Sequence processing

Short description : GRUCell represents a single GRU Cell that computes the output using the formula described in the paper.

Detailed description : GRUCell computes the output Ht for the current time step based on the following formula:

Formula:
  *    - matrix multiplication
  (.)  - Hadamard product (element-wise)
  [,]  - concatenation
  f, g - activation functions
   zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)
   rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)
   ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Rbh + Wbh) # default, when linear_before_reset = 0
   ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh) # when linear_before_reset != 0
   Ht = (1 - zt) (.) ht + zt (.) Ht-1
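
The formula can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation. It assumes that the input and recurrence biases are passed as two separate lists `wb` and `rb` of length 3 * hidden_size each (mirroring Wb* and Rb* in the formula), and it omits clipping. The names `gru_cell` and `matmul_t` are hypothetical helpers, not part of any API.

```python
import math

def matmul_t(x, w):
    """Return x * w^T for x: [batch, k] and w: [n, k], both lists of lists."""
    return [[sum(a * b for a, b in zip(row, w_row)) for w_row in w] for row in x]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_cell(x, h_prev, w, r, wb, rb, hidden_size,
             linear_before_reset=False, f=sigmoid, g=math.tanh):
    """Compute Ht for one time step; gate order in w, r, wb, rb is z, r, h."""
    hs = hidden_size
    # Split the stacked weights into per-gate blocks.
    wz, wr, wh = w[:hs], w[hs:2 * hs], w[2 * hs:]
    rz, rr, rh = r[:hs], r[hs:2 * hs], r[2 * hs:]
    xz, xr, xh = matmul_t(x, wz), matmul_t(x, wr), matmul_t(x, wh)
    hz, hr = matmul_t(h_prev, rz), matmul_t(h_prev, rr)
    out = []
    for b in range(len(x)):
        zt = [f(xz[b][j] + hz[b][j] + wb[j] + rb[j]) for j in range(hs)]
        rt = [f(xr[b][j] + hr[b][j] + wb[hs + j] + rb[hs + j]) for j in range(hs)]
        if linear_before_reset:
            # rt (.) (Ht-1*(Rh^T) + Rbh): the reset gate scales the linear term.
            hh = matmul_t([h_prev[b]], rh)[0]
            ht = [g(xh[b][j] + rt[j] * (hh[j] + rb[2 * hs + j]) + wb[2 * hs + j])
                  for j in range(hs)]
        else:
            # (rt (.) Ht-1)*(Rh^T): the reset gate scales the previous state first.
            gated = [rt[j] * h_prev[b][j] for j in range(hs)]
            hh = matmul_t([gated], rh)[0]
            ht = [g(xh[b][j] + hh[j] + rb[2 * hs + j] + wb[2 * hs + j])
                  for j in range(hs)]
        out.append([(1 - zt[j]) * ht[j] + zt[j] * h_prev[b][j] for j in range(hs)])
    return out
```

With all weights and biases set to zero, both gates evaluate to f(0) = 0.5 and the candidate ht to g(0) = 0, so the sketch returns half of the previous hidden state.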

Attributes

hidden_size
- Description : hidden_size specifies the size of the hidden state.
- Range of values : a positive integer
- Type : int
- Required : yes
activations
- Description : activation functions for gates
- Range of values : any combination of relu, sigmoid, tanh
- Type : a list of strings
- Default value : sigmoid for f, tanh for g
- Required : no
activations_alpha, activations_beta
- Description : activations_alpha and activations_beta are attributes of the activation functions
- Range of values : a list of floating-point numbers
- Type : float[]
- Default value : None
- Required : no
clip
- Description : clip specifies the bound C; tensor values are clipped to the range [-C, C] before the activations are applied
- Range of values : a positive floating-point number
- Type : float
- Default value : infinity, which means that clipping is not applied
- Required : no
linear_before_reset
- Description : linear_before_reset denotes whether the layer uses the modified GRUCell formula described in the ONNX documentation, that is, the ht variant given above for linear_before_reset != 0.
- Range of values : true or false
- Type : boolean
- Default value : false
- Required : no
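
As an illustration of how clip and activations interact, the sketch below clamps a pre-activation value to [-C, C] and then applies a named activation. It assumes that the activations list holds the name for f first and the name for g second, matching the order of the default value; the helpers `clip_value` and `apply_gate` are hypothetical names used only here.

```python
import math

# Activation names listed in the "Range of values" of the activations attribute.
ACTIVATIONS = {
    "sigmoid": lambda v: 1.0 / (1.0 + math.exp(-v)),
    "tanh": math.tanh,
    "relu": lambda v: max(0.0, v),
}

def clip_value(v, c=math.inf):
    """Clamp v to [-c, c]; the default c = infinity leaves v unchanged."""
    return max(-c, min(c, v))

def apply_gate(pre_activation, name, c=math.inf):
    """Clip the pre-activation value, then apply the named activation."""
    return ACTIVATIONS[name](clip_value(pre_activation, c))

# Assumed ordering: first entry is f (gates z and r), second is g (candidate h).
f_name, g_name = ["sigmoid", "tanh"]
```

For example, `apply_gate(10.0, "relu", 3.0)` clips 10.0 to 3.0 before relu is applied, while leaving `c` at its default passes the value through unclipped.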

Inputs

1 : X - 2D tensor of type T [batch_size, input_size], input data. Required.
2 : initial_hidden_state - 2D tensor of type T [batch_size, hidden_size]. Required.
3 : W - 2D tensor of type T [3 * hidden_size, input_size], the weights for matrix multiplication, gate order: zrh. Required.
4 : R - 2D tensor of type T [3 * hidden_size, hidden_size], the recurrence weights for matrix multiplication, gate order: zrh. Required.
5 : B - 1D tensor of type T. If linear_before_reset is set to 1, the shape is [4 * hidden_size]: the summed biases (weights and recurrence weights) for the z and r gates, with the two biases for the h gate stored separately. Otherwise the shape is [3 * hidden_size]: the sum of biases (weights and recurrence weights) for each gate. Optional.
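
The shape rules for the inputs can be summarized in a small helper. This is only a sketch for checking dimensions; the function name `gru_cell_input_shapes` is hypothetical.

```python
def gru_cell_input_shapes(batch_size, input_size, hidden_size,
                          linear_before_reset=False):
    """Return the expected shape of each GRUCell input as a list of dims."""
    # B holds 4 blocks when the h-gate biases are stored separately, else 3.
    bias_blocks = 4 if linear_before_reset else 3
    return {
        "X": [batch_size, input_size],
        "initial_hidden_state": [batch_size, hidden_size],
        "W": [3 * hidden_size, input_size],
        "R": [3 * hidden_size, hidden_size],
        "B": [bias_blocks * hidden_size],
    }
```

For batch_size = 1, input_size = 16 and hidden_size = 128, this gives W as [384, 16] and R as [384, 128]; B is [512] when linear_before_reset is set and [384] otherwise.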

Outputs

1 : Ho - 2D tensor of type T [batch_size, hidden_size], the last output value of the hidden state.

Types

T : any supported floating-point type.

Example

<layer ... type="GRUCell" ...>
    <data hidden_size="128" linear_before_reset="1"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>16</dim>
        </port>
        <port id="1">
            <dim>1</dim>
            <dim>128</dim>
        </port>
        <port id="2">
            <dim>384</dim>
            <dim>16</dim>
        </port>
        <port id="3">
            <dim>384</dim>
            <dim>128</dim>
        </port>
        <port id="4">
            <dim>512</dim>
        </port>
    </input>
    <output>
        <port id="5">
            <dim>1</dim>
            <dim>128</dim>
        </port>
    </output>
</layer>
