GELU
Versioned name : Gelu-7
Category : Activation function
Short description : Gaussian error linear unit element-wise activation function.
Detailed description :
The Gelu operation was introduced in the article "Gaussian Error Linear Units (GELUs)" (Hendrycks and Gimpel, arXiv:1606.08415). It performs an element-wise activation function on a given input tensor, based on the following mathematical formula:

\[ \mathrm{Gelu}(x) = x \cdot \Phi(x) \]

where \(\Phi(x)\) is the Cumulative Distribution Function of the Gaussian distribution.

The Gelu function may be approximated in two different ways based on the approximation_mode attribute.

For erf approximation mode, the Gelu function is represented as:

\[ \mathrm{Gelu}(x) = 0.5 \cdot x \cdot \left(1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right) \]

For tanh approximation mode, the Gelu function is represented as:

\[ \mathrm{Gelu}(x) \approx 0.5 \cdot x \cdot \left(1 + \tanh\left(\sqrt{\frac{2}{\pi}} \cdot \left(x + 0.044715 \cdot x^{3}\right)\right)\right) \]
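As an informal illustration (not part of the specification), the following minimal NumPy sketch implements both formulas; the gelu function name and its approximation_mode argument are hypothetical and simply mirror the attribute described below:

import numpy as np
from math import erf, sqrt, pi

def gelu(x: np.ndarray, approximation_mode: str = "erf") -> np.ndarray:
    """Element-wise Gelu; the output has the same shape as the input."""
    if approximation_mode == "erf":
        # Gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
        return 0.5 * x * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))
    if approximation_mode == "tanh":
        # Gelu(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
        return 0.5 * x * (1.0 + np.tanh(sqrt(2.0 / pi) * (x + 0.044715 * x ** 3)))
    raise ValueError("approximation_mode must be 'erf' or 'tanh'")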
Attributes
approximation_mode
Description : Specifies the formula used to calculate the Gelu function.
Range of values :
erf - calculate output using the Gauss error function
tanh - calculate output using the tanh approximation
Type : string
Default value : erf
Required : no
Inputs :
1 : A tensor of type T and arbitrary shape. Required.
Outputs :
1 : The result of the element-wise Gelu function applied to the input tensor. A tensor of type T and the same shape as the input tensor.
Types
T : arbitrary supported floating-point type.
Examples
Example: tanh approximation mode
<layer ... type="Gelu">
<data approximation_mode="tanh"/>
<input>
<port id="0">
<dim>1</dim>
<dim>128</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>128</dim>
</port>
</output>
</layer>
Example: erf approximation mode
<layer ... type="Gelu">
<data approximation_mode="erf"/>
<input>
<port id="0">
<dim>3</dim>
<dim>7</dim>
<dim>9</dim>
</port>
</input>
<output>
<port id="1">
<dim>3</dim>
<dim>7</dim>
<dim>9</dim>
</port>
</output>
</layer>
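Continuing the hypothetical gelu sketch from the Detailed description section, a quick numeric check illustrates that the two approximation modes agree closely and that the output shape matches the input, as the Inputs and Outputs sections require:

import numpy as np

x = np.random.default_rng(0).normal(size=(1, 128)).astype(np.float32)
y_erf = gelu(x, "erf")    # erf mode
y_tanh = gelu(x, "tanh")  # tanh approximation
assert y_erf.shape == x.shape           # shape is preserved
print(np.abs(y_erf - y_tanh).max())     # small, roughly 1e-3 or below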