BatchNormInference
Versioned name : BatchNormInference-5
Category : Normalization
Short description : BatchNormInference performs the Batch Normalization operation described in the article "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift".
Detailed Description
BatchNormInference performs the following operations on a given data batch input tensor data:
1. Normalizes each activation \(x^{(k)}\) by the mean and variance:
\[\hat{x}^{(k)}=\frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}}\]
where \(E[x^{(k)}]\) and \(Var(x^{(k)})\) are the mean and variance, calculated per channel axis of the data input, and correspond to the mean and variance inputs, respectively. Additionally, \(\epsilon\) is a value added to the variance for numerical stability and corresponds to the epsilon attribute.
2. Performs a linear transformation of each normalized activation based on the gamma and beta inputs, representing the scaling factor and shift, respectively:
\[\hat{y}^{(k)}=\gamma^{(k)}\hat{x}^{(k)} + \beta^{(k)}\]
where \(\gamma^{(k)}\) and \(\beta^{(k)}\) are learnable parameters, calculated per channel axis, and correspond to the gamma and beta inputs.
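The two steps above can be sketched for a single activation value. This is a minimal, illustrative sketch in plain Python, not the implementation used by any runtime. The function name `batch_norm_scalar` is hypothetical, and its arguments simply mirror the operation's inputs and the epsilon attribute.

```python
import math


def batch_norm_scalar(x, gamma, beta, mean, variance, epsilon):
    """Apply the two BatchNormInference steps to one activation value."""
    # Step 1: normalize by the supplied mean and variance.
    x_hat = (x - mean) / math.sqrt(variance + epsilon)
    # Step 2: scale by gamma and shift by beta.
    return gamma * x_hat + beta


# Hypothetical values chosen so the arithmetic is easy to follow.
result = batch_norm_scalar(
    x=3.0, gamma=2.0, beta=0.5, mean=1.0, variance=4.0, epsilon=0.0
)
```

With these values the normalized activation is (3 - 1) / 2 = 1, so the result is 2 * 1 + 0.5 = 2.5.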
Mathematical Formulation
Let x be a d-dimensional input, \(x=(x^{(1)}\dotsc x^{(d)})\). Since normalization is applied to each activation \(x^{(k)}\) independently, you can focus on a particular activation and omit k.
For a particular activation, consider a mini-batch \(\mathcal{B}\) of m values. BatchNormInference performs the Batch Normalization algorithm as follows:
Input : Values of \(x\) over a mini-batch:
\[\mathcal{B} = \{ b_{1...m} \}\]
Parameters to learn : \(\gamma, \beta\)
Output :
\[\{ o_{i} = BN_{\gamma, \beta} ( b_{i} ) \}\]
Mini-batch mean :
\[\mu_{\mathcal{B}} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i}\]
Mini-batch variance :
\[\sigma_{\mathcal{B}}^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m} ( b_{i} - \mu_{\mathcal{B}})^{2}\]
Normalize :
\[\hat{b}_{i} \leftarrow \frac{b_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon }}\]
Scale and shift :
\[o_{i} \leftarrow \gamma\hat{b}_{i} + \beta = BN_{\gamma ,\beta } ( b_{i} )\]
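The formulation above can be traced step by step for one activation. The sketch below is illustrative only: the function name and the sample values are hypothetical. Note that BatchNormInference itself receives the mean and variance as inputs, so the statistics are computed here only to mirror the listed steps.

```python
import math


def batch_norm_over_mini_batch(b, gamma, beta, epsilon):
    """Follow the listed steps for one activation over a mini-batch b."""
    m = len(b)
    mu = sum(b) / m                               # mini-batch mean
    sigma_sq = sum((v - mu) ** 2 for v in b) / m  # mini-batch variance (1/m)
    denom = math.sqrt(sigma_sq + epsilon)
    b_hat = [(v - mu) / denom for v in b]         # normalize
    return [gamma * v + beta for v in b_hat]      # scale and shift


mini_batch = [1.0, 2.0, 3.0, 4.0]  # hypothetical values
o = batch_norm_over_mini_batch(mini_batch, gamma=2.0, beta=1.0, epsilon=0.0)
```

With epsilon set to zero, the outputs have a mean equal to beta and a variance equal to gamma squared, which is a quick sanity check on the normalization.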
Attributes :
epsilon
Description : epsilon is a constant added to the variance for numerical stability.
Range of values : a floating-point number greater than or equal to zero
Type : float
Required : yes
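To illustrate why a non-zero epsilon helps, the sketch below runs only the normalization step on a channel whose variance is exactly zero. The helper name and values are hypothetical. Plain Python floats raise `ZeroDivisionError` in this case; array libraries that follow IEEE 754 typically produce inf or NaN instead.

```python
import math


def normalize(x, mean, variance, epsilon):
    """Normalization step only: (x - mean) / sqrt(variance + epsilon)."""
    return (x - mean) / math.sqrt(variance + epsilon)


# A channel whose variance is exactly zero (hypothetical values).
try:
    normalize(1.0, mean=1.0, variance=0.0, epsilon=0.0)
    zero_epsilon_failed = False
except ZeroDivisionError:
    zero_epsilon_failed = True  # sqrt(0) ends up in the denominator

# A small positive epsilon keeps the denominator non-zero.
stable = normalize(1.0, mean=1.0, variance=0.0, epsilon=1e-5)
```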
Inputs
1 : data - A tensor of type T and at least rank 2. The second dimension represents the channel axis and must have a span of at least 1. Required.
2 : gamma - Scaling factor for the normalized value. A 1D tensor of type T with the same span as the data channel axis. Required.
3 : beta - Bias added to the scaled normalized value. A 1D tensor of type T with the same span as the data channel axis. Required.
4 : mean - Value for mean normalization. A 1D tensor of type T with the same span as the data channel axis. Required.
5 : variance - Value for variance normalization. A 1D tensor of type T with the same span as the data channel axis. Required.
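The shape constraints on the five inputs can be expressed as a small check. This is a sketch under stated assumptions: `check_batch_norm_shapes` is a hypothetical helper, not part of any library, and shapes are passed as plain lists of integers.

```python
def check_batch_norm_shapes(data_shape, gamma_shape, beta_shape,
                            mean_shape, variance_shape):
    """Raise ValueError if the shapes violate the input constraints."""
    if len(data_shape) < 2:
        raise ValueError("data must have rank >= 2")
    channels = data_shape[1]  # the second dimension is the channel axis
    if channels < 1:
        raise ValueError("channel axis must have a span of at least 1")
    per_channel = {
        "gamma": gamma_shape,
        "beta": beta_shape,
        "mean": mean_shape,
        "variance": variance_shape,
    }
    for name, shape in per_channel.items():
        # Each per-channel input must be 1D with one value per channel.
        if list(shape) != [channels]:
            raise ValueError(
                f"{name} must be 1D with span {channels}, got {list(shape)}"
            )
    return channels


channels = check_batch_norm_shapes([1, 3, 224, 224], [3], [3], [3], [3])
```

The check returns the channel count on success, so a caller could reuse it when iterating over channels.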
Outputs
1 : The result of the element-wise Batch Normalization operation applied to the input tensor data. A tensor of type T and the same shape as the data input tensor.
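A reference-style sketch of the element-wise behavior is shown below, using nested Python lists laid out as [N][C][...] in place of tensors. It is an illustration under those assumptions rather than an actual runtime implementation, and the function names are hypothetical. The output has the same nesting, and therefore the same shape, as the data argument.

```python
import math


def _apply(values, fn):
    """Apply fn to every scalar in an arbitrarily nested list."""
    if isinstance(values, list):
        return [_apply(v, fn) for v in values]
    return fn(values)


def batch_norm_inference(data, gamma, beta, mean, variance, epsilon):
    """Element-wise batch normalization over nested lists [N][C][...]."""
    output = []
    for sample in data:                              # axis 0: batch
        normalized_sample = []
        for c, channel_values in enumerate(sample):  # axis 1: channel
            denom = math.sqrt(variance[c] + epsilon)

            def transform(v, c=c, denom=denom):
                # Normalize, then scale and shift with this channel's values.
                return gamma[c] * (v - mean[c]) / denom + beta[c]

            normalized_sample.append(_apply(channel_values, transform))
        output.append(normalized_sample)
    return output


# Hypothetical 2D input with N=2 samples and C=2 channels.
data = [[1.0, 10.0], [3.0, 20.0]]
out = batch_norm_inference(
    data, gamma=[1.0, 2.0], beta=[0.0, 1.0],
    mean=[2.0, 10.0], variance=[1.0, 4.0], epsilon=0.0,
)
```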
Types
T : any supported floating-point type.
Examples
Example: 2D input tensor data
<layer ... type="BatchNormInference" ...>
    <data epsilon="9.99e-06" />
    <input>
        <port id="0">  <!-- input -->
            <dim>10</dim>
            <dim>128</dim>
        </port>
        <port id="1">  <!-- gamma -->
            <dim>128</dim>
        </port>
        <port id="2">  <!-- beta -->
            <dim>128</dim>
        </port>
        <port id="3">  <!-- mean -->
            <dim>128</dim>
        </port>
        <port id="4">  <!-- variance -->
            <dim>128</dim>
        </port>
    </input>
    <output>
        <port id="5">
            <dim>10</dim>
            <dim>128</dim>
        </port>
    </output>
</layer>
Example: 4D input tensor data
<layer ... type="BatchNormInference" ...>
    <data epsilon="9.99e-06" />
    <input>
        <port id="0">  <!-- input -->
            <dim>1</dim>
            <dim>3</dim>
            <dim>224</dim>
            <dim>224</dim>
        </port>
        <port id="1">  <!-- gamma -->
            <dim>3</dim>
        </port>
        <port id="2">  <!-- beta -->
            <dim>3</dim>
        </port>
        <port id="3">  <!-- mean -->
            <dim>3</dim>
        </port>
        <port id="4">  <!-- variance -->
            <dim>3</dim>
        </port>
    </input>
    <output>
        <port id="5">
            <dim>1</dim>
            <dim>3</dim>
            <dim>224</dim>
            <dim>224</dim>
        </port>
    </output>
</layer>
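As a rough consistency check on a layer definition like the ones above, the port dimensions can be read with Python's standard `xml.etree.ElementTree` module. This is only a sketch: the `id` and `name` attributes on `<layer>` are placeholders standing in for the elided attributes, and the helper `port_shapes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The id and name attributes are placeholders for the elided ones.
LAYER_XML = """
<layer id="0" name="bn" type="BatchNormInference">
    <data epsilon="9.99e-06" />
    <input>
        <port id="0"><dim>1</dim><dim>3</dim><dim>224</dim><dim>224</dim></port>
        <port id="1"><dim>3</dim></port>
        <port id="2"><dim>3</dim></port>
        <port id="3"><dim>3</dim></port>
        <port id="4"><dim>3</dim></port>
    </input>
    <output>
        <port id="5"><dim>1</dim><dim>3</dim><dim>224</dim><dim>224</dim></port>
    </output>
</layer>
"""


def port_shapes(layer, section):
    """Map each port id in <input> or <output> to its list of dims."""
    return {
        port.get("id"): [int(d.text) for d in port.findall("dim")]
        for port in layer.find(section).findall("port")
    }


layer = ET.fromstring(LAYER_XML.strip())
epsilon = float(layer.find("data").get("epsilon"))
inputs = port_shapes(layer, "input")
outputs = port_shapes(layer, "output")

channels = inputs["0"][1]  # the second dimension of data is the channel axis
# gamma, beta, mean and variance must each be 1D with one value per channel.
per_channel_ok = all(inputs[p] == [channels] for p in ("1", "2", "3", "4"))
# The output must have the same shape as the data input.
output_ok = outputs["5"] == inputs["0"]
```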