Arm® CPU device¶
Introducing the Arm® CPU Plugin¶
The Arm® CPU plugin enables inference of deep neural networks on Arm® CPUs, using the Arm Compute Library as a backend.
Note
This is a community-level add-on to OpenVINO™. Intel® welcomes community participation in the OpenVINO™ ecosystem; technical questions on community forums and code contributions are welcome. However, this component has not undergone full release validation or qualification from Intel®, and no official support is offered.
The Arm® CPU plugin is not part of the Intel® Distribution of OpenVINO™ toolkit and is not distributed in pre-built form. To use the plugin, build it from source code. The build procedure is described in How to build Arm® CPU plugin.
The set of supported layers is defined in the Operation set specification.
Supported inference data types¶
The Arm® CPU plugin supports the following data types as inference precision of internal primitives:
Floating-point data types:
f32
f16
Quantized data types:
i8
Note
i8 support is experimental.
The Hello Query Device C++ sample can be used to print the supported data types for all detected devices.
Supported features¶
Preprocessing acceleration¶
The Arm® CPU plugin supports the following accelerated preprocessing operations:
Precision conversion:
u8 -> u16, s16, s32
u16 -> u8, u32
s16 -> u8, s32
f16 -> f32
Transposition of tensors with fewer than 5 dimensions
Interpolation of 4D tensors with no padding (pads_begin and pads_end equal to 0)
The Arm® CPU plugin also supports the following preprocessing operations, but they are not accelerated:
Precision conversions not mentioned above
Color conversion:
NV12 to RGB
NV12 to BGR
I420 to RGB
I420 to BGR
See the preprocessing API guide for more details.
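For reference, the standalone sketch below shows what an NV12 to RGB conversion computes, using common BT.601 full-range coefficients. This is an illustration of the color-space math only; the exact coefficients and rounding used by the plugin's preprocessing implementation may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Convert an NV12 image (full-size Y plane followed by a half-resolution
// interleaved UV plane) to packed RGB using BT.601 full-range coefficients.
// width and height must be even, as the NV12 layout requires.
std::vector<uint8_t> nv12_to_rgb(const std::vector<uint8_t>& nv12,
                                 int width, int height) {
    std::vector<uint8_t> rgb(static_cast<size_t>(width) * height * 3);
    const uint8_t* y_plane  = nv12.data();
    const uint8_t* uv_plane = nv12.data() + static_cast<size_t>(width) * height;
    auto clamp8 = [](float v) {
        return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, v)));
    };
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            float y = y_plane[row * width + col];
            // Each interleaved UV pair covers a 2x2 block of Y samples.
            const uint8_t* uv = uv_plane + (row / 2) * width + (col / 2) * 2;
            float u = uv[0] - 128.0f;
            float v = uv[1] - 128.0f;
            size_t o = (static_cast<size_t>(row) * width + col) * 3;
            rgb[o + 0] = clamp8(y + 1.402f * v);                       // R
            rgb[o + 1] = clamp8(y - 0.344136f * u - 0.714136f * v);    // G
            rgb[o + 2] = clamp8(y + 1.772f * u);                       // B
        }
    }
    return rgb;
}
```

NV12 to BGR differs only in the order the three output channels are written; I420 stores U and V in separate planes rather than interleaved.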
Supported properties¶
The plugin supports the properties listed below.
Read-write properties¶
All parameters must be set before calling ov::Core::compile_model() in order to take effect, or be passed as an additional argument to ov::Core::compile_model().
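Both patterns can be sketched as follows. This is a configuration sketch, assuming the plugin is built and registered as the "CPU" device on an Arm® machine; "model.xml" is a placeholder path, and ov::enable_profiling stands in for any read-write property.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder model path

    // Option 1: set the property on the device before compilation.
    core.set_property("CPU", ov::enable_profiling(true));
    auto compiled = core.compile_model(model, "CPU");

    // Option 2: pass the property directly to compile_model().
    auto compiled2 = core.compile_model(model, "CPU",
                                        ov::enable_profiling(true));
    return 0;
}
```

Properties set after compile_model() returns do not affect the already compiled model.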
Read-only properties¶
Known Layer Limitations¶
AvgPool layer is supported via the arm_compute library for 4D input tensors and via the reference implementation in other cases.
BatchToSpace layer supports 4D tensors only, with constant nodes: block_shape with N = 1 and C = 1, and crops_begin and crops_end with zero values.
ConvertLike layer supports the same configurations as Convert.
DepthToSpace layer supports 4D tensors only, with the BLOCKS_FIRST value of the mode attribute.
Equal does not support broadcast for inputs.
Gather layer supports constant scalar or 1D indices and axes only. The layer is supported via the arm_compute library for non-negative indices and via the reference implementation otherwise.
Less does not support broadcast for inputs.
LessEqual does not support broadcast for inputs.
LRN layer supports axes = {1} or axes = {2, 3} only.
MaxPool-1 layer is supported via the arm_compute library for 4D input tensors and via the reference implementation in other cases.
Mod layer is supported for f32 only.
MVN layer is supported via the arm_compute library for 2D inputs with normalize_variance = false and across_channels = false; in other cases the layer is implemented via the runtime reference.
NormalizeL2 layer is supported via the arm_compute library with the MAX value of eps_mode and axes = {2 | 3}; for the ADD value of eps_mode the layer uses DecomposeNormalizeL2Add; in other cases the layer is implemented via the runtime reference.
NotEqual does not support broadcast for inputs.
Pad layer works with pad_mode = {REFLECT | CONSTANT | SYMMETRIC} parameters only.
Round layer is supported via the arm_compute library with the RoundMode::HALF_AWAY_FROM_ZERO value of mode; in other cases the layer is implemented via the runtime reference.
SpaceToBatch layer supports 4D tensors only, with constant nodes: shapes, pads_begin, or pads_end with zero paddings for batch or channels, and shapes with values of one for batch and channels.
SpaceToDepth layer supports 4D tensors only, with the BLOCKS_FIRST value of the mode attribute.
StridedSlice layer is supported via the arm_compute library for tensors with fewer than 5 dimensions and zero values of ellipsis_mask, or zero values of new_axis_mask and shrink_axis_mask; in other cases the layer is implemented via the runtime reference.
FakeQuantize layer is supported via the arm_compute library in Low Precision evaluation mode for suitable models and via the runtime reference otherwise.