PaddlePaddle Image Classification with OpenVINO

This demo shows how to run a MobileNetV3 Large PaddlePaddle model using OpenVINO Runtime. Instead of exporting the PaddlePaddle model to ONNX and converting the result to Intermediate Representation (IR) format with Model Optimizer, we can now read the Paddle model directly, without any conversion step.
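
In code, the direct-read flow is only a few calls. The following is a minimal sketch of what this notebook does step by step below (the .pdmodel path is the one created in the download step):

# Sketch: read the .pdmodel file directly with OpenVINO Runtime, no ONNX export or Model Optimizer run needed
from openvino.runtime import Core

core = Core()
model = core.read_model("model/MobileNetV3_large_x1_0_infer/inference.pdmodel")
compiled_model = core.compile_model(model, "CPU")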

Import

# model download
from pathlib import Path
import os
import urllib.request
import tarfile

# inference
from openvino.runtime import Core

# preprocessing
import cv2
import numpy as np
from openvino.preprocess import PrePostProcessor, ResizeAlgorithm
from openvino.runtime import Layout, Type, AsyncInferQueue, PartialShape

# results visualization
import time
import json
from IPython.display import Image

Download the MobileNetV3_large_x1_0 Model

Download the pre-trained model directly from the server. More details about the pre-trained model can be found in the PaddleClas documentation below.

Source: https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/deploy/lite/readme_en.md

mobilenet_url = "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_infer.tar"
mobilenetv3_model_path = Path("model/MobileNetV3_large_x1_0_infer/inference.pdmodel")
if mobilenetv3_model_path.is_file():
    print("Model MobileNetV3_large_x1_0 already exists")
else:
    # Download the model from the server, and untar it.
    print("Downloading the MobileNetV3_large_x1_0_infer model (20Mb)... May take a while...")
    # create a directory
    os.makedirs("model")
    urllib.request.urlretrieve(mobilenet_url, "model/MobileNetV3_large_x1_0_infer.tar")
    print("Model Downloaded")

    try:
        with tarfile.open("model/MobileNetV3_large_x1_0_infer.tar") as file:
            file.extractall("model")
        print(f"Model Extracted to {mobilenetv3_model_path}.")
    except tarfile.TarError:
        print("Error Extracting the model. Please check the network.")
Downloading the MobileNetV3_large_x1_0_infer model (20 MB)... May take a while...
Model Downloaded
Model Extracted to model/MobileNetV3_large_x1_0_infer/inference.pdmodel.

Define the callback function for postprocessing

def callback(infer_request, i) -> None:
    """
    Define the callback function for postprocessing

    :param infer_request: the infer_request object
    :param i: the iteration of inference
    :returns: None
    """
    with open("utils/imagenet_class_index.json") as f:
        imagenet_classes = json.load(f)
    predictions = next(iter(infer_request.results.values()))
    indices = np.argsort(-predictions[0])
    if (i == 0):
        # Calculate the first inference time
        latency = (time.time() - start) * 1000
        print("first inference latency: {:.2f} ms".format(latency))
        for n in range(5):
            print(
                "class name: {}, probability: {:.5f}"
                .format(imagenet_classes[str(indices[n])][1], predictions[0][indices[n]])
            )

Read the model

OpenVINO Runtime reads the PaddlePaddle model directly.

# Initialize OpenVINO Runtime with Core()
ie = Core()
# MobileNetV3_large_x1_0
model = ie.read_model(mobilenetv3_model_path)
# get information about the input and output layers
input_layer = model.input(0)
output_layer = model.output(0)

Integrate preprocessing steps into the execution graph with Preprocessing API

If your input data does not fit the model input tensor exactly, additional operations/steps are needed to transform the data into the format expected by the model. These operations are known as “preprocessing”. Preprocessing steps are integrated into the execution graph and performed on the selected device(s) (CPU/GPU/VPU/etc.) rather than always being executed on the CPU. This improves utilization of the selected device(s).

Overview of Preprocessing API: https://docs.openvino.ai/latest/openvino_docs_OV_Runtime_UG_Preprocessing_Overview.html
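
For contrast, the sketch below shows the same steps done manually with OpenCV/NumPy before inference, using the same resize target and mean/scale values as the graph-integrated version that follows. The Preprocessing API instead folds these operations into the execution graph so they run on the target device. (The helper name here is illustrative, not part of the notebook.)

# Sketch: equivalent preprocessing done outside the execution graph, shown for comparison only
def preprocess_manually(image):
    resized = cv2.resize(image, (224, 224))                                # resize to the model spatial dims
    scaled = resized.astype(np.float32) / 255                              # scale pixel values to [0, 1]
    normalized = (scaled - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]  # per-channel mean/scale
    return normalized.transpose(2, 0, 1)[np.newaxis, ...].astype(np.float32)  # HWC -> NCHW with batch dim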

filename = "../001-hello-world/data/coco.jpg"
test_image = cv2.imread(filename)
test_image = np.expand_dims(test_image, 0) / 255
_, h, w, _ = test_image.shape

# Reshape the model input to a static shape to improve performance
model.reshape({input_layer.any_name: PartialShape([1, 3, 224, 224])})
ppp = PrePostProcessor(model)
# Set input tensor information:
# - input() provides information about a single model input
# - layout of data is "NHWC"
# - set static spatial dimensions on the input tensor to resize from
ppp.input().tensor() \
    .set_spatial_static_shape(h, w) \
    .set_layout(Layout("NHWC"))
inputs = model.inputs
# Here we assume the model has "NCHW" layout for input
ppp.input().model().set_layout(Layout("NCHW"))
# Do preprocessing:
# - apply linear resize from the tensor spatial dims to the model spatial dims
# - subtract the mean from each channel
# - divide each channel by the appropriate scale value
ppp.input().preprocess() \
    .resize(ResizeAlgorithm.RESIZE_LINEAR, 224, 224) \
    .mean([0.485, 0.456, 0.406]) \
    .scale([0.229, 0.224, 0.225])
# Set output tensor information:
# - the element type of the tensor is set to 'f32'
ppp.output().tensor().set_element_type(Type.f32)
# Apply preprocessing to modify the original 'model'
model = ppp.build()

Run Inference

Use AUTO as the device name to delegate device selection to OpenVINO. The AUTO device plugin internally recognizes and selects devices from among Intel CPUs and GPUs, depending on the device capabilities and the characteristics of the model(s) (for example, precision), and then assigns inference requests to the best device. AUTO starts inference immediately on the CPU and then transparently shifts to the GPU (or VPU) once it is ready, dramatically reducing the time to first inference.
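
AUTO also accepts an explicit, ordered device priority list if you want to restrict the candidates yourself. A minimal sketch, assuming a GPU plugin is actually installed (on a CPU-only machine, use plain "AUTO" or "AUTO:CPU"):

# Sketch: give AUTO an explicit, ordered candidate list (GPU preferred, CPU as fallback)
compiled_model_prioritized = ie.compile_model(model=model, device_name="AUTO:GPU,CPU")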

# Check the available devices in your system
devices = ie.available_devices
for device in devices:
    device_name = ie.get_property(device_name=device, name="FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")

# Load model to a device selected by AUTO from the available devices list
compiled_model = ie.compile_model(model=model, device_name="AUTO")
# Create infer request queue
infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(callback)
start = time.time()
# Do inference
infer_queue.start_async({input_layer.any_name: test_image}, 0)
infer_queue.wait_all()
Image(filename=filename)
CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
first inference latency: 18.38 ms
class name: Labrador_retriever, probability: 0.59148
class name: flat-coated_retriever, probability: 0.11678
class name: Staffordshire_bullterrier, probability: 0.04089
class name: Newfoundland, probability: 0.02689
class name: Tibetan_mastiff, probability: 0.01735
../_images/214-vision-paddle-classification-with-output_12_1.jpg

Performance Hints: Latency and Throughput

Throughput and latency are some of the most widely used metrics that measure the overall performance of an application.

  • Latency measures the inference time (ms) required to process a single input, for example the first inference.

  • To calculate throughput, divide the number of inputs that were processed by the processing time.

OpenVINO performance hints are a new way to configure performance with portability in mind. Performance hints let the device configure itself, rather than requiring the application to map its needs to low-level performance settings and to keep separate configuration logic for each possible device.

High-level Performance Hints: https://docs.openvino.ai/latest/openvino_docs_OV_UG_Performance_Hints.html
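
One way to see what a hint actually changes is to query the compiled model for the number of inference requests it considers optimal. The following is a minimal sketch that reuses the ie and model objects created above:

# Sketch: inspect how each performance hint is resolved by the plugin
for hint in ("LATENCY", "THROUGHPUT"):
    cm = ie.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT": hint})
    # LATENCY typically resolves to 1 infer request; THROUGHPUT usually to more (device dependent)
    print(hint, "->", cm.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS"), "optimal infer request(s)")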

Run Inference with “LATENCY” Performance Hint

It is possible to define application-specific performance settings with a config key, letting the device adjust itself to achieve better LATENCY-oriented performance.

loop = 100
# AUTO sets device config based on hints
compiled_model = ie.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT": "LATENCY"})
infer_queue = AsyncInferQueue(compiled_model)
# use the AsyncInferQueue Python API to boost performance in async mode
infer_queue.set_callback(callback)
# run inference 100 times to get the average FPS
start = time.time()
for i in range(loop):
    infer_queue.start_async({input_layer.any_name: test_image}, i)
infer_queue.wait_all()
end = time.time()
# Calculate the average FPS
fps = loop / (end - start)
print("throughput: {:.2f} fps".format(fps))
first inference latency: 13.06 ms
class name: Labrador_retriever, probability: 0.59148
class name: flat-coated_retriever, probability: 0.11678
class name: Staffordshire_bullterrier, probability: 0.04089
class name: Newfoundland, probability: 0.02689
class name: Tibetan_mastiff, probability: 0.01735
throughput: 116.84 fps

Run Inference with “THROUGHPUT” Performance Hint

It is possible to define application-specific performance settings with a config key, letting the device adjust itself to achieve better THROUGHPUT-oriented performance.

# AUTO sets device config based on hints
compiled_model = ie.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT": "THROUGHPUT"})
infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(callback)
start = time.time()
for i in range(loop):
    infer_queue.start_async({input_layer.any_name: test_image}, i)
infer_queue.wait_all()
end = time.time()
# Calculate the average FPS
fps = loop / (end - start)
print("throughput: {:.2f} fps".format(fps))
first inference latency: 11.64 ms
class name: Labrador_retriever, probability: 0.59148
class name: flat-coated_retriever, probability: 0.11678
class name: Staffordshire_bullterrier, probability: 0.04089
class name: Newfoundland, probability: 0.02689
class name: Tibetan_mastiff, probability: 0.01735
throughput: 117.52 fps

benchmark_app

To generate more accurate performance measurements, use the OpenVINO Benchmark Tool (benchmark_app). We can trigger a performance hint with the -hint parameter, which instructs the OpenVINO device plugin to use the best network-specific settings for either latency or throughput.

NOTE: The performance results from benchmark_app exclude the model compilation and load time.

# 'latency': device performance optimized for LATENCY.
! benchmark_app -m $mobilenetv3_model_path -data_shape [1,3,224,224] -hint "latency"
[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading OpenVINO
[ INFO ] OpenVINO:
         API version............. 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.1
         Build................... 2022.1.0-7019-cdb9bec7210-releases/2022/1

[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 44.09 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: ?
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'inputs' precision u8, dimensions ([N,C,H,W]): ? 3 224 224
[ INFO ] Model output 'save_infer_model/scale_0.tmp_1' precision f32, dimensions ([...]): ? 1000
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 143.06 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 2)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['WINOGRAD', 'FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  ,
[ INFO ]   NUM_STREAMS  , 1
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , False
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.10 ms
[ WARNING ] No input files were given for input 'inputs'!. This input will be filled with random values!
[ INFO ] Fill input 'inputs' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests, inference only: False, limits: 60000 ms duration)
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 34.74 ms
[Step 11/11] Dumping statistics report
Count:          14409 iterations
Duration:       60005.49 ms
Latency:
    AVG:        4.08 ms
    MIN:        3.67 ms
    MAX:        11.74 ms
Throughput: 240.13 FPS
# 'throughput' or 'tput': device performance optimized for THROUGHPUT.
! benchmark_app -m $mobilenetv3_model_path -data_shape [1,3,224,224] -hint "throughput"
[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading OpenVINO
[ INFO ] OpenVINO:
         API version............. 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.1
         Build................... 2022.1.0-7019-cdb9bec7210-releases/2022/1

[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 43.46 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: ?
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'inputs' precision u8, dimensions ([N,C,H,W]): ? 3 224 224
[ INFO ] Model output 'save_infer_model/scale_0.tmp_1' precision f32, dimensions ([...]): ? 1000
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 144.01 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 2)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['WINOGRAD', 'FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  ,
[ INFO ]   NUM_STREAMS  , 1
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , False
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.11 ms
[ WARNING ] No input files were given for input 'inputs'!. This input will be filled with random values!
[ INFO ] Fill input 'inputs' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests, inference only: False, limits: 60000 ms duration)
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 35.29 ms
[Step 11/11] Dumping statistics report
Count:          14166 iterations
Duration:       60008.18 ms
Latency:
    AVG:        4.15 ms
    MIN:        3.67 ms
    MAX:        19.90 ms
Throughput: 236.07 FPS