openvino.inference_engine.InferRequest

class openvino.inference_engine.InferRequest

This class provides an interface to the infer requests of an ExecutableNetwork. It handles the execution of inference requests and lets you set input data and get output data.

__init__()

There is no explicit class constructor. To get a valid InferRequest instance, use the IECore.load_network() method with a specified number of requests to obtain an ExecutableNetwork instance, which stores the infer requests.
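For example, a minimal sketch of obtaining a request (the model file paths and the device name are placeholders):

from openvino.inference_engine import IECore

ie = IECore()
# Placeholder IR paths; substitute your own model files
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU", num_requests=2)
request = exec_net.requests[0]  # InferRequest instances are stored in ExecutableNetwork.requests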

Methods

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__

There is no explicit class constructor.

__init_subclass__

This method is called when a class is subclassed.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

__reduce__

InferRequest.__reduce_cython__(self)

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__setstate__

InferRequest.__setstate_cython__(self, __pyx_state)

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__

Abstract classes can override this to customize issubclass().

_fill_inputs(self, inputs)

_get_blob_buffer(self, string blob_name)

async_infer(self[, inputs])

Starts asynchronous inference of the infer request and fills the outputs array.

get_perf_counts(self)

Queries per-layer performance measures to get feedback on which layer is the most time-consuming.

infer(self[, inputs])

Starts synchronous inference of the infer request and fills the outputs array.

query_state(self)

Gets the state control interface for the given infer request. State control is essential for recurrent networks. Returns a list of memory state objects.

set_batch(self, size)

Sets a new batch size for this infer request when dynamic batching is enabled in the executable network that created the request.

set_blob(self, unicode blob_name, Blob blob, ...)

Sets a user-defined Blob for the infer request.

set_completion_callback(self, py_callback[, ...])

Sets a callback function that is called on success or failure of an asynchronous request.

wait(self[, timeout])

Waits for the result to become available.

Attributes

__pyx_vtable__

_inputs_list

_inputs_list: object

_outputs_list

_outputs_list: object

_py_callback

_py_callback: object

_py_data

_py_data: object

_user_blobs

_user_blobs: object

input_blobs

Dictionary that maps input layer names to corresponding Blobs

latency

Current infer request inference time in milliseconds

output_blobs

Dictionary that maps output layer names to corresponding Blobs

preprocess_info

Dictionary that maps input layer names to corresponding preprocessing information

__class__

alias of type

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__()

There is no explicit class constructor. To get a valid InferRequest instance, use the IECore.load_network() method with a specified number of requests to obtain an ExecutableNetwork instance, which stores the infer requests.

__init_subclass__()

This method is called when a class is subclassed.

The default implementation does nothing. It may be overridden to extend subclasses.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

__pyx_vtable__ = <capsule object NULL>

__reduce__()

InferRequest.__reduce_cython__(self)

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__setstate__()

InferRequest.__setstate_cython__(self, __pyx_state)

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__()

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

_fill_inputs(self, inputs)

_get_blob_buffer(self, string blob_name) → BlobBuffer

_inputs_list

_inputs_list: object

_outputs_list

_outputs_list: object

_py_callback

_py_callback: object

_py_data

_py_data: object

_user_blobs

_user_blobs: object

async_infer(self, inputs=None)

Starts asynchronous inference of the infer request and fills the outputs array.

Parameters

inputs – A dictionary that maps input layer names to numpy.ndarray objects of proper shape with input data for the layer

Returns

None

Usage example:

exec_net = ie_core.load_network(network=net, device_name="CPU", num_requests=2)
exec_net.requests[0].async_infer({input_blob: image})
request_status = exec_net.requests[0].wait()
res = exec_net.requests[0].output_blobs['prob']

get_perf_counts(self)

Queries per-layer performance measures to get feedback on which layer is the most time-consuming.

Note

Performance counters' data and format depend on the plugin.

Returns

Dictionary containing per-layer execution information.

Usage example:

exec_net = ie_core.load_network(network=net, device_name="CPU", num_requests=2)
exec_net.requests[0].infer({input_blob: image})
exec_net.requests[0].get_perf_counts()
#  {'Conv2D': {'exec_type': 'jit_avx2_1x1',
#              'real_time': 154,
#              'cpu_time': 154,
#              'status': 'EXECUTED',
#              'layer_type': 'Convolution'},
#   'Relu6':  {'exec_type': 'undef',
#              'real_time': 0,
#              'cpu_time': 0,
#              'status': 'NOT_RUN',
#              'layer_type': 'Clamp'}
#   ...
#  }
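Building on the example above, the returned dictionary can be sorted by its real_time field to find the most time-consuming layers (a sketch assuming the structure shown in the comments):

perf = exec_net.requests[0].get_perf_counts()
# Sort layers by descending real execution time (units are plugin-dependent)
slowest = sorted(perf.items(), key=lambda item: item[1]['real_time'], reverse=True)
for name, counters in slowest[:5]:
    print(name, counters['layer_type'], counters['status'], counters['real_time'])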

infer(self, inputs=None)

Starts synchronous inference of the infer request and fills the outputs array.

Parameters

inputs – A dictionary that maps input layer names to numpy.ndarray objects of proper shape with input data for the layer

Returns

None

Usage example:

exec_net = ie_core.load_network(network=net, device_name="CPU", num_requests=2)
exec_net.requests[0].infer({input_blob: image})
res = exec_net.requests[0].output_blobs['prob'].buffer
np.flip(np.sort(np.squeeze(res)), 0)

# array([4.85416055e-01, 1.70385033e-01, 1.21873841e-01, 1.18894853e-01,
#         5.45198545e-02, 2.44456064e-02, 5.41366823e-03, 3.42589128e-03,
#         2.26027006e-03, 2.12283316e-03 ...])
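To map the sorted probabilities back to class indices, numpy.argsort can be used (a sketch; 'prob' is the output layer name from the example above):

probs = np.squeeze(exec_net.requests[0].output_blobs['prob'].buffer)
top5 = np.argsort(probs)[::-1][:5]  # indices of the five highest-probability classes
print(top5, probs[top5])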

input_blobs

Dictionary that maps input layer names to corresponding Blobs

latency

Current infer request inference time in milliseconds

output_blobs

Dictionary that maps output layer names to corresponding Blobs

preprocess_info

Dictionary that maps input layer names to corresponding preprocessing information
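A short sketch showing how these attributes are typically used together (the input name 'data' and output name 'prob' are placeholders, and mean_variant is assumed to be one of the PreProcessInfo properties):

request = exec_net.requests[0]
request.infer({'data': image})                # 'data' is a placeholder input name
result = request.output_blobs['prob'].buffer  # numpy view of the output Blob
print(request.latency)                        # inference time of the last run, in milliseconds
print(request.preprocess_info['data'].mean_variant)  # assumed PreProcessInfo property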

query_state(self)

Gets the state control interface for the given infer request. State control is essential for recurrent networks.

Returns

A list of Memory State objects
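A minimal sketch of working with memory states between independent sequences (this assumes the loaded network contains stateful layers; the name and reset members are assumed from the memory state interface of this API):

for state in exec_net.requests[0].query_state():
    print(state.name)  # identifier of the memory state
    state.reset()      # return the state to its initial value before the next sequence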

set_batch(self, size)

Sets a new batch size for this infer request when dynamic batching is enabled in the executable network that created the request.

Note

Support of dynamic batch size depends on the target plugin.

Parameters

size – New batch size to be used by all the following inference calls for this request

Returns

None

Usage example:

ie = IECore()
net = ie.read_network(model=path_to_xml_file, weights=path_to_bin_file)
# Set max batch size
net.batch_size = 10
ie.set_config(config={"DYN_BATCH_ENABLED": "YES"}, device_name=device)
exec_net = ie.load_network(network=net, device_name=device)
# Set batch size for certain network.
# NOTE: The input data shape is not changed; only a part of it is used for inference, which improves performance
exec_net.requests[0].set_batch(2)

set_blob(self, blob_name: str, blob: Blob, preprocess_info: PreProcessInfo = None)

Sets a user-defined Blob for the infer request.

Parameters
  • blob_name – The name of the input blob

  • blob – Blob object to set for the infer request

  • preprocess_info – PreProcessInfo object to set for the infer request.

Returns

None

Usage example:

ie = IECore()
net = ie.read_network(model="./model.xml", weights="./model.bin")
exec_net = ie.load_network(net, "CPU", num_requests=2)
td = TensorDesc("FP32", (1, 3, 224, 224), "NCHW")
blob_data = np.ones(shape=(1, 3, 224, 224), dtype=np.float32)
blob = Blob(td, blob_data)
exec_net.requests[0].set_blob(blob_name="input_blob_name", blob=blob)

set_completion_callback(self, py_callback, py_data=None)

Sets a callback function that is called on success or failure of an asynchronous request.

Parameters
  • py_callback – Any defined function or lambda

  • py_data – Data that is passed to the callback function

Returns

None

Usage example:

callback = lambda status, py_data: print(f"Request with id {py_data} finished with status {status}")
ie = IECore()
net = ie.read_network(model="./model.xml", weights="./model.bin")
exec_net = ie.load_network(net, "CPU", num_requests=4)
for id, req in enumerate(exec_net.requests):
    req.set_completion_callback(py_callback=callback, py_data=id)

for req in exec_net.requests:
    req.async_infer({"data": img})
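Because async_infer() returns immediately, a typical follow-up is to wait for all requests to complete before reading the results, for example:

for req in exec_net.requests:
    req.wait()  # blocks until this request finishes (default timeout of -1)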

wait(self, timeout=None)

Waits for the result to become available. Blocks until the specified timeout elapses or the result becomes available, whichever comes first.

Parameters

timeout – Time to wait in milliseconds, or one of the special values (0, -1) described in the note below. If not specified, the timeout is set to -1 by default.

Returns

Request status code.

Note

There are special values of the timeout parameter:

  • 0 - Immediately returns the inference status without blocking or interrupting execution. For the meaning of the status codes, refer to the InferenceEngine::StatusCode enum in the Inference Engine C++ documentation

  • -1 - Waits until the inference result becomes available (default value)

Usage example: See the InferRequest.async_infer() method of the InferRequest class.
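A non-blocking polling sketch using the special timeout value 0 (this assumes StatusCode is importable from openvino.inference_engine, as in the Inference Engine Python samples, and that it provides a RESULT_NOT_READY member):

import time
from openvino.inference_engine import StatusCode

exec_net.requests[0].async_infer({input_blob: image})
# timeout=0 returns the current status immediately instead of blocking
while exec_net.requests[0].wait(0) == StatusCode.RESULT_NOT_READY:
    time.sleep(0.01)  # placeholder for useful work while inference runs
res = exec_net.requests[0].output_blobs['prob'].buffer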