Asynchronous Inference Request¶
Asynchronous Inference Request runs an inference pipeline asynchronously on one or several task executors, depending on the device pipeline structure. OpenVINO Runtime Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class for this purpose.
The class has the _pipeline field of std::vector<std::pair<ITaskExecutor::Ptr, Task>>, which contains pairs of an executor and a task to execute. All executors are passed as arguments to the class constructor; they are in the running state and ready to run tasks.
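For orientation, the snippet below is a minimal sketch (not taken from the Template plugin sources) of how a task executor runs an InferenceEngine::Task, which is a std::function<void()>. The executor name "ExampleExecutor" and the header path from the Inference Engine plugin developer package are assumptions here.
#include <memory>
#include <threading/ie_cpu_streams_executor.hpp>

// Sketch only: create a CPU streams executor and schedule a task on it.
InferenceEngine::ITaskExecutor::Ptr executor =
    std::make_shared<InferenceEngine::CPUStreamsExecutor>(
        InferenceEngine::IStreamsExecutor::Config{"ExampleExecutor"});
executor->run([] {
    // Work executed asynchronously on the executor's worker thread(s)
});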
The class has the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method, which waits for _pipeline to finish in the class destructor. The method does not stop the task executors; they remain in the running state because they belong to the executable network instance and are not destroyed.
AsyncInferRequest Class¶
OpenVINO Runtime Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class for a custom asynchronous inference request implementation:
class TemplateAsyncInferRequest : public InferenceEngine::AsyncInferRequestThreadSafeDefault {
public:
    TemplateAsyncInferRequest(const TemplateInferRequest::Ptr& inferRequest,
                              const InferenceEngine::ITaskExecutor::Ptr& taskExecutor,
                              const InferenceEngine::ITaskExecutor::Ptr& waitExecutor,
                              const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor);
    ~TemplateAsyncInferRequest();

private:
    TemplateInferRequest::Ptr _inferRequest;
    InferenceEngine::ITaskExecutor::Ptr _waitExecutor;
};
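For context, the sketch below shows how an executable network might supply these constructor arguments when creating an inference request. It is a hedged example modeled on the Template plugin: the member names _taskExecutor, _waitExecutor, and _callbackExecutor are assumed to exist on the executable network and are not defined in this section.
// Sketch only: assumes the executable network derives from ExecutableNetworkThreadSafeDefault
// and stores _taskExecutor, _waitExecutor, and _callbackExecutor members.
InferenceEngine::IInferRequestInternal::Ptr TemplatePlugin::ExecutableNetwork::CreateInferRequest() {
    // Create the synchronous request and wrap it into the asynchronous one
    auto internalRequest = CreateInferRequestImpl(_networkInputs, _networkOutputs);
    return std::make_shared<TemplateAsyncInferRequest>(std::static_pointer_cast<TemplateInferRequest>(internalRequest),
                                                       _taskExecutor,
                                                       _waitExecutor,
                                                       _callbackExecutor);
}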
Class Fields¶
_inferRequest - a reference to the synchronous inference request implementation. Its methods are reused in the AsyncInferRequest constructor to define a device pipeline.
_waitExecutor - a task executor that waits for a response from a device about the completion of device tasks.
Note
If a plugin can work with several instances of a device, _waitExecutor
must be device-specific. Otherwise, having a single task executor for several devices does not allow them to work in parallel.
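As an illustration of this note, the fragment below sketches how an executable network could create a dedicated wait executor per device instance. It is an assumption-based example: the _cfg.deviceId field identifying the device instance is hypothetical, and the CPUStreamsExecutor header from the plugin developer package is assumed to be included.
// Sketch only: a separate wait executor per device instance, so that requests
// targeting different devices can wait for their results in parallel.
_waitExecutor = std::make_shared<InferenceEngine::CPUStreamsExecutor>(
    InferenceEngine::IStreamsExecutor::Config{"TemplateWaitExecutor/" + std::to_string(_cfg.deviceId)});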
The main goal of the AsyncInferRequest constructor is to define the device pipeline _pipeline. The example below demonstrates _pipeline creation with the following stages:
inferPreprocess is a CPU compute task.
startPipeline is a CPU lightweight task that submits tasks to a remote device.
waitPipeline is a CPU non-compute task that waits for a response from a remote device.
inferPostprocess is a CPU compute task.
TemplateAsyncInferRequest::TemplateAsyncInferRequest(const TemplateInferRequest::Ptr& inferRequest,
                                                     const InferenceEngine::ITaskExecutor::Ptr& cpuTaskExecutor,
                                                     const InferenceEngine::ITaskExecutor::Ptr& waitExecutor,
                                                     const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor)
    : AsyncInferRequestThreadSafeDefault(inferRequest, cpuTaskExecutor, callbackExecutor),
      _inferRequest(inferRequest),
      _waitExecutor(waitExecutor) {
    // In the current implementation we have CPU-only tasks and no need for two executors,
    // so by default a single-stage pipeline is created.
    // This stage executes InferRequest::Infer() using cpuTaskExecutor.
    // But if a remote asynchronous device is used, the pipeline can be split into tasks executed by cpuTaskExecutor
    // and waiting tasks. Waiting tasks can block the execution thread, so they use separate threads from another executor.
    constexpr const auto remoteDevice = false;
    if (remoteDevice) {
        _pipeline = {{cpuTaskExecutor,
                      [this] {
                          OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin,
                                             "TemplateAsyncInferRequest::PreprocessingAndStartPipeline");
                          _inferRequest->inferPreprocess();
                          _inferRequest->startPipeline();
                      }},
                     {_waitExecutor,
                      [this] {
                          OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, "TemplateAsyncInferRequest::WaitPipeline");
                          _inferRequest->waitPipeline();
                      }},
                     {cpuTaskExecutor, [this] {
                          OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, "TemplateAsyncInferRequest::Postprocessing");
                          _inferRequest->inferPostprocess();
                      }}};
    }
}
The stages are distributed among two task executors in the following way:
inferPreprocess and startPipeline are combined into a single task and run on _requestExecutor, which computes CPU tasks. You need at least two executors to overlap compute tasks of a CPU and a remote device the plugin works with. Otherwise, CPU and device tasks are executed serially one by one.
waitPipeline is sent to _waitExecutor, which works with the device.
Note
callbackExecutor is also passed to the constructor and is used in the base InferenceEngine::AsyncInferRequestThreadSafeDefault class, which adds a pair of callbackExecutor and a callback function set by the user to the end of the pipeline.
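To show where this callback pair comes from, here is a hedged sketch of the application-side usage; the executableNetwork variable is assumed to be an already loaded InferenceEngine::ExecutableNetwork, and input preparation is omitted.
// Sketch only: application code setting a completion callback on an async request.
InferenceEngine::InferRequest request = executableNetwork.CreateInferRequest();
request.SetCompletionCallback([] {
    // Runs on callbackExecutor after all _pipeline stages have completed
});
request.StartAsync();
request.Wait(InferenceEngine::InferRequest::WaitMode::RESULT_READY);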
Inference request stages are also profiled using OV_ITT_SCOPED_TASK, which shows how pipelines of multiple asynchronous inference requests run in parallel via the Intel® VTune™ Profiler tool.
In the asynchronous request destructor, it is necessary to wait for the pipeline to finish. It can be done using the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method of the base class.
TemplateAsyncInferRequest::~TemplateAsyncInferRequest() {
    InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait();
}