quartznet-15x5-en
Use Case and High-Level Description
The QuartzNet model performs automatic speech recognition. QuartzNet's design is based on the Jasper architecture, which is a convolutional model trained with Connectionist Temporal Classification (CTC) loss. This particular model has 15 Jasper blocks, each repeated 5 times. The model was trained in NeMo on multiple datasets: LibriSpeech, Mozilla Common Voice, WSJ, Fisher, Switchboard, and NSC Singapore English. For details, see the repository and the paper.
Specification
Metric | Value |
---|---|
Type | Speech recognition |
GFLOPs | 2.4195 |
MParams | 18.8857 |
Source framework | PyTorch* |
Accuracy
Metric | Value |
---|---|
WER @ Librispeech test-clean | 3.86% |
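For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between the reference transcript and the recognized text, divided by the number of reference words. The sketch below shows that standard definition only. The function name `word_error_rate` is hypothetical, and the text normalization used to obtain the figure in the table is not described here, so this is not necessarily the exact evaluation procedure.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.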
Input
Original model
Normalized Mel-Spectrogram of 16 kHz audio signal, name - audio_signal, shape - 1, 64, 128, format is B, N, C, where:
B - batch size
N - number of mel-spectrogram frequency bins
C - duration
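The layout above can be illustrated with a small shape-handling sketch. It assumes the mel-spectrogram has already been computed elsewhere as 64 rows of per-frame values. The per-bin normalization scheme (zero mean, unit variance over time), the right-padding with zeros, and the helper names `normalize_per_bin` and `to_model_input` are all assumptions made for illustration, not the preprocessing used by the original model or the demo.

```python
from statistics import fmean, pstdev

N_MELS = 64     # N in the B, N, C layout
N_FRAMES = 128  # C in the B, N, C layout

def normalize_per_bin(features, eps=1e-5):
    """Scale each mel bin to zero mean and unit variance over time (assumed scheme)."""
    normalized = []
    for row in features:
        mean = fmean(row)
        std = pstdev(row) + eps  # eps avoids division by zero for constant rows
        normalized.append([(value - mean) / std for value in row])
    return normalized

def to_model_input(features, n_frames=N_FRAMES, pad_value=0.0):
    """Crop or right-pad a [64][T] matrix and add a batch axis: [1][64][n_frames]."""
    if len(features) != N_MELS:
        raise ValueError(f"expected {N_MELS} mel bins, got {len(features)}")
    fitted = []
    for row in features:
        row = list(row[:n_frames])                   # crop long inputs
        row += [pad_value] * (n_frames - len(row))   # pad short inputs
        fitted.append(row)
    return [fitted]  # batch size B = 1

# Usage: a synthetic 64 x 100 feature matrix becomes a 1 x 64 x 128 input.
dummy = [[float(t + b) for t in range(100)] for b in range(N_MELS)]
audio_signal = to_model_input(normalize_per_bin(dummy))
```

Normalizing before padding keeps the padded frames from skewing the per-bin statistics.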
Converted model
The converted model has the same parameters as the original model.
Output
Original model
Per-frame probabilities (after LogSoftmax) for every symbol in the alphabet, name - output, shape - 1, 64, 29, output data format is B, N, C, where:
B - batch size
N - number of audio frames
C - alphabet size, including the CTC blank symbol
The per-frame probabilities are to be decoded with a CTC decoder.
The alphabet is: 0 = space, 1…26 = “a” to “z”, 27 = apostrophe, 28 = CTC blank symbol. An example is provided at <omz_dir>/demos/speech_recognition_deepspeech_demo/python/default_alphabet_example.conf.
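The decoding step can be sketched with a greedy (best-path) CTC decoder: take the highest-scoring symbol in each frame, collapse consecutive repeats, then drop blanks. This is a minimal illustration using the alphabet listed above, not the decoder shipped with the Open Model Zoo demos. The function name `ctc_greedy_decode` is hypothetical, and it expects the frames of a single batch item, for example `output[0]`.

```python
# Alphabet as listed above: index 0 is space, 1..26 are "a".."z",
# 27 is the apostrophe. Index 28 is the CTC blank symbol.
ALPHABET = " abcdefghijklmnopqrstuvwxyz'"
BLANK_ID = len(ALPHABET)  # 28

def ctc_greedy_decode(frames):
    """Decode a sequence of per-frame score vectors (length 29 each) into text."""
    chars = []
    prev = BLANK_ID
    for scores in frames:
        # Index of the best-scoring symbol in this frame
        best = max(range(len(scores)), key=lambda i: scores[i])
        # Emit a character only when the symbol changes and is not blank,
        # so repeated frames collapse while a blank separates true doubles.
        if best != prev and best != BLANK_ID:
            chars.append(ALPHABET[best])
        prev = best
    return "".join(chars)
```

Greedy decoding keeps only the single best symbol per frame. A beam search decoder, optionally combined with a language model, typically gives lower error rates at higher cost.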
Converted model
The converted model has the same parameters as the original model.
Download a Model and Convert it into OpenVINO™ IR Format
You can download models and, if necessary, convert them into OpenVINO™ IR format using the Model Downloader and other automation tools, as shown in the examples below.
An example of using the Model Downloader:
omz_downloader --name <model_name>
An example of using the Model Converter:
omz_converter --name <model_name>
Demo usage
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:
Legal Information
The original model is distributed under the Apache License, Version 2.0. A copy of the license is provided in <omz_dir>/models/public/licenses/APACHE-2.0.txt.