Paper note - [Week 6]
A customized residual neural network and bi-directional gated recurrent unit-based automatic speech recognition model
Motivation
Voice technology is now deployed across many industries, helping businesses and consumers drive digitization and automation.
Speech recognition remains one of the most challenging topics in computer science because of the difficulty of distinguishing phonetically similar sentences and handling smearing problems.
Contribution
It proposes a model that stacks five customized ResNet layers and seven Bi-GRU layers, each incorporating layer normalization with learnable element-wise affine parameters, without requiring an external language model (see the sketch after this list).
The inclusion of a Gaussian error linear unit (GELU) layer, together with dense and dropout layers for the classification task, further improves performance.
It demonstrates that the volume of training data significantly affects the model's performance.
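
To make the architecture concrete, below is a minimal PyTorch sketch of the described pipeline: five residual convolutional blocks, seven Bi-GRU layers each preceded by LayerNorm with learnable element-wise affine parameters, and a GELU/dense/dropout classification head. The channel counts, hidden sizes, kernel sizes, dropout rates, and the per-frame character output (as for CTC-style training) are assumptions for illustration, not values taken from the paper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual conv block; kernel size and channel count are assumptions."""
    def __init__(self, channels=32, kernel=3, dropout=0.1):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel, padding=kernel // 2)
        self.conv2 = nn.Conv2d(channels, channels, kernel, padding=kernel // 2)
        self.norm1 = nn.LayerNorm(channels)  # elementwise affine params are learnable by default
        self.norm2 = nn.LayerNorm(channels)
        self.gelu = nn.GELU()
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):                    # x: (batch, channels, freq, time)
        residual = x
        # LayerNorm over the channel dim: move channels last, normalize, move back
        out = self.norm1(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        out = self.dropout(self.gelu(self.conv1(out)))
        out = self.norm2(out.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        out = self.dropout(self.gelu(self.conv2(out)))
        return out + residual                # skip connection

class BiGRUBlock(nn.Module):
    """Bi-GRU layer preceded by LayerNorm with learnable element-wise affine parameters."""
    def __init__(self, in_dim, hidden=512, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(in_dim, elementwise_affine=True)
        self.gru = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):                    # x: (batch, time, features)
        out, _ = self.gru(self.norm(x))
        return self.dropout(out)

class ResNetBiGRUASR(nn.Module):
    """Five residual blocks + seven Bi-GRU blocks + GELU/dense/dropout classifier."""
    def __init__(self, n_feats=128, n_classes=29, channels=32, hidden=512):
        super().__init__()
        self.stem = nn.Conv2d(1, channels, 3, stride=2, padding=1)
        self.resnets = nn.Sequential(*[ResidualBlock(channels) for _ in range(5)])
        self.fc_in = nn.Linear(channels * (n_feats // 2), hidden)
        self.birnns = nn.Sequential(
            BiGRUBlock(hidden, hidden),
            *[BiGRUBlock(2 * hidden, hidden) for _ in range(6)],
        )
        self.classifier = nn.Sequential(     # GELU + dense + dropout head
            nn.Linear(2 * hidden, hidden),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, n_classes),    # per-frame character logits
        )

    def forward(self, spec):                 # spec: (batch, 1, n_feats, time)
        x = self.resnets(self.stem(spec))    # (batch, channels, n_feats//2, time//2)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # (batch, time, features)
        x = self.birnns(self.fc_in(x))
        return self.classifier(x)            # (batch, time, n_classes)

model = ResNetBiGRUASR()
logits = model(torch.randn(2, 1, 128, 200))  # dummy mel-spectrogram batch
print(logits.shape)                          # torch.Size([2, 100, 29])
```

The dummy input at the end only confirms that tensor shapes flow through the stack; training details (loss, optimizer, data pipeline) are omitted.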