
Paper note - [Week 6]

A customized residual neural network and bi-directional gated recurrent unit-based automatic speech recognition model

Motivation

  • Voice technology is now employed across many industries, helping businesses and consumers drive digitization and automation.

  • Speech recognition is one of the most challenging topics in computer science because of the difficulty of separating phonetically similar sentences and handling the smearing problem.

Contribution

  • The paper proposes a stack of five customized ResNet layers and seven Bi-GRU layers, each followed by layer normalization with learnable element-wise affine parameters, without requiring an external language model (a minimal layer-normalization sketch follows this list).

  • Including a Gaussian error linear unit (GELU) layer together with dense and dropout layers for the classification head proved worthwhile for performance.

  • It demonstrates that the volume of training data significantly affects the model’s recognition performance.
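A minimal sketch of the layer normalization mentioned above, assuming PyTorch: `nn.LayerNorm` with `elementwise_affine=True` provides the learnable per-feature scale and shift; the feature dimension used here is a hypothetical value, not one taken from the paper.

```python
import torch
import torch.nn as nn

# Layer normalization with learnable element-wise affine parameters
# (per-feature gamma and beta), applied after each layer in the model.
feat_dim = 128  # hypothetical feature dimension, not from the paper
layer_norm = nn.LayerNorm(feat_dim, elementwise_affine=True)

x = torch.randn(8, 50, feat_dim)   # (batch, time, features)
y = layer_norm(x)                  # normalized over the last dimension
print(y.shape)                     # torch.Size([8, 50, 128])
```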

Method

[Figure: overview of the proposed method]

Mel spectrogram

[Figure: Mel spectrogram]
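As a rough illustration of the front end, the following sketch converts a raw waveform into a log-Mel spectrogram with `torchaudio`; the sample rate, FFT size, hop length, and number of Mel bins are assumptions, not values reported in the paper.

```python
import torch
import torchaudio

# Hypothetical front-end: raw waveform -> (log-)Mel spectrogram.
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000,
    n_fft=400,
    hop_length=160,
    n_mels=128,
)

waveform = torch.randn(1, 16000)      # 1 s of dummy 16 kHz audio
mel = mel_transform(waveform)         # (channel, n_mels, time)
log_mel = torch.log(mel + 1e-6)       # log compression for numerical stability
print(log_mel.shape)
```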

Residual neural network

[Figure: residual neural network block]
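Below is a hedged sketch of what one customized residual CNN block could look like, following the layer normalization + GELU + convolution + skip-connection pattern described in the contributions; kernel size, channel count, and dropout rate are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualCNNBlock(nn.Module):
    """Sketch of one residual CNN block over the spectrogram: two convolutions,
    each preceded by layer normalization and GELU, plus a skip connection.
    Kernel size, dropout, and channel counts are assumptions, not paper values."""

    def __init__(self, channels, kernel_size=3, dropout=0.1, n_feats=128):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_feats)
        self.ln2 = nn.LayerNorm(n_feats)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size, padding=kernel_size // 2)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size, padding=kernel_size // 2)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):                       # x: (batch, channels, n_feats, time)
        residual = x
        out = x.transpose(2, 3)                 # put the feature axis last for LayerNorm
        out = self.ln1(out).transpose(2, 3)
        out = self.conv1(self.dropout(F.gelu(out)))
        out = out.transpose(2, 3)
        out = self.ln2(out).transpose(2, 3)
        out = self.conv2(self.dropout(F.gelu(out)))
        return out + residual                   # skip connection
```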

Bi-directional gated recurrent units

[Figure: gated recurrent unit]

[Figure: bi-directional GRU]
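A minimal sketch of a single Bi-GRU layer with layer normalization and GELU, matching the recurrent stage described above; the hidden size and dropout rate are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiGRULayer(nn.Module):
    """Sketch of one Bi-GRU layer preceded by layer normalization and GELU.
    Hidden size and dropout are assumptions, not values from the paper."""

    def __init__(self, input_size, hidden_size, dropout=0.1):
        super().__init__()
        self.ln = nn.LayerNorm(input_size)
        self.bigru = nn.GRU(input_size, hidden_size,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):              # x: (batch, time, input_size)
        out = F.gelu(self.ln(x))
        out, _ = self.bigru(out)       # (batch, time, 2 * hidden_size)
        return self.dropout(out)
```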

Speech recognition model

[Figure: speech recognition model architecture]
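Putting the pieces together, the following sketch stacks five residual CNN blocks and seven Bi-GRU layers ahead of a GELU/dense/dropout classification head, reusing the `ResidualCNNBlock` and `BiGRULayer` sketches above. The per-frame character output (suitable for a CTC-style decoder, which is what lets the model work without an external language model) and all layer sizes are assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

class SpeechRecognitionModel(nn.Module):
    """Sketch of the end-to-end model: an initial convolution, five residual
    CNN blocks, seven Bi-GRU layers, and a GELU/dense/dropout classifier that
    emits per-frame character logits. All sizes are assumptions."""

    def __init__(self, n_feats=128, n_classes=29, rnn_dim=512, dropout=0.1):
        super().__init__()
        self.stem = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.res_blocks = nn.Sequential(
            *[ResidualCNNBlock(32, n_feats=n_feats) for _ in range(5)]
        )
        self.to_rnn = nn.Linear(32 * n_feats, rnn_dim)
        self.bigru_layers = nn.ModuleList(
            [BiGRULayer(rnn_dim if i == 0 else 2 * rnn_dim, rnn_dim)
             for i in range(7)]
        )
        self.classifier = nn.Sequential(
            nn.Linear(2 * rnn_dim, rnn_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(rnn_dim, n_classes),
        )

    def forward(self, x):                          # x: (batch, 1, n_feats, time)
        out = self.stem(x)
        out = self.res_blocks(out)                 # (batch, 32, n_feats, time)
        b, c, f, t = out.size()
        out = out.permute(0, 3, 1, 2).reshape(b, t, c * f)
        out = self.to_rnn(out)                     # (batch, time, rnn_dim)
        for layer in self.bigru_layers:
            out = layer(out)
        return self.classifier(out)                # (batch, time, n_classes)
```

In this sketch, a forward pass on a batch of log-Mel spectrograms of shape (batch, 1, n_mels, time) yields per-frame logits of shape (batch, time, n_classes).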

Experiments

[Figure: experimental results]

Conclusion

[Figure: conclusion summary]

This post is licensed under CC BY 4.0 by the author.