Speech Recognition

Introduction

  • Speech Recognition, often called Automatic Speech Recognition (ASR) or Speech-to-Text (STT), is the capability of a machine or program to identify words spoken aloud and convert them into readable text.

  • Sound is a pressure wave travelling through the air; a microphone captures it as a continuous (analog) signal

Feature Extraction

  • First, convert the original analog signal into digital format by sampling it at a fixed rate

  • Then, divide the digital audio into short frames and extract features from each frame

  • Finally, use the patterns and features of each frame to identify the correct phonemes (the basic units of sound)
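
The framing and per-frame extraction steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 25 ms frame length, 10 ms hop, Hamming window, and log power spectrum are common choices rather than anything these notes prescribe, and the function names `frame_signal` and `frame_features` are made up for this example. Real systems usually go further (for example to log-mel or MFCC features).

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D digital signal into overlapping frames (one frame per row)."""
    frame_len = int(round(sample_rate * frame_ms / 1000))  # samples per frame
    hop_len = int(round(sample_rate * hop_ms / 1000))      # samples between frame starts
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Index matrix: row i holds the sample positions of frame i.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

def frame_features(frames):
    """Log power spectrum of each frame (assumed feature; MFCCs are a common alternative)."""
    windowed = frames * np.hamming(frames.shape[1])        # taper the frame edges
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2     # energy per frequency bin
    return np.log(power + 1e-10)                           # small offset avoids log(0)

# Synthetic stand-in for digitized speech: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

frames = frame_signal(signal, sample_rate)
features = frame_features(frames)
print(frames.shape, features.shape)
```

Each row of `features` describes one frame; a recognizer would then map these per-frame vectors to phonemes.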

WER & CER

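  • WER (Word Error Rate) and CER (Character Error Rate) are the standard metrics for measuring the accuracy of speech recognition. Both count the substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the length of the reference in words (WER) or characters (CER); lower is better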
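
As a minimal sketch of how these two metrics can be computed, the example below uses the Levenshtein edit distance over words (WER) and over characters (CER). The helper names `edit_distance`, `wer`, and `cer` are chosen for this illustration, and the example assumes no text normalization (case, punctuation), which real evaluations usually apply first.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))           # distances from an empty reference
    for i, r in enumerate(ref, 1):
        curr = [i]                             # deleting the first i reference items
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,       # deletion
                            curr[j - 1] + 1,   # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER = {wer(reference, hypothesis):.3f}")  # 2 word errors / 6 words
print(f"CER = {cer(reference, hypothesis):.3f}")  # 5 character errors / 22 characters
```

Because insertions count as errors, both rates can exceed 1.0 when the recognized text is much longer than the reference.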
