Creating a Speech Recognition App in Angular
Speech recognition lets us extract text from spoken audio. There are many ways to carry out speech recognition in Angular; here I’d like to focus on a simple one.
Here we use the Web Speech API to recognize speech. Unfortunately, this API is supported by only a few browsers, which I list below:
- Google Chrome
- Chrome for Android
- Samsung Internet
- QQ Browser
- Baidu Browser
You can test this app in a browser from this list.
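Because support is limited, it can help to check for the API before using it. The following is a minimal sketch and not part of the original app: `findSpeechRecognition` and `RecognitionConstructor` are hypothetical names, and the sketch assumes the browser exposes the constructor either as `SpeechRecognition` or as the prefixed `webkitSpeechRecognition`.

```typescript
// Hypothetical helper: returns the speech recognition constructor found on
// the given global object, or null when the browser does not provide one.
type RecognitionConstructor = new () => unknown;

function findSpeechRecognition(scope: Record<string, unknown>): RecognitionConstructor | null {
  // Prefer the unprefixed name, then fall back to the webkit-prefixed one.
  const candidate = scope['SpeechRecognition'] || scope['webkitSpeechRecognition'];
  return typeof candidate === 'function' ? (candidate as RecognitionConstructor) : null;
}
```

In a browser you could call it as `findSpeechRecognition(window as any)` and show a fallback message when it returns null.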
Okay, let’s begin. First of all, we create a new Angular project by running the command below in the terminal. This assumes that you have installed the Angular CLI; if you haven't, the command won’t work.
ng new voice-recognition
cd ./voice-recognition
Next, we create a new service for speech recognition. This service can be reused in multiple components.
ng g s service/voice-recognition
Create Service
Here we create the service files in a separate directory called “service” as a best practice. Now we call the webkitSpeechRecognition API in this service.
Code review
Let’s review this code. First, we declare “webkitSpeechRecognition” as shown in the line below. Otherwise the TypeScript compiler will not recognize it, because “webkitSpeechRecognition” is a browser global rather than an imported library.
declare var webkitSpeechRecognition: any;
Next, we create a new instance of webkitSpeechRecognition and initialize it by setting its properties.
recognition = new webkitSpeechRecognition();
...
this.recognition.interimResults = true;
this.recognition.lang = 'en-US';
We enable interim results and set the language we need. The supported languages that you can set are listed below:
- Afrikaans af
- Basque eu
- Bulgarian bg
- Catalan ca
- Arabic (Egypt) ar-EG
- Arabic (Jordan) ar-JO
- Arabic (Kuwait) ar-KW
- Arabic (Lebanon) ar-LB
- Arabic (Qatar) ar-QA
- Arabic (UAE) ar-AE
- Arabic (Morocco) ar-MA
- Arabic (Iraq) ar-IQ
- Arabic (Algeria) ar-DZ
- Arabic (Bahrain) ar-BH
- Arabic (Libya) ar-LY
- Arabic (Oman) ar-OM
- Arabic (Saudi Arabia) ar-SA
- Arabic (Tunisia) ar-TN
- Arabic (Yemen) ar-YE
- Czech cs
- Dutch nl-NL
- English (Australia) en-AU
- English (Canada) en-CA
- English (India) en-IN
- English (New Zealand) en-NZ
- English (South Africa) en-ZA
- English (UK) en-GB
- English (US) en-US
- Finnish fi
- French fr-FR
- Galician gl
- German de-DE
- Hebrew he
- Hungarian hu
- Icelandic is
- Italian it-IT
- Indonesian id
- Japanese ja
- Korean ko
- Latin la
- Mandarin Chinese (Simplified, China) zh-CN
- Mandarin Chinese (Traditional, Taiwan) zh-TW
- Mandarin Chinese (Simplified, Hong Kong) zh-HK
- Yue Chinese (Traditional Hong Kong) zh-yue
- Malaysian ms-MY
- Norwegian no-NO
- Polish pl
- Pig Latin xx-piglatin
- Portuguese pt-PT
- Portuguese (Brazil) pt-BR
- Romanian ro-RO
- Russian ru
- Serbian sr-SP
- Slovak sk
- Spanish (Argentina) es-AR
- Spanish (Bolivia) es-BO
- Spanish (Chile) es-CL
- Spanish (Colombia) es-CO
- Spanish (Costa Rica) es-CR
- Spanish (Dominican Republic) es-DO
- Spanish (Ecuador) es-EC
- Spanish (El Salvador) es-SV
- Spanish (Guatemala) es-GT
- Spanish (Honduras) es-HN
- Spanish (Mexico) es-MX
- Spanish (Nicaragua) es-NI
- Spanish (Panama) es-PA
- Spanish (Paraguay) es-PY
- Spanish (Peru) es-PE
- Spanish (Puerto Rico) es-PR
- Spanish (Spain) es-ES
- Spanish (US) es-US
- Spanish (Uruguay) es-UY
- Spanish (Venezuela) es-VE
- Swedish sv-SE
- Turkish tr
- Zulu zu
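Putting the two settings together, a minimal sketch of the initialization step might look like this. `RecognitionSettings` and `configureRecognition` are hypothetical names introduced for illustration; only the `interimResults` and `lang` properties come from the code reviewed above.

```typescript
// Only the two properties used in this article are modelled here.
interface RecognitionSettings {
  interimResults: boolean;
  lang: string;
}

// Hypothetical helper: applies the article's settings to a recognition object.
// `lang` takes one of the language tags listed above, e.g. 'en-US' or 'fr-FR'.
function configureRecognition<T extends RecognitionSettings>(recognition: T, lang: string = 'en-US'): T {
  recognition.interimResults = true; // report partial results while the user is still speaking
  recognition.lang = lang;
  return recognition;
}
```

Inside the service this could be called as `configureRecognition(this.recognition, 'en-GB')` right after the instance is created.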
Then we register an event listener for the “result” event to get the recognized words, and assign those words to the “tempWords” variable so that we can access them later.
this.recognition.addEventListener('result', (e) => {
  const transcript = Array.from(e.results)
    .map((result) => result[0])
    .map((result) => result.transcript)
    .join('');
  this.tempWords = transcript;
});
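To try this logic without a microphone, the body of the listener can be isolated in a pure function. This is only a sketch: `transcriptFromResults` and the `...Like` interfaces are hypothetical names, modelled on the shape the listener above relies on (an array-like list of results, where the first alternative of each result carries a `transcript` string).

```typescript
// Minimal shapes modelled on what the 'result' listener reads from e.results.
interface AlternativeLike { transcript: string; }
interface ResultLike { readonly [index: number]: AlternativeLike; }
interface ResultListLike { readonly length: number; readonly [index: number]: ResultLike; }

// Hypothetical helper: joins the first alternative of every result into one string.
function transcriptFromResults(results: ResultListLike): string {
  return Array.from(results)
    .map((result) => result[0].transcript)
    .join('');
}

// Plain nested arrays are enough to stand in for the browser's result list.
const sampleResults = [[{ transcript: 'hello ' }], [{ transcript: 'world' }]];
const sampleTranscript = transcriptFromResults(sampleResults); // 'hello world'
```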
Next, let’s review the part that keeps voice recognition running continuously.
start() {
  this.isStoppedSpeechRecog = false;
  this.recognition.start();
  console.log("Speech recognition started");
  // Restart recognition whenever it ends, unless stop() was requested.
  this.recognition.addEventListener('end', (condition) => {
    if (this.isStoppedSpeechRecog) {
      this.recognition.stop();
      console.log("End speech recognition");
    } else {
      this.wordConcat();
      this.recognition.start();
    }
  });
}
Here we register an event listener for the “end” event. Inside it, we start voice recognition again so that the service keeps running until it is asked to stop. The “isStoppedSpeechRecog” variable is the flag that decides whether the service should stop or continue. Without this restart, recognition would stop automatically whenever the user is silent.
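One thing to watch for in this pattern: because addEventListener is called inside start(), every call to start() attaches another “end” handler. As a possible variation (a sketch, not the code from this app), the handler could be attached once in the constructor. `ContinuousListener` and `RecognitionLike` are hypothetical names, and the interface models only the calls used in this article.

```typescript
// Minimal shape of the recognition object that this sketch relies on.
interface RecognitionLike {
  start(): void;
  stop(): void;
  addEventListener(type: string, listener: () => void): void;
}

// Hypothetical wrapper: keeps recognition running until stop() is called.
class ContinuousListener {
  private isStoppedSpeechRecog = true;

  constructor(private recognition: RecognitionLike) {
    // Attach the 'end' handler once, so repeated start() calls do not stack handlers.
    this.recognition.addEventListener('end', () => {
      if (!this.isStoppedSpeechRecog) {
        // Recognition ended on its own (for example after silence): restart it.
        this.recognition.start();
      }
    });
  }

  start(): void {
    this.isStoppedSpeechRecog = false;
    this.recognition.start();
  }

  stop(): void {
    this.isStoppedSpeechRecog = true;
    this.recognition.stop();
  }
}
```

In the browser, the object passed to the constructor would be the webkitSpeechRecognition instance created earlier.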
To stop voice recognition, use the code snippet below:
stop() {
  this.isStoppedSpeechRecog = true;
  this.wordConcat();
  this.recognition.stop();
  console.log("End speech recognition");
}
I wrote a method called “wordConcat()” which concatenates all the recognized phrases into one paragraph:
wordConcat() {
  this.text = this.text + ' ' + this.tempWords + '.';
  this.tempWords = '';
}
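Note that, as written, wordConcat() appends a space and a period even when “tempWords” is empty, which can happen when recognition ends after silence. A guarded variant could look like the sketch below; `appendSentence` is a hypothetical name, and it is written as a pure function so it can be tried on its own.

```typescript
// Hypothetical helper: returns the paragraph with the newly recognized words appended.
function appendSentence(text: string, tempWords: string): string {
  const words = tempWords.trim();
  if (words === '') {
    return text; // nothing was recognized, so leave the paragraph unchanged
  }
  // Separate sentences with a single space, but do not start the paragraph with one.
  return text === '' ? words + '.' : text + ' ' + words + '.';
}
```

Inside the service, the method body would then become `this.text = appendSentence(this.text, this.tempWords);` followed by resetting “tempWords”.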
Create the component
Now that we have reviewed the service file, we have to call this service in a component. You can create a new component by using the command below:
ng g c speech-to-text
This creates a new TypeScript file and an HTML file. You can enter the below code to call the service:
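As a rough sketch of what the component class could contain, assuming the service exposes the start(), stop() and text members reviewed above: the `VoiceRecognitionLike` interface and the `startService` and `stopService` method names are hypothetical, and the @Component decorator and Angular imports are left out so that the snippet runs on its own.

```typescript
// The parts of the service that the component uses (start, stop and text were reviewed above).
interface VoiceRecognitionLike {
  text: string;
  start(): void;
  stop(): void;
}

// Hypothetical component class. In the Angular project this class would carry the
// @Component decorator generated by `ng g c speech-to-text` and receive the
// service through constructor injection.
class SpeechToTextComponent {
  constructor(public service: VoiceRecognitionLike) {}

  // Intended to be bound to a "start" button in the template.
  startService(): void {
    this.service.start();
  }

  // Intended to be bound to a "stop" button in the template.
  stopService(): void {
    this.service.stop();
  }
}
```

Because the service is a public property, the template can display `service.text` through interpolation and bind the two methods to click events on buttons.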
This is the html file.
Remember to include the tag below in app.component.html:
<app-speech-to-text></app-speech-to-text>
You can use the below command to run the app:
ng serve -o
Conclusion
Finally, you have a voice recognition Angular app! Congratulations. This link redirects to the tested app I have coded. Here are a few applications of this method:
- Analyzing user readability.
- Real-time captioning/subtitling in voice conferences.
- Voice typing.
Try applying this to build something valuable. I look forward to seeing you in another tutorial. Thank you for reading. Happy Coding!