Using Azure Speech-to-Text and Windows Command Line to convert voice to text
I recently needed to transcribe some recorded interviews to text. I tried doing this by listening to the interview and typing along with it. This was far more of a challenge than I expected. I had to pause and rewind the interview many times, and I struggled to turn spoken language into proper sentences.
Online services that transcribe speech to text are often expensive or only accept English-language recordings. This was an obstacle, as the recorded interviews were in Dutch.
When looking for a solution I stumbled upon Microsoft Azure’s Speech-to-Text service. This service supports many languages, is fairly accurate, and is quite cheap, or even free if you are new to Azure.
To use the service you need to sign up for or log in to your Azure account and create a Speech service as part of the Cognitive Services. You can do this by searching for ‘cognitive services’ in the search bar and adding the ‘Speech’ service to it.
Once the service has been added to your Azure account, take note of its region and its subscription key. This information is needed to authenticate to the API.
To enable the commands that are needed to communicate with the Azure service you need to install some dependencies. The first is the Microsoft Visual C++ Redistributable for Visual Studio 2019. The second is the .NET Core 3.1 SDK.
Once both are installed, you can use the following PowerShell command to install the speech CLI.
dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI
To check whether the installation of the speech CLI was successful, run the following command on the Windows Command Line. The output should be information on how to use the commands that are part of the speech CLI.
spx
To prepare the speech CLI to talk to the Azure service you need to give it the Azure region and the subscription key. You can do this by setting the values directly in the speech CLI.
spx config recognize @region --set eastus
spx config recognize @key --set xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
In my case I had a .WAV file containing the interview in Dutch, which I wanted to transcribe in Dutch and write to a file. To do this I used the following command.
spx recognize --file C:\Path\To\Your\File.wav --output file C:\Path\To\Your\Output\File.tsv --source nl-NL
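If you have several recordings, the same command can be assembled from a small script instead of being typed out each time. The following is a minimal Python sketch and not part of the speech CLI itself: the function names `build_recognize_command` and `transcribe` are made up for illustration, and the sketch only reuses the flags from the command above. It assumes `spx` is installed and can be found on your PATH when you actually run it.

```python
import shutil
import subprocess


def build_recognize_command(wav_path, output_path, language="nl-NL"):
    """Return the spx argument list for transcribing one WAV file.

    Hypothetical helper: it only reuses the flags from the command above.
    """
    return [
        "spx", "recognize",
        "--file", str(wav_path),
        "--output", "file", str(output_path),
        "--source", language,
    ]


def transcribe(wav_path, output_path, language="nl-NL"):
    """Run spx for one file; raises if spx is missing or exits with an error."""
    if shutil.which("spx") is None:
        raise FileNotFoundError("spx was not found on PATH")
    command = build_recognize_command(wav_path, output_path, language)
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids quoting problems when a path contains spaces.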
You can customize the input, output and source language to your needs. The output file is a simple, tab-separated text file that can be edited with any text editor.
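Because the output is tab-separated, it is also easy to load into a script for further clean-up. Here is a minimal Python sketch that reads such a file with the standard `csv` module. The function name `read_transcript_rows` is made up for illustration, and the sketch assumes that the first line of the file is a header row and that the file is UTF-8 encoded. It makes no assumption about which columns the speech CLI writes; the keys are simply whatever the header line contains.

```python
import csv


def read_transcript_rows(tsv_path):
    """Read a tab-separated file and return its rows as dictionaries.

    Assumes the first line is a header; its fields become the dictionary keys.
    "utf-8-sig" reads plain UTF-8 and also strips a byte order mark if present.
    """
    with open(tsv_path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return list(reader)
```

Printing the keys of the first row is a quick way to see which columns your own output file actually contains.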