Transcribing Audio with Dialogflow
This tutorial shows you how to transcribe the audio from a file on your computer using Dialogflow. Dialogflow is built for conversations, but since it returns a transcription of what was said, we can use it to transcribe audio as well.
NOTE: This is not necessarily the best way to transcribe audio.
This demo is here for two reasons:
To provide a platform-independent way to test your Dialogflow setup
To demonstrate how to work with audio and more complex Dialogflow setups in the framework
In addition to the given preliminaries, you will also need to have PyAudio installed in your virtual environment.
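PyAudio can typically be installed into the active virtual environment with pip (on Debian/Ubuntu you may need the portaudio19-dev system package first):

```shell
pip install pyaudio
```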
Approach
This tutorial will show you how to convert audio to text. We’ll split this up into three parts:
Converting an audio file to a .wav file
Starting the Dialogflow component
Transcribing the audio file
Converting to .wav format
To be able to read the audio in Python, it’s easiest to convert it to a .wav file. Depending on which file type you have, this might need to be done differently, but here is an example using ffmpeg. Make sure to convert it to mono 16-bit PCM little-endian audio (this is what pcm_s16le means).
ffmpeg -i my_audio.mp3 -codec:a pcm_s16le -ac 1 -ar 44100 my_audio.wav
Installing and starting Dialogflow
First, start the SIC Dialogflow service. You should see something like this:
[SICComponentManager 192.168.0.181]: INFO: Manager on device 192.168.0.181 starting
[SICComponentManager 192.168.0.181]: INFO: Starting component manager on ip "192.168.0.181" with components:
[SICComponentManager 192.168.0.181]: INFO: - DialogflowService
Getting a key
If you don’t already have a key, check out Getting a google dialogflow key
If everything went right, you should have a your_dialogflow_key.json file.
Transcribing the audio
Alright! Now that we have everything set up we can start transcribing the audio.
Just to be sure, check that:
The Dialogflow component is running
You have a Dialogflow key
There is a .wav audio file in the folder you are working in
In a new Python file, copy the following code:
import threading
import pyaudio
import wave
import json
from sic_framework.core.message_python2 import AudioMessage
from sic_framework.services.dialogflow.dialogflow_service import DialogflowConf, GetIntentRequest, Dialogflow, \
StopListeningMessage, QueryResult, RecognitionResult
To read the .wav file we can use the Python wave library. This reads the file as raw bytes, which is what Dialogflow expects from us.
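A minimal sketch using Python’s built-in wave module; the helper name read_wav is ours:

```python
import wave

def read_wav(path):
    """Read a .wav file and return (sample_rate, raw PCM bytes)."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        # readframes returns the raw little-endian 16-bit PCM bytes,
        # which is exactly the format Dialogflow expects.
        audio_bytes = wav.readframes(wav.getnframes())
    return sample_rate, audio_bytes
```

For the file converted with the ffmpeg command above, sample_rate should come back as 44100.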
Now we get to the more interesting part. The Dialogflow component will send back a lot of information, so we will have to handle that and extract the transcription.
First, we’ll create an event, which we set whenever Dialogflow has detected the end of a sentence. That way we can immediately ask Dialogflow to listen for the next one. It’s easiest to use a threading.Event, because Dialogflow will signal the end of a sentence at an arbitrary point.
The on_dialog function handles setting this event. It also prints the partial transcript intermittently, and once Dialogflow has chosen a final transcript, we add it to the list.
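A sketch of what on_dialog might look like. The transcripts list and sentence_done event names are ours, and the message attribute layout (message.response.recognition_result) follows the SIC Dialogflow examples — check it against your framework version:

```python
import threading

transcripts = []                    # final transcripts, one per sentence
sentence_done = threading.Event()   # set when Dialogflow finishes a sentence

def on_dialog(message):
    if message.response:
        recognition_result = message.response.recognition_result
        # Print the (partial) transcript as it evolves.
        print("Transcript:", recognition_result.transcript)
        if recognition_result.is_final:
            # Dialogflow has settled on a final transcript for this sentence.
            transcripts.append(recognition_result.transcript)
            sentence_done.set()
```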
Now we can set up Dialogflow. We do this by first reading in our JSON key, and then creating a configuration for the Dialogflow component. Make sure to set the proper sample rate!
We’ll direct the output messages produced by Dialogflow to the on_dialog function by registering it as a callback.
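Putting that together might look as follows. The key filename and the parameter names (keyfile_json, sample_rate_hertz, register_callback) follow the SIC examples; check them against your framework version:

```python
import json

# Read the service-account key downloaded earlier.
keyfile_json = json.load(open("your_dialogflow_key.json"))

# The sample rate must match the .wav file (44100 Hz in the ffmpeg example).
conf = DialogflowConf(keyfile_json=keyfile_json, sample_rate_hertz=44100)
dialogflow = Dialogflow(ip="localhost", conf=conf)

# Route every message Dialogflow produces to our on_dialog handler.
dialogflow.register_callback(on_dialog)
```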
To get a sense of what Dialogflow is hearing, we’ll also play the sound on our own speakers.
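For playback we can open a PyAudio output stream matching the audio format (assuming the mono 16-bit 44100 Hz file from before):

```python
import pyaudio

p = pyaudio.PyAudio()
# Output stream matching the .wav: mono, 16-bit samples, 44100 Hz.
speaker = p.open(format=pyaudio.paInt16, channels=1, rate=44100, output=True)
# speaker.write(chunk) will then play a chunk of raw PCM bytes.
```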
With everything set up, we can ask Dialogflow to detect a sentence! We do this using dialogflow.request(GetIntentRequest(), block=False). Non-blocking is important here, because we need to keep sending audio (and not wait for a result, which will never arrive, because no audio would be sent). Every time Dialogflow detects a sentence, we ask it to listen for the next one!
When we’re done, we’ll write the output to a file and clean up Dialogflow.
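The output step could look like this; write_transcript is our name, and stopping the component is left as a comment since the exact StopListeningMessage arguments depend on your framework version:

```python
def write_transcript(transcripts, path="transcript.txt"):
    """Write one transcribed sentence per line."""
    with open(path, "w") as f:
        for sentence in transcripts:
            f.write(sentence + "\n")

# When done, send the StopListeningMessage imported above (check its
# constructor arguments in your framework version) so Dialogflow stops
# waiting for more audio.
```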
That’s the code! Run your file like so:
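Assuming you saved the script as transcribe_audio.py (the filename is just an example):

```shell
python transcribe_audio.py
```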
The output should show the partial and final transcripts as Dialogflow processes the audio, and the finished transcript should be stored in transcript.txt!
When the transcript is done, you might get some errors about Google not receiving new requests, like:
google.api_core.exceptions.InvalidArgument: 400 Did not receive any new request for 1m.
We are still working out how to properly end a conversation once it is done, but Google’s documentation has little mention of this. If you find it, let us know!