This tutorial shows you how to transcribe the audio from a file on your computer using Dialogflow. Dialogflow was made for conversations, but because it returns a transcription of what was said, we can use it to transcribe audio as well.
Follow the Getting started guide first.
At the end, you should have the following set up:
SIC is installed on your laptop
Redis is running on your laptop
To play the audio, PyAudio needs to be installed. Check out https://pypi.org/project/PyAudio/ to install.
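Before continuing, it can help to confirm that Redis is actually reachable. The sketch below only tests whether something accepts TCP connections on a port. It assumes Redis runs on this machine on its default port 6379, which may not match your setup, and port_is_open is a hypothetical helper name, not part of SIC.

```python
import socket


def port_is_open(host="127.0.0.1", port=6379, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    The defaults assume a local Redis on its default port (6379).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False
```

Calling port_is_open() with the defaults checks 127.0.0.1:6379. A False result suggests Redis is not running or listens somewhere else.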
Approach
This tutorial will show you how to convert audio to text. We’ll split this up into a couple of parts:
Converting your file to a .wav file
Starting the dialogflow component
Transcribing the audio file
Converting to wave format
To be able to read the audio in Python, it's easiest to convert it to a .wav file. Depending on the file type you have, this might need to be done differently, but here is an example using ffmpeg. Make sure to convert it to mono 16-bit PCM little-endian audio (this is what pcm_s16le means). In the command, -ac 1 selects a single (mono) channel and -ar 44100 sets the sample rate to 44.1 kHz.
ffmpeg -i my_audio.mp3 -codec:a pcm_s16le -ac 1 -ar 44100 my_audio.wav
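To confirm that the conversion produced mono 16-bit PCM audio, the file can be inspected with Python's built-in wave module. This is a minimal sketch; check_wav_format is a hypothetical helper name and not part of SIC.

```python
import wave


def check_wav_format(path):
    """Return a list of problems found in the .wav file (empty if it looks fine)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected 1 channel (mono), found {wav.getnchannels()}")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 2 bytes per sample (16 bit), found {wav.getsampwidth()}")
        if wav.getcomptype() != "NONE":
            problems.append(f"expected uncompressed PCM, found {wav.getcomptype()}")
    return problems
```

For a file produced by the ffmpeg command above, check_wav_format("my_audio.wav") should return an empty list.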
Installing and starting dialogflow
To start dialogflow, you will likely need to install additional packages. You can do this with
pip install -r sic_framework/services/dialogflow/requirements.txt
Now that we have everything for dialogflow installed, we can start the component.
cd framework/sic_framework/services/dialogflow
python3 dialogflow_service.py
If everything went right, you should see something like
(base) user@laptop:~/framework/sic_framework/services/dialogflow$ python3 dialogflow_service.py
[SICComponentManager 192.168.0.181]: INFO: Manager on device 192.168.0.181 starting
[SICComponentManager 192.168.0.181]: INFO: Starting component manager on ip "192.168.0.181" with components:
[SICComponentManager 192.168.0.181]: INFO:  - DialogflowService
Getting a key
To create the Google Cloud Dialogflow ES platform service account credential, perform the following steps:
In the Google Cloud Platform console, create a new project and then create a service account for the project.
Grant the following roles to the service account:
Dialogflow API Client
Dialogflow API Reader
Create a service account key and download the JSON version of it.
If everything went right, you should have a file named your_dialogflow_key.json with similar content:
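A quick way to sanity-check the downloaded key is to load it and look for the fields a service account key normally carries. The sketch below assumes the usual layout of a Google service account key (a JSON object with fields such as type, project_id, private_key and client_email). check_keyfile is a hypothetical helper, and the field list is an assumption rather than a complete schema.

```python
import json

# Fields assumed to appear in a Google service account key file.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_keyfile(path):
    """Return a list of problems found in the key file (empty if it looks fine)."""
    with open(path) as f:
        key = json.load(f)

    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in key]
    # A missing "type" is already reported above, so only flag a wrong value here.
    if key.get("type", "service_account") != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

An empty list from check_keyfile("your_dialogflow_key.json") only means the file has the expected shape. It does not verify that the key is valid or that the roles were granted.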
Transcribing the audio
Alright! Now that we have everything set up we can start transcribing the audio.
Just to be sure, check that you have:
The dialogflow component running
Your dialogflow key in the folder you are working in
A .wav audio file in the folder you are working in
In a new Python file (or check out TODO), copy the following code:
import json
import threading
import wave

import pyaudio

from sic_framework.core.message_python2 import AudioMessage
from sic_framework.services.dialogflow.dialogflow_service import DialogflowConf, GetIntentRequest, Dialogflow, \
    StopListeningMessage

# Read the wav file (replace the name with your own .wav file)
wavefile = wave.open('office_top_short.wav', 'rb')
samplerate = wavefile.getframerate()

print("Audio file specs:")
print("  sample rate:", wavefile.getframerate())
print("  length:", wavefile.getnframes())
print("  data size in bytes:", wavefile.getsampwidth())
print("  number of channels:", wavefile.getnchannels())
print()

# Set up the callback and variables to contain the transcript results.
# Dialogflow is not made for transcribing, so we'll have to work around this by "faking" a conversation
dialogflow_detected_sentence = threading.Event()
transcripts = []


def on_dialog(message):
    if message.response:
        t = message.response.recognition_result.transcript
        print("\r Transcript:", t, end="")

        # Only keep the transcript once Dialogflow marks it as final
        if message.response.recognition_result.is_final:
            transcripts.append(t)
            dialogflow_detected_sentence.set()


# Read your keyfile and connect to dialogflow
with open("your_keyfile_here.json") as keyfile:
    keyfile_json = json.load(keyfile)

conf = DialogflowConf(keyfile_json=keyfile_json, sample_rate_hertz=samplerate)
dialogflow = Dialogflow(conf=conf)
dialogflow.register_callback(on_dialog)

# OPTIONAL: set up output device to play audio along transcript
p = pyaudio.PyAudio()
output = p.open(format=pyaudio.paInt16, channels=1, rate=samplerate, output=True)

print("Listening for first sentence")
dialogflow.request(GetIntentRequest(), block=False)

for i in range(wavefile.getnframes() // wavefile.getframerate()):

    # After a final transcript, start listening for the next sentence
    if dialogflow_detected_sentence.is_set():
        print()
        dialogflow.request(GetIntentRequest(), block=False)
        dialogflow_detected_sentence.clear()

    # grab one second of audio data
    chunk = wavefile.readframes(samplerate)
    output.write(chunk)  # replace with time.sleep to not send audio too fast if not playing audio

    message = AudioMessage(sample_rate=samplerate, waveform=chunk)
    dialogflow.send_message(message)

dialogflow.send_message(StopListeningMessage())

print("\n\n")
print("Final transcript")
print(transcripts)

with open('transcript.txt', 'w') as f:
    for line in transcripts:
        f.write(f"{line}\n")

output.close()
p.terminate()
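The loop in the script divides the number of frames by the frame rate using integer division, so any audio after the last whole second is never sent. One possible alternative, sketched here under the assumption that a shorter final chunk is acceptable to the component, is a small generator that keeps reading until the file is exhausted. iter_chunks is a hypothetical helper and not part of SIC.

```python
import wave


def iter_chunks(wavefile, seconds=1.0):
    """Yield consecutive chunks of raw audio bytes, each at most `seconds` long.

    `wavefile` is an open wave.Wave_read object. The final chunk may be
    shorter than the others when the audio is not a whole number of seconds.
    """
    frames_per_chunk = int(wavefile.getframerate() * seconds)
    while True:
        chunk = wavefile.readframes(frames_per_chunk)
        if not chunk:
            # readframes returns empty bytes once the file is exhausted
            break
        yield chunk
```

In the script, the loop header could then become for chunk in iter_chunks(wavefile): with the readframes call removed and the rest of the body unchanged. If you are not playing the audio, a time.sleep(1) per chunk keeps the sending rate close to real time.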
And run your file like so
cd sic_framework/tests
python3 demo_transcribe_with_dialogflow.py
The output should look something like this:
Audio file specs:
  sample rate: 44100
  length: 4505992
  data size in bytes: 2
  number of channels: 1

Component not already alive, requesting DialogflowService from manager 192.168.0.181
[DialogflowService 192.168.0.181]: INFO: Started component DialogflowService
Listening for first sentence
 Transcript: I can't believe I started the fire
 Transcript: a brown
 Transcript: I'm taking two so I can parcel them up and eat them at my leisure later on much healthier


Final transcript
["I can't believe I started the fire", ' a brown']
The transcript should also be stored in transcript.txt.