Content Comparison

This tutorial shows you how to transcribe the audio from a file on your computer using Dialogflow. Dialogflow was made for conversations, but because it sends back a transcription of what was said, we can use it to transcribe audio as well.

Follow the Getting started guide first.

At the end, you should have the following set up:

SIC is installed on your laptop
Redis is running on your laptop

To play the audio, PyAudio needs to be installed. Check out https://pypi.org/project/PyAudio/ to install.

Info

NOTE: This is not necessarily the best way to transcribe audio.
This demo is here for two reasons:

To provide a platform independent way to test your Dialogflow setup
To demonstrate how to work with audio and more complex Dialogflow setups in the framework

In addition to the given preliminaries, you will also need to have PyAudio installed in your virtual environment.

Approach

This tutorial will show you how to convert audio to text. We’ll split this up into a few parts:

Converting your audio file to a .wav file
Starting the Dialogflow component
Transcribing the audio file

Converting to .wav format

To be able to read the audio in Python, it's easiest to convert it to a .wav file. Depending on which file type you have, this might need to be done differently, but here is an example using ffmpeg. Make sure to convert it to mono 16-bit PCM little-endian audio (pcm_s16le selects signed 16-bit little-endian PCM, and -ac 1 selects mono).

Code Block
ffmpeg -i my_audio.mp3 -codec:a pcm_s16le -ac 1 -ar 44100 my_audio.wav
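
To confirm that the conversion produced the format described above, a quick check with Python's standard wave module can help. This is only a minimal sketch: the helper name check_wav_format and the generated demo file are made up for illustration and are not part of SIC.

```python
import os
import tempfile
import wave


def check_wav_format(path):
    """Return (is_ok, details) where is_ok means mono, 16-bit PCM audio."""
    with wave.open(path, "rb") as wav:
        details = {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "duration_seconds": wav.getnframes() / wav.getframerate(),
        }
    is_ok = details["channels"] == 1 and details["sample_width_bytes"] == 2
    return is_ok, details


# Demo: write half a second of silence in the target format and check it.
# Replace demo_path with the path to your own converted file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)    # mono
    wav.setsampwidth(2)    # 16 bit = 2 bytes per sample
    wav.setframerate(44100)
    wav.writeframes(b"\x00\x00" * 22050)

ok, info = check_wav_format(demo_path)
print(ok, info)
```

If the check fails for your own file, rerunning the ffmpeg command above with -ac 1 and pcm_s16le is the first thing to try.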

Installing and starting Dialogflow

To start Dialogflow, you will likely need to install additional packages. You can do this with:

Code Block
pip install -r sic_framework/services/dialogflow/requirements.txt

Now that we have everything for dialogflow installed, we can start the component.

Code Block
cd framework/sic_framework/services/dialogflow
python3 dialogflow_service.py

...

First, start the SIC Dialogflow service. You should see something like this:

Code Block

(base) user@laptop:~/framework/sic_framework/services/dialogflow$ python3 dialogflow_service.py 
[SICComponentManager 192.168.0.181]: INFO: Manager on device 192.168.0.181 starting
[SICComponentManager 192.168.0.181]: INFO: Starting component manager on ip "192.168.0.181" with components:
[SICComponentManager 192.168.0.181]: INFO:  - DialogflowService

...

If you have trouble installing Dialogflow locally, you can also try to start the component using Docker. Make sure Redis is not running anywhere else, and in the framework folder use:

Code Block
docker compose up dialogflow

Getting a key

To create the Google Cloud Dialogflow ES platform service account credential, perform the following steps:

...

In the Google Cloud Platform console, create a new project and then create a service account for the project.

...

Grant the following roles to the service account:

Dialogflow API Client
Dialogflow API Reader

...

If you don’t already have a key, check out Getting a google dialogflow key.

If everything went right, you should have a your_dialogflow_key.json file with content similar to this:

...

JSON key content

...

Transcribing the audio

Alright! Now that we have everything set up we can start transcribing the audio.

...

The Dialogflow component is running
You have a Dialogflow key in the folder you are working in
You have a .wav audio file in the folder you are working in

In a new Python file (or check out TODO), copy the following code:

Code Block

import json
import threading
import wave

import pyaudio

from sic_framework.core.message_python2 import AudioMessage
from sic_framework.services.dialogflow.dialogflow_service import DialogflowConf, GetIntentRequest, Dialogflow, \
    StopListeningMessage, QueryResult, RecognitionResult

To read the wave file we can use the Python wave library. This reads the file as raw bytes, which is what Dialogflow expects from us.

Code Block

# Read the wav file

wavefile = wave.open('office_top_short.wav', 'rb')
samplerate = wavefile.getframerate()

print("Audio file specs:")
print("  sample rate:", wavefile.getframerate())
print("  length in frames:", wavefile.getnframes())
print("  sample width in bytes:", wavefile.getsampwidth())
print("  number of channels:", wavefile.getnchannels())
print()

Now we get to more interesting stuff. The Dialogflow component will send back a lot of information, so we will have to handle that, and extract the transcription.

First, we’ll create an event. We’ll set this event whenever Dialogflow has detected the end of a sentence. That way we can ask Dialogflow to listen for the next sentence immediately after. It's easiest to use a threading.Event, because Dialogflow will signal the end of a sentence at an arbitrary point.

The on_dialog function handles setting this event. It also prints the partial transcript as it comes in, and once Dialogflow has chosen a final transcript, we add it to the list.

Code Block

# set up the callback and variables to contain the transcript results
# Dialogflow is not made for transcribing, so we'll have to work around this by "faking" a conversation

dialogflow_detected_sentence = threading.Event()
transcripts = []


def on_dialog(message):
    if message.response:
        t = message.response.recognition_result.transcript
        print("\r Transcript:", t, end="")

        if message.response.recognition_result.is_final:
            transcripts.append(t)
            dialogflow_detected_sentence.set()
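
The signalling pattern used by on_dialog can be tried out without Dialogflow. The following is a minimal sketch under stated assumptions: fake_recognizer is a made-up background thread standing in for the Dialogflow component, and on_result is a simplified stand-in for on_dialog. Unlike the tutorial code, which polls is_set() because it is busy sending audio, this sketch simply waits on the event.

```python
import threading
import time

sentence_done = threading.Event()
results = []


def on_result(text, is_final):
    """Stand-in for on_dialog: store final results and signal the main loop."""
    if is_final:
        results.append(text)
        sentence_done.set()


def fake_recognizer():
    """Hypothetical background thread that 'recognizes' two sentences."""
    for sentence in ["hello there", "how are you"]:
        time.sleep(0.05)
        on_result(sentence[:5], is_final=False)  # partial result, not stored
        on_result(sentence, is_final=True)       # final result, stored


worker = threading.Thread(target=fake_recognizer)
worker.start()

handled = 0
while handled < 2:
    # Block until the callback signals a finished sentence (or the timeout hits).
    sentence_done.wait(timeout=1.0)
    sentence_done.clear()
    handled = len(results)

worker.join()
print(results)
```

The event is cleared before the result list is read, so a sentence that finishes between the two steps is still counted.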

Now we can set up Dialogflow. We do this by first reading in our JSON key:

Code Block
# read your keyfile and connect to dialogflow
with open("path/to/your_keyfile_here.json") as f:
    keyfile_json = json.load(f)
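
Before handing the key to the component, it can be useful to check that the file looks like a service account key at all. The sketch below is an illustration only: load_keyfile is a hypothetical helper, and the list of fields it checks is a choice made for this example (Google service account key files normally contain these fields, among others). The demo uses a dummy file with placeholder values, not real credentials.

```python
import json
import os
import tempfile

# Fields checked in this sketch; real key files contain more than these.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_keyfile(path):
    """Load a JSON key file and list any expected fields that are missing."""
    with open(path) as f:
        key = json.load(f)
    missing = [field for field in EXPECTED_FIELDS if field not in key]
    return key, missing


# Demo with a deliberately incomplete dummy key file.
demo_path = os.path.join(tempfile.mkdtemp(), "dummy_key.json")
with open(demo_path, "w") as f:
    json.dump({"type": "service_account", "project_id": "my-project"}, f)

key, missing = load_keyfile(demo_path)
print("missing fields:", missing)
```

An empty list of missing fields does not prove the key is valid, only that it has the expected shape.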

And then we can create a configuration for the Dialogflow component. Make sure to set the proper sample rate!

Code Block
conf = DialogflowConf(keyfile_json=keyfile_json, sample_rate_hertz=samplerate)
dialogflow = Dialogflow(conf=conf)

We’ll direct the output messages produced by Dialogflow to the on_dialog function by registering it as a callback.

Code Block
dialogflow.register_callback(on_dialog)

To get a sense of what Dialogflow is hearing, we’ll also play the sound on our own speakers.

Code Block
# Set up output device to play audio along transcript
p = pyaudio.PyAudio()
output = p.open(format=pyaudio.paInt16, channels=1, rate=samplerate, output=True)

With everything set up, we can start to ask Dialogflow to detect a sentence! We do this using dialogflow.request(GetIntentRequest(), block=False). Non-blocking is important here, because we need to keep sending audio (and not wait for a result, which would never arrive because no audio is being sent). Every time Dialogflow detects a sentence, we ask it to listen for the next one!

Code Block



# To make dialogflow listen to the audio, we need to ask it to "listen for intent".
# This means it will try to determine what the intention is of what is being said by the person speaking.
# Instead of using this intent, we simply store the transcript and ask it to listen for intent again.

print("Listening for first sentence")
dialogflow.request(GetIntentRequest(), block=False)

# send the audio in chunks of one second
for i in range(wavefile.getnframes() // wavefile.getframerate()):

    if dialogflow_detected_sentence.is_set():
        print()
        dialogflow.request(GetIntentRequest(), block=False)

        dialogflow_detected_sentence.clear()

    # grab one second of audio data
    chunk = wavefile.readframes(samplerate)

    output.write(chunk)  # replace with time.sleep to not send audio too fast if not playing audio

    message = AudioMessage(sample_rate=samplerate, waveform=chunk)
    dialogflow.send_message(message)
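
Note that the loop above sends only whole seconds of audio, so a trailing fraction of a second at the end of the file is not sent. One possible way around this is a small generator that keeps reading until the file is exhausted. This is a sketch using only the standard library; iter_chunks is a hypothetical helper and the demo file is synthetic.

```python
import os
import tempfile
import wave


def iter_chunks(wavefile, frames_per_chunk):
    """Yield raw audio chunks until the file is exhausted, including a final short one."""
    while True:
        chunk = wavefile.readframes(frames_per_chunk)
        if not chunk:
            break
        yield chunk


# Demo: 2.5 seconds of silence at 8000 Hz, mono, 16 bit.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(b"\x00\x00" * 20000)

with wave.open(demo_path, "rb") as wav:
    rate = wav.getframerate()
    # One-second chunks; the last chunk holds the remaining half second.
    chunk_sizes = [len(chunk) for chunk in iter_chunks(wav, rate)]

print(chunk_sizes)
```

In the tutorial loop this would look like "for chunk in iter_chunks(wavefile, samplerate):" with the loop body unchanged. If you are not playing the audio, sleeping for roughly the duration of each chunk keeps the sending pace close to real time.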

When we’re done we’ll write the output to a file and clean up dialogflow.

Code Block

dialogflow.send_message(StopListeningMessage())

print("\n\n")
print("Final transcript")
print(transcripts)

with open('transcript.txt', 'w') as f:
    for line in transcripts:
        f.write(f"{line}\n")

output.close()
p.terminate()

That's the code! Run your file like so:

Code Block
cd sic_framework/tests
python3 demo_transcribe_with_dialogflow.py

...

And the transcript should be stored in transcript.txt!

When the transcript is done, you might get some errors about Google not receiving a new request, like:

google.api_core.exceptions.InvalidArgument: 400 Did not receive any new request for 1m.

We are still working on how to properly end a conversation after it is done, but Google’s documentation has little mention of this. If you find it, let us know!

Version	Old Version 2	New Version Current
Changes made by	Thomas Wiggers	Koen Hindriks
Saved on	Jul 11, 2023	Dec 09, 2024
