
This tutorial shows how to transcribe the audio from a file on your computer using Dialogflow. Dialogflow is built for conversations, but because it returns a transcription of what was said, we can also use it to transcribe audio.

Follow the Getting started

...

SIC is installed on your laptop
Redis is running on your laptop

Info
To play the audio, PyAudio needs to be installed. See https://pypi.org/project/PyAudio/ for installation instructions.

Approach

This tutorial shows how to convert audio to text. We'll split it into three parts:

Converting your file to a .wav file
Starting the Dialogflow component
Transcribing the audio file

Converting to wave format

To read the audio in Python, it's easiest to convert it to a .wav file first. The exact steps depend on the file type you start with, but here is an example using ffmpeg. Make sure to convert to mono, 16-bit, little-endian PCM audio (this is what pcm_s16le means).

Code Block
ffmpeg -i my_audio.mp3 -codec:a pcm_s16le -ac 1 -ar 44100 my_audio.wav
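
To double-check the conversion, the result can be inspected with Python's standard wave module. The following is a minimal sketch, not part of SIC; the helper name check_wav_format is hypothetical and it only checks the channel count and sample width.

```python
import wave


def check_wav_format(path):
    """Return a list of problems; an empty list means mono 16-bit PCM."""
    problems = []
    # wave.open raises wave.Error if the file is not a WAV file it can read
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected 1 channel, found {wav.getnchannels()}")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 2 bytes per sample, found {wav.getsampwidth()}")
    return problems
```

Calling check_wav_format("my_audio.wav") on a correctly converted file should return an empty list.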

Installing and starting dialogflow

...

PyAudio needs to be installed, see https://pypi.org/project/PyAudio/ for information about your platform.

Code Block
pip install -r sic_framework/services/dialogflow/requirements.txt

Now that we have everything for dialogflow installed, we can start the component.

Code Block
cd framework/sic_framework/services/dialogflow
python3 dialogflow_service.py

If everything went right, you should see something like

...

Docker alternative for dialogflow

If you have trouble installing Dialogflow locally, you can also start the component using Docker. Make sure Redis is not running anywhere else, then run the following in the framework folder:

Code Block
docker compose up dialogflow

Getting a key

To create the Google Cloud Dialogflow ES platform service account credential, perform the following steps:

In the Google Cloud Platform console, create a new project and then create a service account for the project.
Grant the following roles to the service account:
- Dialogflow API Client
- Dialogflow API Reader
Create a service account key and download the JSON version of it.

If everything went right, you should have a your_dialogflow_key.json file with content similar to this:

Code Block

{
  "type": "service_account",
  "project_id": "test-project",
  "private_key_id": "348e1399234328ea8ase5e4799a98356ef6ab6",
  "private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvgIBADANBgkqhk ... lots of characters ... WrXM145A0W1Gm0jZhnI1\n-----END PRIVATE KEY-----\n",
  "client_email": "test-357@test-project.iam.gserviceaccount.com",
  "client_id": "113588749577095165520",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test-357%40test-project.iam.gserviceaccount.com"
}
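
Before using the key, it can help to confirm that the file parses and looks like a service account key. The sketch below is an illustration only: load_keyfile is a hypothetical helper, and the field list is taken from the example key above, not from an official schema.

```python
import json

# Assumption: these fields are copied from the example key shown in this tutorial.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def load_keyfile(path):
    """Load a JSON key file and fail early if it does not look like a service account key."""
    with open(path) as f:
        key = json.load(f)
    missing = [name for name in EXPECTED_FIELDS if name not in key]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"{path} has type {key['type']!r}, expected 'service_account'")
    return key
```

The returned dictionary has the same shape as the result of json.load, so it could be passed on wherever the parsed key is needed.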

Transcribing the audio

Alright! Now that we have everything set up we can start transcribing the audio.

Before continuing, check that:

The Dialogflow component is running
Your Dialogflow key is in the folder you are working in
A .wav audio file is in the folder you are working in
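
The two file checks in this list can be automated with a few lines of Python. This is a small sketch; missing_files is a hypothetical helper, and the file names you pass in are whatever you chose for your key and audio file.

```python
from pathlib import Path


def missing_files(*paths):
    """Return the given paths that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]
```

For example, missing_files("your_dialogflow_key.json", "my_audio.wav") returns an empty list when both files are in the current folder.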

In a new Python file (or check out TODO), copy the following code:

Code Block

import json
import threading
import wave

import pyaudio

from sic_framework.core.message_python2 import AudioMessage
from sic_framework.services.dialogflow.dialogflow_service import DialogflowConf, GetIntentRequest, Dialogflow, \
    StopListeningMessage

# Read the wav file

wavefile = wave.open('office_top_short.wav', 'rb')
samplerate = wavefile.getframerate()

print("Audio file specs:")
print("  sample rate:", wavefile.getframerate())
print("  length:", wavefile.getnframes())
print("  data size in bytes:", wavefile.getsampwidth())
print("  number of chanels:", wavefile.getnchannels())
print()


# set up the callback and variables to contain the transcript results
# Dialogflow is not made for transcribing, so we'll have to work around this by "faking" a conversation

dialogflow_detected_sentence = threading.Event()
transcripts = []


def on_dialog(message):
    if message.response:
        t = message.response.recognition_result.transcript
        print("\r Transcript:", t, end="")

        if message.response.recognition_result.is_final:
            transcripts.append(t)
            dialogflow_detected_sentence.set()


# read your keyfile and connect to dialogflow
with open("your_keyfile_here.json") as f:
    keyfile_json = json.load(f)
conf = DialogflowConf(keyfile_json=keyfile_json,
                      sample_rate_hertz=samplerate, )
dialogflow = Dialogflow(conf=conf)
dialogflow.register_callback(on_dialog)

# OPTIONAL: set up output device to play audio along transcript

p = pyaudio.PyAudio()
output = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=samplerate,
                output=True)

print("Listening for first sentence")
dialogflow.request(GetIntentRequest(), block=False)

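# send the audio one second at a time
# (integer division: a trailing partial second at the end of the file is not sent)
for i in range(wavefile.getnframes() // wavefile.getframerate()):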

    if dialogflow_detected_sentence.is_set():
        print()
        dialogflow.request(GetIntentRequest(), block=False)

        dialogflow_detected_sentence.clear()

    # grab one second of audio data
    chunk = wavefile.readframes(samplerate)

    # Play the chunk. If you are not playing the audio, replace this line
    # with time.sleep(1) so the audio is not sent faster than real time.
    output.write(chunk)

    message = AudioMessage(sample_rate=samplerate, waveform=chunk)
    dialogflow.send_message(message)

dialogflow.send_message(StopListeningMessage())

print("\n\n")
print("Final transcript")
print(transcripts)

with open('transcript.txt', 'w') as f:
    for line in transcripts:
        f.write(f"{line}\n")

# release the audio output and the wav file
output.close()
p.terminate()
wavefile.close()

Then run your file like so:

Code Block
cd sic_framework/tests
python3 demo_transcribe_with_dialogflow.py

The output should look something like this:

Code Block

Audio file specs:
  sample rate: 44100
  length: 4505992
  data size in bytes: 2
  number of chanels: 1

Component not already alive, requesting DialogflowService from manager  192.168.0.181
[DialogflowService 192.168.0.181]: INFO: Started component DialogflowService
Listening for first sentence
 Transcript: I can't believe I started the fire
 Transcript:  a brown
 Transcript: I'm taking two so I can parcel them up and eat them at my leisure later on much healthier


Final transcript
["I can't believe I started the fire", ' a brown']

The transcript is also saved to transcript.txt.
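
The loop in the script runs getnframes() // getframerate() times, so the last fraction of a second of audio is never sent. One possible variation, shown here as a sketch and not as part of SIC, is to read chunks until the file is exhausted; iter_chunks is a hypothetical helper built only on the standard wave module.

```python
import wave


def iter_chunks(wavefile, frames_per_chunk):
    """Yield successive chunks of raw frames until the wav file is exhausted.

    The final chunk may be shorter than frames_per_chunk.
    """
    while True:
        chunk = wavefile.readframes(frames_per_chunk)
        if not chunk:  # readframes returns empty bytes at the end of the file
            break
        yield chunk
```

In the script, the for loop header could then become for chunk in iter_chunks(wavefile, samplerate), with the readframes call inside the loop removed.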

Version	Old Version 1	New Version 2
Changes made by	Thomas Wiggers	Thomas Wiggers
Saved on	Jul 11, 2023	Jul 11, 2023
