Introduction
The dialogflow service allows you to use the Google Dialogflow platform within your application. Dialogflow translates human speech into intents (intent recognition). In other words, not only does it (try to) convert an audio stream into readable text, it also processes this text into an intent, possibly with additional parameters. For example, an audio stream may be transcribed to the string "I am 15 years old", which is in turn converted to the intent 'answer_age' with the parameter 'age = 15'.
Docker name: dialogflow
Input
Required sensors: Microphone
Audio input can also be provided in the form of an audio file.
The audio input sent to the Google API has to be a bytestream.
The audio length for a single request is at most 1 minute, as per https://cloud.google.com/dialogflow/quotas#es-agent_2.
Required actuators: None
Required service(s): stream_audio
The following drivers need to be running if testing locally: computer-robot, computer-speaker, and computer-microphone.
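As noted above, audio can also come from a file, in which case its content must end up as a bytestream of at most one minute. A minimal sketch of such a pre-check, assuming a WAV file (the hand-off of the bytes to the service itself is not shown here; 'speech.wav' is a placeholder):

import wave

def load_audio_bytes(path):
    # Read a WAV file and return its raw PCM bytes, enforcing the 1-minute limit.
    with wave.open(path, 'rb') as wav:
        duration = wav.getnframes() / wav.getframerate()
        if duration > 60:
            raise ValueError(f'Audio is {duration:.1f}s; Dialogflow accepts at most 60s per request')
        return wav.readframes(wav.getnframes())

audio_bytes = load_audio_bytes('speech.wav')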
Output
The output depends entirely on your project and on the set-up of the intents and entities of your Dialogflow agent.
The output consists of a dict:
{'intent': '[YOUR_INTENT]', 'parameters': {'[YOUR_PARAMETER]': '[PARAMETER_RESPONSE]'}, 'confidence': [CONFIDENCE_VALUE], 'text': '[RESPONSE_TEXT]', 'source': 'audio'}
intent (str): the intent recognised from the audio, corresponding to an intent set on the agent
parameters (dict): the parameters defined in the agent; each parameter is a str key paired with its response as a str value
confidence (int): a number ranging from 0 to 100 that indicates how confident the API is in the intent and text detection
text (str): the speech-to-text response from the API
source (str): for SIC framework Dialogflow usage, the source is always 'audio'
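For example, with the 'answer_name' intent set up as in the example below, a result could look like this (the values shown here are made up for illustration):

{'intent': 'answer_name', 'parameters': {'name': 'Alice'}, 'confidence': 87, 'text': 'my name is alice', 'source': 'audio'}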
Parameters
Dialogflow keyfile path (str): the path to your agent's private key file in JSON format
Dialogflow project ID (str): the ID of your Dialogflow project
Both parameters are set at BasicSICConnector instantiation time.
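A minimal instantiation sketch (the import path and keyword names follow the connectors repository, but verify them against the BasicSICConnector documentation; the server address and paths are placeholders):

from social_interaction_cloud.basic_connector import BasicSICConnector

sic = BasicSICConnector(
    '127.0.0.1',  # address of your SIC server
    dialogflow_key_file='path/to/keyfile.json',
    dialogflow_agent_id='your-project-id',
)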
Service Configuration
The service communicates with a Dialogflow agent by means of a project ID and a key file. If you already have these, you may skip this section.
The following steps will help you get the required items:
Create a Dialogflow agent by clicking the following link: https://dialogflow.cloud.google.com
Use the 'Create Agent' button at the top left to start your first project. Press the settings icon next to your agent's name at the top left to see the Project ID.
Follow the steps here to retrieve your private key file in JSON format.
Initialisation
Using the service
In order to use our service for your purposes, an instance of the BasicSICConnector class has to be created. You can find the details of this class here. You may also need a class to manage speech recognition attempts and a callback function for retrieving a recognised entity from the detection result.
In order to run this service, make sure that:
You pass the correct agent (project) ID and keyfile path as parameters when instantiating BasicSICConnector.
The Dialogflow service and the relevant local device drivers are running.
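Once these are in place, a callback along these lines could retrieve the recognised entity from the detection result described under Output (the intent and parameter names are the ones used in the example below):

def on_intent(detection_result):
    # The detection result is the dict described in the Output section.
    if detection_result.get('intent') == 'answer_name':
        name = detection_result.get('parameters', {}).get('name')
        if name:
            print('Recognised name:', name)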
Example
The file https://bitbucket.org/socialroboticshub/connectors/src/master/python/speech_recognition_example.py is available for demonstration purposes. Two questions are dealt with in this example. The first is an entity question, where the point of interest is the name of the user. The second is a yes/no/don't-know question.
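The second question could be handled analogously, for instance by branching on the detected intent (the intent names 'answer_yes', 'answer_no' and 'answer_dontknow' are hypothetical; use whichever intents your agent defines):

def on_yesno(detection_result):
    intent = detection_result.get('intent')
    if intent in ('answer_yes', 'answer_no', 'answer_dontknow'):
        print('User answered:', intent)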
Setting up the agent
In order to deal with the first question, an intent needs to be set up. An intent captures what is recognised from an end-user's input; in our example, the name of the person. The following steps will set up an intent on your Dialogflow agent:
Navigate to the agent's page to set the intent, training phrases and parameters.
Create an agent intent.
It is recommended that the name suggests the kind of answer you are looking for in the audio; in our example, the name of the user ('answer_name').
The intent defined in the agent should correspond to the intent used in the code.
Create a context.
The number next to the context corresponds to the number of responses expected from the user in that context; in our example, that number is 0.
Create training phrases for the intent.
The training phrases should be input examples that contain the intent; in our example, 'my name is name'.
Dialogflow learns from these phrases and matches future user inputs based on them.
Create parameters for the intent.
Select words from the training phrases as parameters by double-clicking on them, then match them with their corresponding entity. They automatically appear in the 'Action and parameters' section. In our example, we are only interested in the 'name' of the user.
Our complete intent example thus looks like this (note: using sys.given-name is usually preferred):
Events
onAudioIntent: a new intent is detected
IntentDetectionDone: a new intent has finished being detected
onAudioLanguage: the audio language has been changed
LoadAudioDone: if an audio file is used, this event is raised when the file has finished loading
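If your application needs to react to these events, a simple dispatcher keyed on the event name is one option (how events are delivered to your code depends on BasicSICConnector; the handlers here are placeholders):

def on_event(event_name):
    handlers = {
        'onAudioIntent': lambda: print('a new intent is detected'),
        'IntentDetectionDone': lambda: print('intent detection has finished'),
        'onAudioLanguage': lambda: print('the audio language has changed'),
        'LoadAudioDone': lambda: print('the audio file has finished loading'),
    }
    handler = handlers.get(event_name)
    if handler is not None:
        handler()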
Known Issues
There is a rare bug where Dialogflow suddenly responds only with 'UNAUTHENTICATED' errors. Restarting Docker and/or your entire machine seems to be the only way to resolve this.