The dialogflow service enables the use of the Google Dialogflow platform within your application.
Dialogflow is used to translate human speech into intents (intent classification). In other words, not only does it (try to) convert an audio stream into readable text, it also classifies this text into an intent and extracts additional parameters called entities from the text, if specified. For example, an audio stream can be transcribed to the string "I am 15 years old", and classified as the intent 'answer_age' with entity 'age=15'.
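To make this concrete, here is a sketch (in Python, with made-up values) of the kind of structured result such a classification yields; the field names loosely follow Dialogflow's QueryResult message, but a real response contains many more fields.

# Illustrative sketch of a (simplified) classification result for the
# example above; the values and confidence are invented for illustration.
query_result = {
    "query_text": "I am 15 years old",           # the transcribed speech
    "intent": {"display_name": "answer_age"},    # the classified intent
    "parameters": {"age": 15},                   # the extracted entity
    "intent_detection_confidence": 0.87,         # hypothetical confidence
}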
In order to create a Dialogflow agent, visit https://dialogflow.cloud.google.com and log in with a Google account of your choice. Use the 'Create Agent' button at the top left to start your first project. For our framework to be able to communicate with this agent, the project ID and a keyfile are required. Press the settings icon next to your agent's name at the top left to see the Project ID. Click on the Project ID itself to open the associated Google Cloud project, and then, in order to get the corresponding JSON keyfile, follow the steps given here.
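As a quick sanity check, the snippet below (a minimal sketch, assuming the standard Google service-account key format) verifies that the keyfile you downloaded belongs to the right project:

import json

# A service-account keyfile is a JSON file; its 'project_id' field should
# match the Project ID shown in the Dialogflow agent settings.
with open("dialogflow-key.json") as f:
    key = json.load(f)

print(key["project_id"])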
The main items of interest are the Intents and the Entities. An intent is something you want to recognize from an end-user; here we will show you an example of an intent that is aimed at recognizing someone’s name.
When creating an intent, you can name it anything you like; we go with 'answer_name' here. Below 'Action and parameters', you should give the name of the intent that will actually be used in your program; here, we also set that to 'answer_name'. Moreover, it is useful to set a context for the intent. A context is set by the requester in order to indicate that we only want to recognize this specific intent, and not another one. Usually, in a social robotics application, the kind of answer we want to get is known. We match the name of the (input) context with the name of the intent, and thus make it 'answer_name' as well. By default, Dialogflow makes the context 'stick' for 5 answers; we can change this by setting the 5 (at the output context) to 0.

Now we arrive at the most important aspect of the intent: the training phrases. Here, you give the kinds of input strings you would expect; from these, Dialogflow learns the model it will eventually use. You can identify part of a phrase as a parameter by double-clicking on the relevant word and selecting the appropriate entity from the list. It will then automatically appear below 'Action and parameters' as well; the 'parameter name' there will be passed in the result (we use 'name' here). The system has many built-in entities (like 'given name'), but you can define your own entities as well (even through importing CSV files). Our complete intent example thus looks like this (note: using sys.given-name is usually preferred):
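To give a feel for suitable training phrases, a handful of hypothetical examples for this intent (in each phrase, the word annotated with the given-name entity is what ends up in the 'name' parameter):

# Hypothetical training phrases for the 'answer_name' intent; the annotated
# given name in each phrase becomes the 'name' parameter in the result.
training_phrases = [
    "my name is Alice",
    "I am called Bob",
    "you can call me Charlie",
    "Dana",  # a bare name should be recognized too
]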
Using the Dialogflow service
Let's create a simple service that prints the transcript and the intent detected by Dialogflow.
Start by importing the necessary SIC functionality.
# numpy is used below to generate a random chat ID
import numpy as np

from sic_framework.devices.nao import Nao
from sic_framework.services.dialogflow.dialogflow_service import DialogflowService, DialogflowConf, GetIntentRequest, \
    RecognitionResult, QueryResult
We can then create the SICApplication itself. First, we create a connection to the Nao robot to be able to access its microphone (using nao.mic after it is initialized). In order to use this intent in an application, we need to set the language, project ID (agent name), and the keyfile. To do this, we create the Dialogflow configuration object. Make sure you read the documentation at the start of this page to obtain the keyfile and ID for your Dialogflow project.
class DemoDialogflow(SICApplication):
    def run(self) -> None:
        nao = Nao(device_id='nao', application=self)

        conf = DialogflowConf(keyfile="dialogflow-key.json",
                              project_id='dialogflow-test-project-376814',
                              sample_rate_hertz=16000)
Having done this setup, we can self.connect to the Dialogflow service (make sure it is up and running, or you will get a timeout). The parameters inputs_to_service=[nao.mic] and conf=conf pass the Nao microphone as an input and our configuration, so that we can authenticate to Dialogflow.
dialogflow = self.connect(DialogflowService, device_id='local', inputs_to_service=[nao.mic], conf=conf)
Finally, we need to register a callback function to act whenever Dialogflow output is available. Whenever Dialogflow detects a new word, we will receive a RecognitionResult message. The on_recognition_result function (defined below) simply prints the detected speech.
dialogflow.register_callback(on_recognition_result)
Now we can start actually getting intents from the user! We need to set a chat ID, with which Dialogflow identifies the conversation. This can be a random number (or the same one if you want to continue a previous conversation). Then, we request Dialogflow to get an intent; this will start sending the Nao's microphone audio to Dialogflow. As you start talking, on_recognition_result should print interim transcripts. When you are done, and if Dialogflow successfully detected your intent, it is printed on screen, together with the Dialogflow agent's response.
chat_id = np.random.randint(10000)

for i in range(25):
    print("-> Conversation turn", i)
    reply = dialogflow.request(GetIntentRequest(chat_id))

    if reply.fulfillment_message:
        print("Reply:", reply.fulfillment_message)

    if reply.response.query_result.intent:
        print("Intent:", reply.response.query_result.intent.display_name)
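If your intent also defines parameters (such as 'name' in our example intent), they can be read from the query result as well. A sketch, assuming the reply wraps a standard Dialogflow QueryResult (the exact access pattern may differ between client-library versions):

# Sketch: reading the extracted entities; 'name' is the parameter name we
# chose when defining the 'answer_name' intent.
params = reply.response.query_result.parameters
if "name" in params:
    print("Name:", params["name"])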
Here is the definition of on_recognition_result:
def on_recognition_result(message):
    if message.response:
        print(message.response.recognition_result.transcript)
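Dialogflow streams both interim and final transcripts. If you only want to act on complete utterances, a variant like the one below could be used (a sketch; it relies on the is_final flag of Dialogflow's StreamingRecognitionResult being exposed on the message):

# Sketch: only print transcripts that Dialogflow marks as final.
def on_final_recognition_result(message):
    if message.response and message.response.recognition_result.is_final:
        print("Final:", message.response.recognition_result.transcript)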
To start your application, we need to include the following:
if __name__ == '__main__':
    test_app = DemoDialogflow()
    test_app.run()
And that's it! You should now be able to talk to your robot. See also https://bitbucket.org/socialroboticshub/docker/src/v3/sic/sic_framework/tests/demo_dialogflow.py for a more complex example. Make sure to set your own agent name and keyfile path!