Dialogflow → Intent and Slot Classifier with Whisper

While your intent and slot classifier is still being developed, trained, and tested, you can use Dialogflow, a Google Cloud service, to handle Automatic Speech Recognition (ASR) and intent and slot classification for your conversational agent. In this setup, Dialogflow serves as the core Natural Language Understanding (NLU) component. It transcribes user utterances into text, classifies them into intents, and extracts the relevant information (entities) for recipe recommendations. You shape what the agent understands by creating intents with training phrases, and entities that define the vocabulary needed for recipe recommendations.

The Dialogflow agent acts as an NLU agent that is connected to a MARBEL agent. The MARBEL agent receives transcripts, intents, and entities from Dialogflow and uses this information to manage the dialogue with the user. It is the dialogue manager, and it structures the interaction through conversational patterns. We have provided a basic MARBEL agent to help you get started, but your first task is to create your team's Dialogflow agent. Your team needs only one Dialogflow agent, so only one team member has to create it and share it with the others.

Once your custom intent and slot classifier is fully developed, it will replace Dialogflow and work together with Whisper, OpenAI's speech recognition model, to handle the entire NLU pipeline. Whisper converts user speech into text, and your classifier processes that text to identify the user's intent and extract entities. This gives you full control over the NLU system, so you can tailor it and integrate it more deeply into your agent's architecture. Unlike Dialogflow, your classifier can be built specifically for recipe recommendations, which can improve how accurately the agent responds to user input.

With this replacement in place, your system will continue to interface with the MARBEL agent and provide it with the same intent and entity data as before. It removes the dependency on an external service such as Dialogflow and gives your team full ownership of, and flexibility over, the conversational pipeline.

Create a Dialogflow agent

Please follow the instructions here: https://cloud.google.com/dialogflow/es/docs/quick/build-agent#create-an-agent.

Only one team member needs to do this.

You need a Google account for this.

Just provide a name for your Dialogflow agent. You do not need to change any other settings (you will use the default English language).

Share the Dialogflow agent with your other team members (assign them the Developer role): https://cloud.google.com/dialogflow/es/docs/access-control.
Next, use the zip file found in the repository to set up the contents of your agent (you can import a zip file via the Export and Import tab in the agent's settings).

Connect it to the MARBEL agent

Create and download a JSON key file for your Dialogflow agent by following the instructions here: https://cloud.google.com/iam/docs/creating-managing-service-account-keys#get-key. You will first have to create a service account for the project (using the button at the top, under the blue bar). Once the service account exists, click on its email address to manage its keys. The JSON key file is downloaded automatically when you create it.
Every team member needs to add the JSON key file to their own local copy of the repository; you cannot put it on GitHub because it is a key file. Rename the key file to dialogflow-keyfile.json and place it at social-interaction-cloud/sic_framework/services/eis/dialogflow-keyfile.json.
Retrieve the Project ID of your Dialogflow agent: you can find this by clicking on the settings (cogwheel) icon next to your Dialogflow agent’s name here: https://dialogflow.cloud.google.com/.
Open the .mas2g file in Eclipse. Then (1) set the name of the JSON key file as the value of the flowkey environment initialization parameter, and (2) set the Project ID of your Dialogflow agent as the value of the flowagent parameter.

Run the code!

You should now be able to run the code. Try it by following the instructions in Run your Conversational Agent. You should see a start page.

A Pattern to Get Started

The dialog manager agent uses an agenda to structure the conversation with a user. An agenda consists of one or more conversational patterns: templates for small sub-dialogs (mini-dialogs) that serve as building blocks for a larger conversation. By inspecting the agenda, the agent can figure out what it should do and in what order things should be handled in the conversation. The agenda is managed by a MARBEL agent, which implements the dialog manager.

Some code for initializing the agenda already exists in the dialog_init.mod2g module. Open this file and find the line where the agenda is inserted into the agent's database (state). As you can see, the agenda in the starter agent is still empty (it is the empty list []). We will define the agent's agenda by adding some of the patterns that we create during the project to this list. To get started, though, the agenda needs at least one pattern. We can use the start pattern for this, which is already defined in the patterns.pl file (check it out). This pattern simply waits for the user to click a button on the screen; below, we will create a screen with a start button that the user can click to initiate the interaction. For now, let's just add the start pattern to the agent's agenda:

In the dialog_init module, locate the line with insert(agenda([])) and replace that with insert(agenda([start])).
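To make the effect of this change concrete, here is a minimal, self-contained Prolog sketch of an agenda stored as a fact in the agent's state. The helper next_pattern/1 is hypothetical and only illustrates the idea that the patterns on the agenda are handled in list order; the provided MARBEL agent has its own logic for working through the agenda.

```prolog
% The agenda as it appears in the agent's state after the change above.
:- dynamic agenda/1.
agenda([start]).

% Hypothetical helper (NOT part of the provided agent): the pattern to
% handle next is taken to be the first element of the agenda list.
next_pattern(Pattern) :-
    agenda([Pattern | _]).
```

In this sketch, next_pattern(P) yields P = start. With agenda([]) the query simply fails, because there is nothing on the agenda to handle.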

Visuals

The conversational agent that we will create not only talks but also shows things on a screen, using dynamic web pages. These pages give the user visual support for keeping track of what has happened in the conversation. For example, we want the agent to show where we are in the conversation by displaying subtitles that refer to the different parts (conversational patterns) of the conversation. We also want the agent to display the recipe preferences or constraints that the user has already added, and to show how many recipes still match them. This helps the user understand what the agent is doing and remember which preferences they have already given.

We provide a basic visual design. After that, it is up to you to improve the look of the pages and make additional design choices of your own.

All HTML templates for pages can be found in social-interaction-cloud/sic_framework/services/webserver/templates. Feel free to design them any way you like, but they must include certain information; see Visual Requirements for the specifics.

Start Page

So let's get started! The first page that we will implement together is a start page. We will provide a detailed step-by-step guide for you on how to create it. Pages are defined by Prolog rules. The Prolog rule that we will code here for creating a start page will also serve as a kind of template example of how you can create the other pages using Prolog, HTML, and Bootstrap. The basic idea is to define a predicate page(PageName, Text, Html) which generates HTML code (a complete webpage) that is returned in the variable Html for a page named PageName; the idea behind the Text parameter is to use it for adding specific text to a page.

Each page that we create is connected to a specific conversational pattern. Most of the patterns for these pages still need to be implemented as part of later capabilities. The pattern linked to the start page, however, already exists: it is the start pattern that we added to the agenda above. By convention, we always give a page the same name as its conversational pattern; we explain the reasoning behind this choice below. For the start page, we therefore need a rule for page(start, _, Html) that generates the HTML code for a page called start.

The main requirement for the start page is that it has a button the user can click when they are ready to start a conversation. The basic idea thus is to display an HTML page at the start before a conversation has started that allows a user to start the conversation with a button click. The start page can also be used to provide some information about the agent before a user starts talking to it. The basic page layout that we will create consists of four parts: a title, some introductory text, an instruction text on when to press the start button, and, last but not least, the button itself that says “Start”.

The Prolog rule that we will create for page(start, _, Html) has a basic structure that we can reuse for most other pages too. It always consists of four steps:

1. Check that the active conversational pattern (at 'top level', but disregard that for now) has the same name (pattern ID) as the page, using the currentTopLevel(PatternID) predicate; here, we replace PatternID with start. Only if this check succeeds does the rule go on to generate the code for a complete HTML page.
2. Construct the HTML code for the page in Prolog. We typically organize a page in rows (often using matching Bootstrap row elements), each matching a part of our page layout: title, text, and button (the instruction and the button share the third row). This generic, template-like way of structuring a page is easy to reuse.
3. Put the different parts (rows) together using the built-in predicate atomic_list_concat(+List, -Atom).
4. Generate a Bootstrap-compliant HTML page without a header and without a footer using the predefined predicate html(Body, false, false, Html).

The overall structure of our start page code then looks like this:

 %%% Page layout for start page (before conversation has started).
 page(start, _, Html) :-
     % 1: Condition for when to show this page.
     currentTopLevel(start),
     % 2: Constructing the HTML page.
     % First row: a heading inside a Jumbotron element.
     CODE WE STILL NEED TO FIGURE OUT, SEE BELOW.
     % Second row: introductory text inside a Jumbotron element.
     CODE WE STILL NEED TO FIGURE OUT, SEE BELOW.
     % Third row: instruction and Start button inside an alert.
     CODE WE STILL NEED TO FIGURE OUT, SEE BELOW.
     % 3: Putting everything together.
     atomic_list_concat([FirstRow, SecondRow, ThirdRow], MainElementContent),
     % 4: Create the HTML page.
     html(MainElementContent, false, false, Html).

Note that in step 3 of the template above, the three variables FirstRow, SecondRow, and ThirdRow refer to the HTML code for the first, second, and third row. The atomic_list_concat/2 predicate concatenates these variables (as elements of a list) into the overall MainElementContent HTML code for our page. For this to work, each of these variables must already be bound to an atomic value (in our case, an atom containing HTML code) when the concatenation is reached. What remains is to complete the template by writing the code that generates the HTML for each of the three rows and substituting it for the three CODE WE STILL NEED TO FIGURE OUT placeholders.

First row. Let's first create a heading for our page as the first row. We use the predefined div/4 predicate to create a Bootstrap Jumbotron element (https://www.w3schools.com/bootstrap4/bootstrap_jumbotron.asp) with a large, centered heading inside it, and bind the resulting HTML code to the first-row variable FirstRow:

div('<h1 class="text-center">Please read this first:</h1>', 'jumbotron', '', FirstRow)

Note that text between single quotes is an atom in SWI-Prolog (see here if you want to know more). If you know basic HTML, the text between the quotes should look familiar: the <h1> and </h1> tags create a large heading. If HTML is new to you, you can find out more here.
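A few small predicates make the distinction concrete. This sketch assumes SWI-Prolog 7 or later with default flags, where double-quoted text is read as a string object rather than an atom; the predicate names are made up for illustration.

```prolog
% Text between single quotes is an atom.
single_quoted_is_atom :-
    atom('<h1 class="text-center">Please read this first:</h1>').

% Text between double quotes is a string object, not an atom
% (SWI-Prolog 7+ with the default double_quotes flag).
double_quoted_is_string :-
    X = "<h1>Hello</h1>",
    string(X),
    \+ atom(X).

% atomic_list_concat/2 accepts atoms and strings alike,
% but always returns an atom.
mixed_concat(Html) :-
    atomic_list_concat(['<p>', "Hello", '</p>'], Html).
```

Because the page predicates pass HTML around as atoms, sticking to single quotes keeps every intermediate value the same kind of term.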

Second row. For the second row, we add a few lines of text to a paragraph and embed that, again, in a Bootstrap Jumbotron element (https://www.w3schools.com/bootstrap4/bootstrap_jumbotron.asp). This takes three steps. First, we concatenate the lines (as atoms between single quotes) using the atomic_list_concat/2 predicate. Second, we create a paragraph element with centered text from this content, using the predefined applyTemplate(Template, Content, Html) predicate and a simple HTML template. Finally, we add the paragraph to a Bootstrap Jumbotron element. The three-part code snippet that we are looking for is this:

atomic_list_concat([
    'You are about to interact with our agent <b>YourAgentName</b>.<br>',
    'It can help you find a recipe to your taste.'
], ParagraphContent),
applyTemplate('<p><center>~a</center></p>', ParagraphContent, ParagraphElement),
div(ParagraphElement, 'jumbotron', '', SecondRow)

We have put some placeholder text there to get things moving, but feel free to change it in any way you want. The main point is that the atomic_list_concat/2 predicate is a convenient way to concatenate a list of atoms (lines) into a single Prolog atom. It gives you a clear overview of the lines of text you want to put on a page, and it avoids syntax issues, because spreading long text with whitespace and line breaks over several lines inside a single quoted atom easily causes problems in Prolog. Check out the utils.pl file for the definition of the applyTemplate/3 predicate and the html.pl file for the definition of the div/4 predicate. In our example, applyTemplate/3 substitutes (the term bound to) the variable ParagraphContent for the ~a placeholder in the template string, which wraps the text in an HTML paragraph element (the <p> and </p> tags); the <center> tag centers the text inside the paragraph. The paragraph element is then embedded in a Bootstrap Jumbotron element using the div/4 predicate (see https://www.w3schools.com/bootstrap4/bootstrap_jumbotron.asp for more on Jumbotrons). Finally, the result is unified with our SecondRow variable.
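To see what the substitution does, here is a minimal sketch that mimics applyTemplate/3 with SWI-Prolog's built-in format/3, writing the result to an atom. The name apply_template_sketch/3 is hypothetical, and this is not necessarily how utils.pl implements applyTemplate/3; check that file for the actual definition.

```prolog
% Hypothetical stand-in for applyTemplate/3 (the real definition in
% utils.pl may differ): substitute Content for the ~a placeholder in
% Template and return the result as an atom.
apply_template_sketch(Template, Content, Html) :-
    format(atom(Html), Template, [Content]).

% Builds the paragraph element for the second row from a list of lines.
paragraph_element(ParagraphElement) :-
    atomic_list_concat([
        'You are about to interact with our agent <b>YourAgentName</b>.<br>',
        'It can help you find a recipe to your taste.'
    ], ParagraphContent),
    apply_template_sketch('<p><center>~a</center></p>',
                          ParagraphContent, ParagraphElement).
```

Note that format/3 treats every ~ in the template as a directive, so in this sketch a literal tilde in the HTML template would have to be written as ~~.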

Third row. Our third row combines an instruction text with a button containing the bold text Start, inside a Bootstrap alert element (https://www.w3schools.com/bootstrap4/bootstrap_alerts.asp). We use the predefined button/4 predicate to create the button element (check out the html.pl file). This predicate lets us specify the atom that the agent receives when a user clicks the button. To keep things as uniform as possible, we again use the atom 'start' (the same atom that we used for our pattern and page names). See the Bootstrap buttons page (https://www.w3schools.com/bootstrap4/bootstrap_buttons.asp) to learn more about the class options used below.

button('<b>Start</b>', 'btn btn-lg btn-info', 'start', Button)

We want to show the instruction as a (somewhat smaller) heading before the button, so we concatenate these two elements using the atomic_list_concat/2 predicate:

atomic_list_concat(['<h3>Done reading? Then please press start to begin</h3>', Button], 
  ThirdRowContent)

And, finally, we embed this content inside an alert element, with some added classes to style the alert, using the div/4 predicate:

div(ThirdRowContent, 'alert alert-info text-center', '', ThirdRow)

We can now fill in the gaps in the Prolog rule for our first page using the code snippets for the three parts (rows) of our start page. It remains for you to put this all together and add it to the html.pl file in the right place. When you copy-paste the snippets into the page template, make sure to add the commas that are still missing at the end of each snippet.
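For reference, here is a sketch of how the snippets fit into the template. So that it runs on its own in SWI-Prolog, it starts with hypothetical stub definitions of currentTopLevel/1, div/4, button/4, applyTemplate/3, and html/4. Their output is simplified and will differ from the real definitions in html.pl, utils.pl, and the agent's state. Only the page/3 rule belongs in your html.pl file.

```prolog
% --- Hypothetical stubs, ONLY so this file runs on its own. ---------------
% In the project, these come from html.pl, utils.pl and the agent's state.
:- dynamic currentTopLevel/1.
currentTopLevel(start).

div(Content, Classes, _Extra, Html) :-
    format(atom(Html), '<div class="~a">~a</div>', [Classes, Content]).

button(Label, Classes, Value, Html) :-
    format(atom(Html), '<button class="~a" value="~a">~a</button>',
           [Classes, Value, Label]).

applyTemplate(Template, Content, Html) :-
    format(atom(Html), Template, [Content]).

html(Body, _Header, _Footer, Html) :-
    format(atom(Html), '<html><body>~a</body></html>', [Body]).
% --------------------------------------------------------------------------

%%% Page layout for start page (before conversation has started).
page(start, _, Html) :-
    % 1: Condition for when to show this page.
    currentTopLevel(start),
    % 2: Constructing the HTML page.
    % First row: a heading inside a Jumbotron element.
    div('<h1 class="text-center">Please read this first:</h1>',
        'jumbotron', '', FirstRow),
    % Second row: introductory text inside a Jumbotron element.
    atomic_list_concat([
        'You are about to interact with our agent <b>YourAgentName</b>.<br>',
        'It can help you find a recipe to your taste.'
    ], ParagraphContent),
    applyTemplate('<p><center>~a</center></p>', ParagraphContent, ParagraphElement),
    div(ParagraphElement, 'jumbotron', '', SecondRow),
    % Third row: instruction and Start button inside an alert.
    button('<b>Start</b>', 'btn btn-lg btn-info', 'start', Button),
    atomic_list_concat(['<h3>Done reading? Then please press start to begin</h3>',
                        Button], ThirdRowContent),
    div(ThirdRowContent, 'alert alert-info text-center', '', ThirdRow),
    % 3: Putting everything together.
    atomic_list_concat([FirstRow, SecondRow, ThirdRow], MainElementContent),
    % 4: Create the HTML page.
    html(MainElementContent, false, false, Html).
```

With these stubs, page(start, _, Html) succeeds and binds Html to a single atom. If currentTopLevel(start) does not hold, the rule fails and no page is generated, which is exactly the condition checked in step 1.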

You now have created your first page that your conversational agent can display. The page does not look particularly great yet and there is still much that can be improved. How exactly you and your team organize or create your HTML pages is up to you, as long as you fulfill the minimal conditions that the page must meet. Hence, make sure to revisit this page and revise it again later during the project!

Run your agent again!

Run your agent again, following the instructions in Run your Conversational Agent.

You should now see the start page that we created! Don’t yet press the button…

Switch to the Debug perspective in Eclipse, select the MARBEL Debugging Engine in the Debug area, and press pause. Inspect the agent’s state and check out in particular the session/1 predicate.

Now resume the agent by pressing Resume, and then press the Start button on the page. Go back to the Debug perspective and pause the agent again. Inspect the agent's state once more, in particular the session/1 predicate. Double-click the session/1 fact if you cannot see all of it.

What has happened?

Getting Started with your MARBEL Agent