Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) have matured to a level where it is possible to transcribe a user's utterances into text (ASR) and to classify that text into intents (NLU) to make sense of what a user says. In addition, Text-To-Speech (TTS) can be used for speech synthesis to produce well-pronounced spoken utterances from written text. Yet conversational agents have not become mainstream, and anyone who has used a home assistant (such as Google Home or Apple Siri) has experienced being misunderstood. These assistants typically perform well on basic Question-Answering (QA) interactions, which most of the time consist of just two conversational turns: a question and an answer. Conducting longer conversations, however, tends to be more challenging, because a longer conversation can take (too) many directions and the chance that a user says something unexpected increases significantly. We will investigate this challenge in this project.
In this project, you will gain experience in building a conversational system that carries out a specific task with the end user, in this case recipe selection. Although the task seems straightforward (the user filters on the different aspects they seek in a recipe and finally selects one), you will find that many things can go wrong when different users start interacting with the agent. You will work on an agent that has a visual and a spoken interface, and you need to make sure that these modalities are aligned. In addition, you will implement the agent’s understanding, its utterances, and the management of the dialogue. Another important part of the endeavour is to test the agent systematically and to act upon the findings.
In a team of 6 people, you will develop a conversational recipe recommendation agent that uses speech to interact and is able to conduct a conversation for selecting a recipe to cook. Your agent should be able to assist a user in selecting a recipe using a variety of filters. The agent does not need to assist a user with the instruction steps of the recipe itself, which is outside the scope of the Project MAS Conversational Agents course. We chose to focus on the recipe selection activity because it already poses several challenges for building an effective and robust conversational agent. First, there are many different ways in which this conversation may be conducted and many different ways in which a user can phrase what they want from the agent. A user can specify different aspects that the recipe they finally select should satisfy (e.g., type of ingredients, cooking duration, type of course). Second, the recipe recommendation domain is already a broad knowledge space that the agent needs to handle in order to understand what the user is looking for. The agent will have to reason over its database of recipes to filter for recipes that fit the user’s preferences.
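To make the idea of filtering a recipe database on user preferences concrete, here is a minimal sketch in Python. It is only an illustration: the `Recipe` fields, the sample data, and the `filter_recipes` function are all hypothetical, and the actual agent in this project is built with MARBEL and Dialogflow rather than with this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    # Hypothetical recipe record; the real database schema may differ.
    name: str
    ingredients: frozenset
    minutes: int
    course: str


# Made-up sample data, for illustration only.
RECIPES = [
    Recipe("tomato soup", frozenset({"tomato", "onion"}), 30, "starter"),
    Recipe("mushroom risotto", frozenset({"rice", "mushroom", "onion"}), 45, "main"),
    Recipe("fruit salad", frozenset({"apple", "banana"}), 10, "dessert"),
]


def filter_recipes(recipes, ingredient=None, max_minutes=None, course=None):
    """Keep only the recipes that satisfy every filter that was given."""
    matches = []
    for recipe in recipes:
        if ingredient is not None and ingredient not in recipe.ingredients:
            continue
        if max_minutes is not None and recipe.minutes > max_minutes:
            continue
        if course is not None and recipe.course != course:
            continue
        matches.append(recipe)
    return matches


# Each conversational turn can narrow down the remaining candidates.
remaining = filter_recipes(RECIPES, ingredient="onion")
remaining = filter_recipes(remaining, max_minutes=30)
print([recipe.name for recipe in remaining])  # prints ['tomato soup']
```

Applying one filter per turn mirrors how a user might first ask for a recipe with onion and then add that it should take at most 30 minutes.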
...
Before you get started, make sure to check out the main Project Deliverables. Apart from a working agent (a MARBEL and Dialogflow agent) that you will evaluate, you will conduct weekly presentation-based check-ins and write a Final Report in which you describe, amongst other things, the agent's main features and its performance based on the testing you did. You can also look ahead at how your work will be evaluated by checking out the Assessment Rubric.
At the end of this project, you and your team should have a fully functioning conversational agent that is able to assist users with selecting a recipe they would like to cook!
...
The wiki at https://socialrobotics.atlassian.net/wiki/spaces/PM2/overview will be your definitive guide to this project. Go through it section by section to prepare for the course, get course information, build your agent, and then write your report.