...
By now (we hope) you have a functioning conversational agent. Congrats! You have surely tried a few things out already, but we would now like you to do some in-depth testing. You will use the agent for its designed purpose, many times over, and vary the conversations you have. You should also try to trigger each pattern, page, intent, and filter to check that everything works. Afterward, you will write a short report about how you did this (i.e., do you know how to trigger each intent, page, etc.?). You will test not only your own bot but also a bot from another group. More information about how to test each bot and how to write the report can be found on the following pages:
Evaluating Another Team's Bot
Evaluation is an essential step in evolving your recipe recommendation agent from one that can conduct a basic conversation and offers minimal functionality into one that provides a great experience for all users. Effectiveness, efficiency, robustness, and user satisfaction are the important metrics here. An agent is effective if it can guarantee task completion, that is, if it adequately assists a user with completing the task at hand. It is efficient if it keeps the number of turns needed to achieve the goal of a conversation as low as possible. It is robust if it can handle all phrases that users might reasonably be expected to say, as well as the different directions that a conversation may take. Finally, a conversational agent is satisfying if users report that they experienced the conversation as pleasant and helpful.
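To make the first three metrics a bit more concrete, here is a minimal sketch of how you might summarize your own test conversations. It assumes you note down, by hand, three things per conversation: the number of turns, whether the task was completed, and how many user utterances the agent failed to understand. The `TestConversation` record, the `summarize` function, and the three summary values are hypothetical names and proxies chosen for illustration; they are not a format or a set of measures prescribed by this project. User satisfaction cannot be derived from such notes and has to come from asking users.

```python
from dataclasses import dataclass


@dataclass
class TestConversation:
    """One hand-recorded test conversation (hypothetical record format)."""
    turns: int            # total number of user turns in the conversation
    task_completed: bool  # did the tester reach the goal of the conversation?
    unrecognized: int     # user utterances the agent failed to understand


def summarize(conversations):
    """Compute rough proxies for effectiveness, efficiency, and robustness."""
    n = len(conversations)
    if n == 0:
        raise ValueError("no conversations to summarize")

    completed = [c for c in conversations if c.task_completed]
    total_turns = sum(c.turns for c in conversations)

    return {
        # Effectiveness proxy: share of conversations that reached the goal.
        "completion_rate": len(completed) / n,
        # Efficiency proxy: average turns needed in successful conversations.
        "avg_turns_when_completed": (
            sum(c.turns for c in completed) / len(completed) if completed else None
        ),
        # Robustness proxy: share of all user turns the agent did not understand.
        "unrecognized_rate": (
            sum(c.unrecognized for c in conversations) / total_turns
            if total_turns else 0.0
        ),
    }


# Made-up example notes from three test conversations.
logs = [
    TestConversation(turns=6, task_completed=True, unrecognized=0),
    TestConversation(turns=10, task_completed=True, unrecognized=2),
    TestConversation(turns=4, task_completed=False, unrecognized=3),
]
summary = summarize(logs)
print(summary)
```

Numbers like these are only useful for comparing test rounds with each other; a low completion rate or a high share of unrecognized utterances points you to the conversations worth rereading in detail.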
By testing, you can find the deficiencies of the agent that need to be addressed so that it will not break down easily. This is typically done in an agile manner: you start with a simple agent that functions minimally, and then rounds of testing and improvement follow one another. In the current project, you will follow two stages.
First, you will focus on agent testing:
...
Second, at the end of the project, you will do a user study:
...
In your final report, you will need to document both the agent testing and the user study that your team performed.