Agent Testing

Welcome to the agent testing section!

Agent testing is the set of steps that you, as developers of the agent, carry out to evaluate how well your agent and its components operate when exposed to a diverse range of inputs. You do this continuously during development by having conversations with the agent and analyzing its capabilities. Use your agent a lot: aim for at least 10 conversations per team member per week, and preferably more. We recommend that each team member also tests the work of the other pair-programming subteams.

Most importantly, vary the conversations you have. For example, try to trigger each intent and each pattern, and visit each page, to check that everything works. Testing is most effective when you do it systematically, for example by deliberately using different utterances to trigger a particular intent. As you improve and extend your agent, keep re-testing it, both to test its new capabilities and to make sure that the ones you implemented earlier still work.
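The systematic approach described above can be sketched as a small coverage checklist. The Python snippet below is only an illustration and not part of the course tooling: the target names, the example utterances, and the helpers record_test and untested_targets are all hypothetical placeholders that you would replace with your own agent's intents, patterns, and pages.

```python
from collections import defaultdict

# Hypothetical targets; replace with your own agent's intents, patterns, and pages.
TARGETS = {
    "intent": ["greeting", "request_recipe"],
    "pattern": ["greeting_pattern"],
    "page": ["start_page", "overview_page"],
}

# Maps (kind, name) to the set of distinct utterances or actions that exercised it.
test_log = defaultdict(set)


def record_test(kind, name, utterance):
    """Note that a target was exercised in a test conversation."""
    if name not in TARGETS.get(kind, []):
        raise ValueError(f"unknown {kind}: {name}")
    test_log[(kind, name)].add(utterance.strip().lower())


def untested_targets(min_variants=2):
    """Return targets exercised with fewer than min_variants distinct utterances."""
    return [
        (kind, name)
        for kind, names in TARGETS.items()
        for name in names
        if len(test_log.get((kind, name), ())) < min_variants
    ]


# Example: log a few test conversations, then list what still needs attention.
record_test("intent", "greeting", "Hello there")
record_test("intent", "greeting", "Good morning")
record_test("page", "start_page", "opened at start of conversation")
print(untested_targets())
```

Re-running such a checklist after every change to the agent gives a quick view of which capabilities have not yet been re-tested. The threshold of two distinct utterances per target is an arbitrary choice for this sketch.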

You can use different tools to inspect the performance of your agent. For example, Dialogflow has a useful feature for analyzing the quality of intent recognition in the conversations you have during testing: on the Training page, you can filter the conversations you had by using the filter at the top left. This video provides a more thorough explanation of how to do this: Use Dialogflow Analytics & Training to Improve Your Chatbot (2021). Note that, because of the agent set-up in the Project MAS course, where Dialogflow is only triggered for Automatic Speech Recognition and intent recognition, the Analytics part does not apply.

For the improvement of other components of the agent (such as patterns, visuals and Prolog predicates), you can inspect error messages and use the debugging perspective in Eclipse for the MARBEL agent. You can also test Prolog predicates in a Prolog interpreter.

Part of your final report will be about how you tested your agent (see Final Report for more info). Keep the following questions in mind during testing; some of them should be addressed in your final report:

What capabilities of your agent did you test? How did you go about this?
Which of those capabilities were most important to test? Why?
What kinds of example conversations did you use for testing?
How did your tests go? What went well, and where did you run into problems? Focus on the problems and explain why things went wrong.
How did you handle or fix these problems? Which choices did you make to focus on what you could reasonably fix within the time frame of the project? What problems did you ignore?
What kind of extensions did you implement to address some of the issues you ran into? How did you make choices to focus on the things you as a team thought most important?
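
To make these questions easier to answer when writing the report, it can help to keep a structured log while testing. The sketch below shows one possible way to do that in Python. It is not prescribed by the course: the LogEntry fields and the summarize helper are hypothetical, and a shared spreadsheet would serve the same purpose.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LogEntry:
    capability: str       # what was tested, e.g. an intent, a pattern, or a page
    conversation: str     # short description of the example conversation used
    passed: bool          # did the agent behave as expected?
    problem: str = ""     # what went wrong, if anything
    resolution: str = ""  # e.g. "fixed" or "ignored"; empty if still open


def summarize(entries):
    """Count passed tests and group failed tests by how they were handled."""
    summary = Counter()
    for entry in entries:
        if entry.passed:
            summary["passed"] += 1
        else:
            summary[entry.resolution or "open"] += 1
    return dict(summary)
```

A team could append one LogEntry per test conversation and call summarize on the full list to see how many problems were fixed, ignored, or still open.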