Welcome to the system agent testing section!

System agent testing means performing a number of steps to evaluate your agent, with the aim of identifying how well your agent and its components operate when exposed to a diverse range of inputs. You do this continuously while developing your agent, simply by engaging in conversations with the agent and analyzing its capabilities. You should use your agent a lot (aim for a minimum of 10, but preferably more, conversations per team member per week). We also recommend that each team member tests the work of the other pair programming subteams.

Most important is that you vary the conversations you have. You should, for example, try to trigger each pattern, intent, and filter, and aim to visit each page to check that everything works. Testing is most effective if you do this systematically (for example, by deliberately using different utterances to trigger particular intents). As you improve your agent and add things, you should continuously re-test your conversational agent, not only to check its new capabilities but also to make sure the ones you implemented before still work.
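One way to make this systematic is to keep a small test matrix of utterances per intent and track which ones fail. The sketch below is a minimal illustration in Python; the intent names and utterances are made-up placeholders, and how you obtain the matched intent for each utterance (e.g., via the Dialogflow test console or its API) is up to you.

```python
# Hypothetical test matrix: the intent names and utterances below are
# placeholders; replace them with your own agent's intents.
TEST_UTTERANCES = {
    "greet": ["hi there", "good morning", "hey!"],
    "request_recipe": ["I want to cook pasta", "suggest a dinner recipe"],
    "stop": ["never mind", "quit"],
}

def summarize(results):
    """results: list of (expected_intent, utterance, matched_intent) triples,
    collected by sending each utterance to the agent. Returns a summary of
    how many utterances triggered the intended intent."""
    failures = [r for r in results if r[0] != r[2]]
    total = len(results)
    return {
        "total": total,
        "failed": len(failures),
        "accuracy": (total - len(failures)) / total if total else 0.0,
        "failures": failures,  # inspect these to see which phrases missed
    }
```

After each test session you can feed the recorded triples into `summarize`, and re-run the same matrix after every change to check that previously working intents did not regress.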




You can use different tools to inspect the performance of your agent. For example, Dialogflow has a useful feature that allows you to analyze the quality of intent recognition in the conversations you have during testing: on the Training page in Dialogflow, you can filter the conversations you had at the top left. This video provides a more thorough explanation of how to do this: Use Dialogflow Analytics & Training to Improve Your Chatbot (2021). Because of the agent set-up in the Project MAS course, where Dialogflow is only used for Automatic Speech Recognition and intent recognition, the Analytics part does not apply.
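If you log your test conversations, you can also do part of this analysis yourself by counting how often the fallback intent was matched. The sketch below is only an illustration: it assumes you have each turn available as an (utterance, matched_intent) pair, and it uses Dialogflow's standard name for a no-match, "Default Fallback Intent".

```python
def no_match_rate(turns, fallback="Default Fallback Intent"):
    """turns: list of (user_utterance, matched_intent) pairs from a logged
    conversation. Returns the fraction of turns that ended in a no-match,
    together with the utterances that missed, so you know which training
    phrases you may still need to add."""
    misses = [utt for (utt, intent) in turns if intent == fallback]
    rate = len(misses) / len(turns) if turns else 0.0
    return rate, misses
```

Tracking this rate over successive testing rounds gives a simple indication of whether your intent coverage is actually improving.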

To improve other components of the agent (such as patterns, visuals, and Prolog predicates), you can inspect error messages and use the debugging perspective in Eclipse for the MARBEL agent. You can also test Prolog predicates in a Prolog interpreter.

Part of your final report will be about how you tested your agent (see Final Report for more info). This is a list of things you should keep in mind during testing, some of which should be included somewhere in your final report:

  • What capabilities of your agent did you test? How did you go about this?

  • Which of those capabilities were most important to test in this phase? Why?


  • What kinds of example conversations did you use for testing?

  • How did your tests go? What went well, and where did you run into problems (focus on these problems and explain why things went wrong)?

  • How did you handle or fix these problems? Which choices did you make to focus on what you could reasonably fix within the time frame of the project? What problems did you ignore?

  • What kind of extensions did you implement to address some of the issues you ran into? How did you make choices to focus on the things you as a team thought most important?