Welcome to the system agent testing section!
Agent testing means performing a number of steps to evaluate your agent, with the aim of identifying how well your agent and its components operate when exposed to a diverse range of inputs. This is done continuously while developing your agent, by engaging in conversations with the agent and analyzing its capabilities. You should use your agent a lot (a minimum of 10, but preferably more, conversations per team member per week). We also recommend that each team member tries to test the work of the other pair programming subteams.
Most important is that you vary the conversations you have: for example, try to trigger each pattern, intent, and filter, and aim to visit each page to check if it all works. The testing is most effective if you do this systematically (for example, by deliberately using different utterances to trigger particular intents). As you improve your agent and add things, you should continuously re-test it (to test its new capabilities, but also to make sure the ones you implemented before still work).
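One simple way to make this systematic is to keep a small checklist that pairs test utterances with the intent you expect them to trigger, and to tick items off while talking to your agent. A minimal sketch of such a checklist in Prolog is given below; the predicate name expected_intent and the intent names are hypothetical examples, not part of the course framework:

    % expected_intent(Utterance, Intent): a test utterance and the intent
    % you expect Dialogflow to recognise for it (all names are made up).
    expected_intent("can you recommend an Italian recipe", recommendRecipe).
    expected_intent("I want to cook something vegetarian tonight", recommendRecipe).
    expected_intent("add a filter for pasta dishes", addFilter).
    expected_intent("remove the vegetarian filter", removeFilter).

Working through such a list during your conversations makes it easy to see which intents you have already covered with several different utterances and which ones you have not tested yet.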
You can use different tools to inspect the performance of your agent. For example, there is a useful feature in Dialogflow that allows you to analyze the quality of intent recognition of the conversations you have during testing. There is a Training page in Dialogflow where, at the top left, you can filter the conversations that you had. This video provides a more thorough explanation of how to do this: Use Dialogflow Analytics & Training to Improve Your Chatbot (2021). Due to the agent set-up in the Project MAS course, where Dialogflow is only used for Automatic Speech Recognition and intent recognition, the Analytics part does not apply.
For the improvement of other components of the agent (such as patterns, visuals, and Prolog predicates), you can inspect error messages and use the debugging perspective in Eclipse for the MARBEL agent. You can also test Prolog predicates in a Prolog interpreter.
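For example, assuming you use SWI-Prolog and have collected your agent's predicates in a file such as recipes.pl (the file and predicate names below are just placeholders), a quick interactive check of a predicate could look like this:

    ?- [recipes].                                         % consult the file with your predicates
    true.

    ?- recipe_has_ingredient(pasta_pesto, Ingredient).    % hypothetical predicate
    Ingredient = basil ;
    Ingredient = pine_nuts.

Querying predicates in isolation like this makes it much easier to tell whether a problem is caused by your Prolog logic or by another part of the agent (for example, intent recognition or the MARBEL decision logic).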
Part of your final report will be about how you tested your agent (see 2023: Final Report for more info). This is a list of things we think you should keep in mind during testing, some of which should be included somewhere in your final report:
What capabilities of your agent did you test? How did you go about this?
Which of those capabilities were most important to test? Why?
What kinds of example conversations did you use for testing?
What should an example conversation look like?
How did your tests go? What went well, and where did you run into problems (focus on these problems and explain why things went wrong)?
What problems came up from the tests? Which problems were feasible to fix before doing the user test, and which were not?
How did you handle or fix these problems? Which choices did you make to focus on what you could reasonably fix within the time frame of the project? What problems did you ignore?
What kind of extensions did you implement to address some of the issues you ran into? How did you make choices to focus on the things you as a team thought most important?