System testing

Welcome to the system testing section!

System testing comprises the checks that you, as developers of the agent, carry out to find out how well the agent and its components operate when exposed to diverse input. You do this continuously while developing your bot, simply by holding conversations with the agent and analyzing how it behaves. Use your bot a lot (at least 10 different conversations per person) and vary the conversations you have. Also try to trigger each pattern, page, intent, and filter to check that it all works. As you improve the agent and add things, keep re-testing it. We recommend that each team member also tests the parts that other team members worked on.
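
One way to keep track of this is a small coverage log that you fill in by hand after each test conversation. The sketch below is only an illustration in Python: the component names, tester names, and log format are hypothetical and not part of the course tooling, so replace them with your own agent's intents, patterns, and pages.

```python
from collections import Counter

MIN_CONVERSATIONS = 10  # per person, the minimum recommended above

# Hypothetical component names for illustration; list your own agent's here.
COMPONENTS = {
    "intent": ["greeting", "requestRecipe", "addFilter"],
    "pattern": ["greetingPattern", "recipeSelectPattern"],
    "page": ["start", "recipeOverview"],
}


def untested(components, log):
    """Return, per kind, the components that no logged conversation triggered."""
    triggered = {item for entry in log for item in entry["triggered"]}
    return {
        kind: [name for name in names if (kind, name) not in triggered]
        for kind, names in components.items()
    }


def below_minimum(team, log, minimum=MIN_CONVERSATIONS):
    """Return how many conversations each team member still has to run."""
    counts = Counter(entry["tester"] for entry in log)
    return {
        member: minimum - counts[member]
        for member in team
        if counts[member] < minimum
    }


# One entry per test conversation, filled in by hand after each session.
log = [
    {"tester": "alice", "triggered": [("intent", "greeting"), ("page", "start")]},
    {"tester": "bob", "triggered": [("intent", "requestRecipe"),
                                    ("pattern", "greetingPattern")]},
]

print(untested(COMPONENTS, log))
print(below_minimum(["alice", "bob"], log))
```

Running this after every testing session shows which components nobody has triggered yet and who still needs to run more conversations.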

Dialogflow has a useful feature that lets you analyze the quality of intent detection in your test conversations. Go to the Training page in Dialogflow and filter the conversations at the top left. This video explains how to do this in more detail: Use Dialogflow Analytics & Training to Improve Your Chatbot (2021). Note that in the Project MAS course set-up, Dialogflow is only triggered for Automatic Speech Recognition and intent detection, so the Analytics part does not apply.

To improve other components of the agent (such as patterns, visuals, and Prolog predicates), you can inspect the error messages and debugging output in MARBEL, and try out Prolog predicates in a Prolog interpreter.

Testing is most effective when you do it systematically (for example, by deliberately using different utterances to trigger particular intents). Part of your end report will be about how you did this (see 2023: End Report for more info). Below is a list of things we think you should keep in mind during testing and include somewhere in your end report:

Capabilities of your bot
- What do you think is most important to test in this phase?
Test set-up
- What an example conversation should look like
- What you tested and how
Mismatch conversation analysis
- How did your tests go: good and bad (focus on the bad and why it went wrong)
- How could the problems be fixed? Will you fix them before you hand in, or is that not feasible?
During use, what kind of extensions do you think could be useful or improve performance?
- How could one extend the bot even further?
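
For the mismatch conversation analysis, it can help to write down for each test utterance which intent you meant to trigger and which intent was actually detected. The Python sketch below is a minimal illustration under assumptions: the utterances and intent names are made up, and the detected intents are assumed to be copied by hand from your own test sessions (for example from the Dialogflow Training page).

```python
from collections import defaultdict

# (utterance, intent you meant to trigger, intent the agent actually detected).
# All utterances and intent names are hypothetical; the detected intent is
# copied by hand from your own test session.
RESULTS = [
    ("hi there", "greeting", "greeting"),
    ("good morning", "greeting", "greeting"),
    ("I want something with pasta", "addFilter", "addFilter"),
    ("nothing with nuts please", "addFilter", "requestRecipe"),
    ("show me the lasagna", "requestRecipe", "requestRecipe"),
]


def mismatches(results):
    """Return the test cases where the detected intent differs from the expected one."""
    return [r for r in results if r[1] != r[2]]


def accuracy_per_intent(results):
    """Return the fraction of correctly detected utterances per expected intent."""
    totals, correct = defaultdict(int), defaultdict(int)
    for _, expected, detected in results:
        totals[expected] += 1
        correct[expected] += expected == detected  # True counts as 1
    return {intent: correct[intent] / totals[intent] for intent in totals}


for utterance, expected, detected in mismatches(RESULTS):
    print(f"{utterance!r}: expected {expected}, got {detected}")
print(accuracy_per_intent(RESULTS))
```

The list of mismatches gives you concrete examples to discuss in the end report, and the per-intent fractions show which intents most need additional training phrases.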