
Welcome to the bot evaluation and agent testing section!

Report

Introduction

Capabilities
What do you think is most important to test in this phase?

Testing

Test setup
An example conversation
What did you test, and how?
Mismatch conversation analysis

Evaluation

How did your tests go? Cover the good and the bad (focus on the bad and why it went wrong).

Improvements

How could one fix the problems? Will you fix them before hand-in, or is that not feasible?
During use, what kinds of extensions do you think could be useful or improve performance?
- How could one extend the agent even further?

Use your bot a ton (at least 10 different conversations per person) and let's analyze the conversations in Training. Test everything to see whether it works, i.e., make sure you trigger all possibilities: patterns, pages, filters, intents, etc.

Write a short report about how you did this, and then analyze the no-match conversations in Training (go to the Training page in Dialogflow and filter the conversations at the top left). The video mentioned below shows how. Because we use SIC, the Analytics part does not apply.

Agent testing means performing a number of steps, as developers of the agent, to evaluate your agent, with the aim of identifying how well the agent and its components operate when exposed to a diverse range of inputs. You do this continuously while developing your agent, by having conversations with it and analyzing its capabilities. You should use your agent a ton (aim for at least 10, but preferably more, conversations per team member per week). We recommend that each team member also test the work of the other pair-programming subteams.

Most important is that you try to vary the conversations you have. For example, you should try to trigger each intent and pattern, and aim to visit each page, to check that it all works. Testing is most effective if you do this systematically (for example, by deliberately using different utterances to trigger particular intents). As you improve your agent and add things, you should continuously re-test it: both to test its new capabilities and to make sure the ones you implemented before still work.
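One way to make this systematic is to keep a test plan listing every intent and page, and tick off what your conversations actually reached. The minimal Python sketch below assumes you note down (or log) the intent and page names you see during testing; the names in `TEST_PLAN` and the helper `untested_items` are hypothetical, not part of Dialogflow or the course framework. Re-running it after each change to the agent shows what still needs (re-)testing.

```python
# Hypothetical test plan: the intent and page names below are made up for
# illustration; replace them with the ones your agent actually defines.
TEST_PLAN = {
    "intents": {"greeting", "recipeRequest", "addFilter", "farewell"},
    "pages": {"welcome", "recipeOverview", "recipeDetails"},
}


def untested_items(observed_intents, observed_pages, plan=TEST_PLAN):
    """Return the planned intents and pages that no test conversation reached.

    observed_intents / observed_pages are the names you noted down (or logged)
    while having test conversations with the agent.
    """
    return {
        "intents": sorted(plan["intents"] - set(observed_intents)),
        "pages": sorted(plan["pages"] - set(observed_pages)),
    }


if __name__ == "__main__":
    # After a few conversations, two intents and one page are still untested.
    report = untested_items(
        observed_intents=["greeting", "recipeRequest", "greeting"],
        observed_pages=["welcome", "recipeOverview"],
    )
    print(report)
    # {'intents': ['addFilter', 'farewell'], 'pages': ['recipeDetails']}
```

An empty result for both lists means every planned intent and page was reached at least once; it does not mean each one was recognized correctly for every phrasing, so keep varying the utterances.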

You can use different tools to inspect the performance of your agent. For example, Dialogflow has a useful feature for analyzing the quality of intent recognition in the conversations you have during testing: on its Training page, you can filter your conversations using the filter at the top left. This video explains in more detail how to do this: Use Dialogflow Analytics & Training to Improve Your Chatbot (2021). Because of the agent set-up in the Project MAS course, where Dialogflow is only used for Automatic Speech Recognition and intent recognition, the Analytics part does not apply.
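To get a quick overview of the no-match turns before inspecting them one by one on the Training page, you could also tally them yourself. This minimal sketch assumes a hypothetical record format: one `(utterance, recognized_intent)` pair per user turn, written down during testing, with `None` marking a no match. `no_match_summary` is an illustrative helper, not a Dialogflow API.

```python
from collections import Counter
from typing import List, Optional, Tuple


def no_match_summary(turns: List[Tuple[str, Optional[str]]]):
    """Return (share of no-match turns, no-match utterances by frequency).

    Each turn is a hypothetical (utterance, recognized_intent) pair, where
    recognized_intent is None when no intent matched.
    """
    if not turns:
        return 0.0, []
    # Normalise case and surrounding whitespace so repeats are grouped together.
    misses = [utt.strip().lower() for utt, intent in turns if intent is None]
    rate = len(misses) / len(turns)
    return rate, Counter(misses).most_common()


if __name__ == "__main__":
    turns = [
        ("hello", "greeting"),
        ("something vegan please", None),
        ("bye", "farewell"),
        ("Something vegan please ", None),
    ]
    rate, common = no_match_summary(turns)
    print(rate, common)  # 0.5 [('something vegan please', 2)]
```

Utterances that fail repeatedly are good candidates for extra training phrases or a new intent; the rate itself is only a rough signal, since it depends on how adventurous your test conversations were.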

To improve other components of the agent (such as patterns, visuals and Prolog predicates), you can inspect error messages and use the debugging perspective in Eclipse for the MARBEL agent. You can also test Prolog predicates in a Prolog interpreter. Talk about what issues you encountered with your bot and any mismatches, and how you fixed them (or would fix them).

Part of your final report will be about how you tested your agent (see Final Report for more info). This is a list of things you should keep in mind during testing, some of which should be included somewhere in your final report:

What capabilities of your agent did you test? How did you go about this?
Which of those capabilities were most important to test? Why?
What kinds of example conversations did you use for testing?
How did your tests go? What went well, and where did you run into problems (focus on these problems and explain why things went wrong)?
How did you handle or fix these problems? Which choices did you make to focus on what you could reasonably fix within the time frame of the project? What problems did you ignore?
What kinds of extensions did you implement to address some of the issues you ran into? How did you decide to focus on the things you as a team considered most important?
