System testing comprises a set of checks that you, as developers of the agent, carry out to identify how well the agent and its components operate when exposed to diverse input. You do this continuously while developing your bot, simply by having conversations with the agent and analyzing how it behaves. Use your bot extensively (at least 10 different conversations per person) and vary the conversations you have. Also try to trigger each pattern, page, intent, and filter to check that everything works. As you improve the agent and add features, keep re-testing it. We recommend that each team member also tests the parts that other team members worked on.
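To keep track of whether every pattern, page, intent, and filter has been triggered at least once, a simple coverage log can help. The following is a minimal sketch in Python, not part of the course tooling; the component names and the log format are hypothetical and should be replaced with those of your own agent.

```python
from collections import defaultdict

# Hypothetical component names; replace them with the ones in your own agent.
COMPONENTS = {
    "intent": ["greeting", "add_filter", "farewell"],
    "pattern": ["start", "select_item"],
    "page": ["welcome", "overview"],
    "filter": ["category"],
}


def untested_components(components, conversations):
    """Return, per kind, the components that no logged conversation triggered."""
    triggered = defaultdict(set)
    for conversation in conversations:
        for kind, name in conversation["triggered"]:
            triggered[kind].add(name)

    remaining = {}
    for kind, names in components.items():
        missing = sorted(set(names) - triggered[kind])
        if missing:
            remaining[kind] = missing
    return remaining


# Example log (assumed format): each tester notes by hand what a
# conversation actually triggered.
conversations = [
    {"tester": "A", "triggered": [("intent", "greeting"), ("pattern", "start"), ("page", "welcome")]},
    {"tester": "B", "triggered": [("intent", "add_filter"), ("filter", "category"), ("page", "overview")]},
]

print(untested_components(COMPONENTS, conversations))
```

Whatever the function still reports after a round of conversations tells you which parts of the agent the next test conversations should target.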

Testing is most effective if you do it systematically (for example, by deliberately using different utterances to trigger particular intents). Part of your end report will be about how you did this (see 2023: End Report for more info). Below is a list of things we think you should keep in mind during testing and should include somewhere in your end report:

  • Capabilities: What capabilities of your agent do you identify?

  • Which of those are most important to test in this phase?

  • Test setup: What did you test and how?

  • Mismatch conversation analysis

  • What should an example conversation look like?

  • How did your test go? What went well and what went badly (focus on the bad and why it went wrong)?

  • What problems came up from the tests, and how can they be fixed? Which problems were feasible to fix before doing the user test? Which were not?

  • What possible extensions came up from the tests? Which of those were feasible to implement before doing the user test?
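One way to test systematically and collect material for the report is to write down, for each intent, several different utterances together with what the agent actually recognized. The sketch below illustrates this under stated assumptions: the intents, utterances, and the `fallback` label are hypothetical, and the observed intent is filled in by hand after trying the utterance on your agent.

```python
from dataclasses import dataclass


@dataclass
class UtteranceCheck:
    utterance: str
    expected_intent: str
    observed_intent: str  # noted by hand after trying the utterance


def summarize(checks):
    """Group checks by expected intent and collect the mismatches."""
    summary = {}
    for check in checks:
        entry = summary.setdefault(check.expected_intent, {"total": 0, "mismatches": []})
        entry["total"] += 1
        if check.observed_intent != check.expected_intent:
            entry["mismatches"].append((check.utterance, check.observed_intent))
    return summary


# Hypothetical test log; replace with the utterances and intents you tried.
checks = [
    UtteranceCheck("hi there", "greeting", "greeting"),
    UtteranceCheck("good morning", "greeting", "greeting"),
    UtteranceCheck("yo", "greeting", "fallback"),
    UtteranceCheck("only show desserts", "add_filter", "add_filter"),
    UtteranceCheck("nothing with nuts please", "add_filter", "fallback"),
]

summary = summarize(checks)
for intent, entry in summary.items():
    passed = entry["total"] - len(entry["mismatches"])
    print(f"{intent}: {passed}/{entry['total']} recognized as expected")
    for utterance, observed in entry["mismatches"]:
        print(f"  mismatch: {utterance!r} was recognized as {observed!r}")
```

The list of mismatches per intent is a natural starting point for describing what went wrong and why, and for deciding which problems are feasible to fix before the user test.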