In the small user study at the end of the project, you will test an agent from another team and another team will test yours! As you can see in the 2023:Course Schedule, there is a Wednesday reserved in the last week of the project for running a user study.

Preparation: By this time, you should have an agent that is ready for final evaluation. For your agent to be ready for the user study evaluation, you need to make sure that:

It runs…
It is able to filter recipes.
It provides brief instructions and an overview of the agent's features and capabilities on the start page (in particular, you should probably mention some of the extensions that you implemented).

You also need to specify the procedure of your user study. See https://www.simplypsychology.org/research-report.html#method for more on this, and how to report your findings.

Organization: For the user study day:

We will pair teams. The members of the team that you are paired with are the participants in your user study.
You need to invite team members from the team you are paired with, your participants, for your own user study:
- Invite participants individually to interact with your agent at different time slots.
- Explain the procedure of your user study to each participant before they interact with your agent.
- Make sure they each have at least three interaction sessions (conversations) with your agent.
- Collect and record the data of each interaction. Data can be log files of your agent, data that Dialogflow collects, and observations that you make of how a participant interacts with your agent.
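
One way to keep the collected data consistent across participants is to record one row per conversation in a fixed format. The following is only a minimal sketch: the `SessionRecord` fields, the helper names, and the sample values are all hypothetical and made up for illustration, and you should record whatever your own study procedure calls for (log files, Dialogflow data, observations).

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class SessionRecord:
    """One conversation between a participant and the agent (hypothetical fields)."""
    participant_id: str   # anonymised label such as "P1"
    session_number: int   # 1, 2, 3, ... per participant
    num_turns: int        # number of user turns in the conversation
    task_completed: bool  # did the participant end up with a recipe?
    fallback_count: int   # how often the agent failed to understand
    satisfaction: int     # e.g. a 1-5 rating asked after the session


def write_records(records, out):
    """Write the records as CSV rows to any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(SessionRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


def participants_missing_sessions(records, minimum=3):
    """Return participants with fewer than `minimum` recorded sessions."""
    counts = {}
    for record in records:
        counts[record.participant_id] = counts.get(record.participant_id, 0) + 1
    return sorted(p for p, n in counts.items() if n < minimum)


# Made-up sample rows, not real study data.
records = [
    SessionRecord("P1", 1, 8, True, 1, 4),
    SessionRecord("P1", 2, 6, True, 0, 5),
    SessionRecord("P1", 3, 11, False, 3, 2),
    SessionRecord("P2", 1, 7, True, 0, 4),
]

buffer = io.StringIO()  # swap for open("sessions.csv", "w", newline="") to write a file
write_records(records, buffer)
```

Calling `participants_missing_sessions(records)` during the study day gives a quick check of who still needs more conversations to reach the three-session minimum.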

Note
Vice versa, make sure you comply with the procedure of the user study conducted by another team! In other words, take testing another team’s agent seriously! It is the proper thing to do. But we will also take how you conducted yourself as a participant into account in our final assessment.

Setting up your user study: Your team will run the user study for your own agent and collect what you think is relevant data to analyze the performance of your agent. Consider some of the metrics that were already explained in the Agent Testing section: effectiveness, efficiency, robustness, and user satisfaction. So think about what you want to evaluate and how to do that. Also, consider that members of another team will likely interact differently and may have different conversations with your agent than you have seen before in your own agent testing (that’s why it makes sense to already involve people from outside your team during agent testing).

Reporting: In your Final Report you should (very) briefly report on the setup of your user study, but most importantly focus on the analysis of the data that you collected. You should have collected data for close to 20 conversational interactions with your agent. For these conversations, you can report some of the more interesting and basic descriptive statistics (https://en.wikipedia.org/wiki/Descriptive_statistics). Descriptive statistics provide simple summaries about the sample (your participants). You do not need to provide figures or tables, because you don’t have the space in your report; if you want to provide details, you can add these in an appendix. Focus instead on the more interesting findings. Most importantly, briefly discuss and interpret the data that you collected to explain what the data can tell us about the performance of your agent.
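
As an illustration of what basic descriptive statistics could look like for this kind of data, here is a small sketch using Python's standard `statistics` module. The two lists hold made-up values (one entry per conversation), and the choice of measures (turns per conversation as a rough efficiency indicator, completion rate as a rough effectiveness indicator) is an assumption, not a prescribed analysis.

```python
import statistics

# Made-up example values, one entry per conversation (not real study data).
turns_per_conversation = [8, 6, 11, 7, 9, 5, 12, 8]
task_completed = [True, True, False, True, True, True, False, True]


def describe(values):
    """Return simple summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(turns_per_conversation)

# Share of conversations in which the participant reached their goal.
completion_rate = sum(task_completed) / len(task_completed)

print(summary)
print(f"completion rate: {completion_rate:.0%}")
```

Numbers like these are only a starting point; the report should still interpret them, for example by relating long conversations to the points where participants got stuck.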
