Agent Testing and Pilot User Study
Evaluation is essential for evolving your conversational recipe recommendation agent from one that conducts a basic conversation with minimal functionality into one that provides a great user experience for all users. Four metrics matter here: effectiveness, efficiency, robustness, and user satisfaction. An agent is effective if it reliably helps the user complete the task at hand. It is efficient if it reaches the goal of the conversation in as few turns as possible. It is robust if it can handle the range of user expressions you might reasonably expect, as well as the different directions a conversation may take. Finally, it is satisfying if users come away from the conversation with a positive experience.
Sources
For the analysis, looking at a range of quantitative measures can provide valuable insights. Examples of information that can be extracted from the MARBEL agent logs and used for analysis include (but are not limited to):
Number of utterances/intents per interaction
Number of fallback intents
Number of repair actions (b12, b13) per interaction (see Capability 4: Handling Unexpected Intents)
Variety in intents and entities (are there any unused intents?)
Confidence values for the intent classification (are there any patterns of specific intents having low/high confidence values?)
Interaction length (in time)
You can also inspect the Dialogflow training tool and agent validation to learn more about how well intent matching performed, and what issues your Dialogflow agent may have.
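The log-based measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not the MARBEL log schema: the `Turn` record, its field names, the intent names, and the `fallback` label are all hypothetical, and you would first need to parse your own agent logs into a comparable structure. Repair actions (b12, b13) are not covered here, but could be counted in the same way as fallbacks once you know how they appear in your logs.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class Turn:
    """One logged user turn (hypothetical format, not the actual MARBEL log schema)."""
    session: str       # interaction identifier
    timestamp: float   # seconds since logging started
    intent: str        # name of the matched intent
    confidence: float  # intent classification confidence, 0.0 to 1.0


def summarize(turns, known_intents, fallback_intent="fallback"):
    """Compute simple per-interaction and per-intent measures from logged turns."""
    # Group the turns by interaction.
    by_session = {}
    for t in turns:
        by_session.setdefault(t.session, []).append(t)

    # Per interaction: number of turns, number of fallbacks, length in seconds.
    per_session = {}
    for session, ts in by_session.items():
        times = [t.timestamp for t in ts]
        per_session[session] = {
            "turns": len(ts),
            "fallbacks": sum(t.intent == fallback_intent for t in ts),
            "duration_s": max(times) - min(times),
        }

    # Across all interactions: which intents never occurred, and the
    # average classification confidence for each intent that did occur.
    counts = Counter(t.intent for t in turns)
    conf_by_intent = {}
    for t in turns:
        conf_by_intent.setdefault(t.intent, []).append(t.confidence)

    return {
        "per_session": per_session,
        "unused_intents": sorted(set(known_intents) - set(counts)),
        "mean_confidence": {i: mean(c) for i, c in conf_by_intent.items()},
    }


# Made-up example data, for illustration only.
turns = [
    Turn("s1", 0.0, "greeting", 1.0),
    Turn("s1", 4.0, "requestRecipe", 0.5),
    Turn("s1", 9.0, "fallback", 0.25),
    Turn("s2", 0.0, "greeting", 0.5),
]
report = summarize(
    turns, known_intents=["greeting", "requestRecipe", "fallback", "goodbye"]
)
print(report["per_session"]["s1"])
print(report["unused_intents"])
print(report["mean_confidence"])
```

Intents that appear in `unused_intents`, or that have a consistently low mean confidence, are natural candidates for a closer look in the Dialogflow training tool.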
Testing
Testing reveals the deficiencies that need to be addressed so that the agent does not break down easily. This is typically done iteratively, in an agile manner: you start with a simple agent that functions minimally and then alternate rounds of testing and improvement. During development, you should focus on agent testing.
At the end of the project, you will do a small pilot user study.
You will need to document the testing and pilot user study that your team performed in your final report.