...
Number of utterances/intents per interaction
Number of fallback intents
Number of repair actions (b12, b13) per interaction (see https://socialrobotics.atlassian.net/wiki/spaces/PCA2/pages/2709488260/Unexpected+Intents)
Variety in intents and entities (are there any unused intents?)
Confidence values for the intent classification (are there patterns of specific intents having low or high confidence values?)
Evaluation statistics described in Model Evaluation
Interaction length (in time)
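Most of these metrics can be computed directly from interaction logs. The following is a minimal sketch that assumes a hypothetical log format: each interaction is a list of user turns, and each turn records a timestamp, the classified intent, its confidence, and the ID of the pattern the agent responded with. The `Turn` class, the `"fallback"` intent label, and the functions `interaction_stats` and `corpus_stats` are illustrative assumptions, not part of any specific framework; only the repair action IDs b12 and b13 come from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


# Hypothetical log schema (an assumption): one Turn per user utterance.
@dataclass
class Turn:
    timestamp: datetime
    intent: str              # classified intent; FALLBACK_INTENT if nothing matched
    confidence: float        # classifier confidence in [0, 1]
    agent_pattern: str = ""  # ID of the pattern the agent responded with, e.g. "b12"


REPAIR_PATTERNS = {"b12", "b13"}  # repair actions listed in the text
FALLBACK_INTENT = "fallback"      # assumed label of the fallback intent


def interaction_stats(turns):
    """Counts for one interaction plus its length in seconds.

    Length is measured from the first to the last logged user turn,
    so it slightly underestimates the full conversation time.
    """
    return {
        "utterances": len(turns),
        "fallbacks": sum(t.intent == FALLBACK_INTENT for t in turns),
        "repairs": sum(t.agent_pattern in REPAIR_PATTERNS for t in turns),
        "duration_s": (turns[-1].timestamp - turns[0].timestamp).total_seconds()
        if turns else 0.0,
    }


def corpus_stats(interactions, defined_intents):
    """Unused intents and per-intent confidence summaries over all interactions."""
    confidences = defaultdict(list)
    for turns in interactions:
        for t in turns:
            confidences[t.intent].append(t.confidence)
    return {
        # Intents defined in the agent but never recognized in any interaction.
        "unused_intents": sorted(set(defined_intents) - set(confidences)),
        # Mean and minimum confidence help spot intents that are often misclassified.
        "confidence": {
            intent: {"n": len(c), "mean": mean(c), "min": min(c)}
            for intent, c in confidences.items()
        },
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    conversation = [
        Turn(t0, "greeting", 1.0),
        Turn(t0 + timedelta(seconds=30), FALLBACK_INTENT, 0.25, "b12"),
        Turn(t0 + timedelta(seconds=90), "book_table", 0.5),
    ]
    print(interaction_stats(conversation))
    # {'utterances': 3, 'fallbacks': 1, 'repairs': 1, 'duration_s': 90.0}
    print(corpus_stats([conversation], ["greeting", "book_table", "goodbye"]))
```

Per-interaction values can then be averaged across all logged interactions, while the corpus-level summary answers the questions about unused intents and confidence patterns.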
...
Testing
Testing reveals deficiencies in the agent that must be addressed so that it does not break down easily. Testing is typically done in an agile manner: start with a simple agent that functions minimally, then alternate rounds of testing and improvement. During development, you should focus on agent testing:
...