Assessment Rubric
Project MAS Rubric | |||||
Basic agent | |||||
Criteria | Poor | Average | Good | Ratings | Max Pts |
Dialogflow intents and entities | Poor or lacking implementation of intents and entities, preventing the agent from functioning. (0) | Not all intents and entities that were instructed were properly implemented, disabling certain functionalities. (3) | All instructed intents and entities were properly implemented. (5) | | 5 |
Robustness of intent recognition | Intents were trained with a minimum of training phrases, enabling only understanding of very specific input. (0) | Only part of the intents were implemented with a wide coverage of possible user utterances, or the implementation of the intents was not made robust enough against confusing one intent with another. (4) | Intents were trained with a wide coverage of possible user utterances and entities (if applicable), while the chance for confusion between intents was reduced to a minimum. (7) | | 7 |
Recipe filtering | Poor or lacking implementation of recipe filtering, preventing the agent from functioning. (0) | Not all instructed filtering functions were properly implemented, disabling certain functionalities. (4) | All instructed recipe filtering functions were properly implemented. (8) | | 8 |
Conversation patterns and agent responses | Poor or lacking implementation of conversational patterns and agent responses, preventing the agent from functioning. (0) | Not all conversation patterns and agent responses that were instructed were properly implemented, disabling certain functionalities. (4) | All instructed conversational patterns and agent responses were properly implemented. (8) | | 8 |
Visuals | Poor or lacking implementation of visuals, preventing the agent from functioning. (0) | Not all instructed visuals were properly implemented, making some of the pages unclear. (4) | All instructed pages were properly implemented, and their information is clear to the user. (8) | | 8 |
Total points | | 36 | |||
Agent extensions | |||||
Criteria | Poor | Average | Good | Ratings | Max Pts |
Task fulfillment | Little to no extensions to improve on the functionalities of the agent have been made. Extensions made hardly improve how users can find a recipe from the database. (0) | One or more extensions to improve on the functionalities of the agent have been made. They improve to some extent how users can find a recipe from the database, but were not very difficult to implement, are not all functioning well and/or are not very original. For example, extensions mostly copy a feature from the basic agent and replace one or two elements with something new. (4) | One or more extensions were made to the NLU, the filtering capacities of the agent and the way in which filtering outcomes are communicated visually and orally, that were not easy to implement and improve the way in which users can find a recipe from the database. (8) | | 8 |
Richness of conversation | Little to no conversational patterns and agent responses were added. The added patterns cover rare or non-relevant interaction scenarios and neither improve the quality of conversational repair nor add conversational freedom to the recipe selection scenario. (0) | The added conversational patterns cover some additional directions that the conversations may take. Little is done to make conversational repair more effective. (3) | The added conversational patterns cover a proper variety of directions that the conversations may take, and the patterns and agent responses make for effective conversational repair strategies in case of misunderstanding (see the Extensions section for some pointers). (7) | | 7 |
Navigation and flexibility | Little to no navigational features have been added to the agent. The features that were added are not working properly. (0) | Only some navigational features that improve the flexibility of the agent were added (restarting, stopping or removing filters), or the added features do not all function or are unclear to the user. (3) | The agent enables the user to restart, stop and remove filters conversationally, and makes clear to the user, by means of visuals and agent utterances, which options are available at any point in the conversation. (7) | | 7 |
Agent design | Little to no extensions to improve on the design of the agent have been made to the visual support section or the conversation patterns and agent responses section. (0) | The agent has been given its own look and feel to some extent, but this is not very extensive and/or original, or limited to only the visuals or utterances of the agent. (4) | The group went all out to add an engaging and consistent look and feel to the agent visuals, utterances and conversational style. The chosen design is instrumental to recipe selection with a specific target audience in mind. (8) | | 8 |
Total points | | 30 | |||
Written report | |||||
Criteria | Poor | Average | Good | Ratings | Max Pts |
Agent overview | The description of how the agent interacts is unclear and does not support a test of the different agent features. (0) | Some, but not all features of the agent are clear from the section on how the agent interacts, which makes it difficult to test some of its features. (2) | There is a complete and clear description of how the agent interacts, which supports a proper test of the different agent features. (4) | | 4 |
Design rationale | The design choices are difficult to understand from how they are presented. (0) | The design choices are written down in a mostly understandable way, but some of the choices are not well motivated. (4) | The design choices are extensive and written down in an understandable and convincing way. (7) | | 7 |
Agent Testing | It is unclear from the agent testing section how the testing was performed and what the main insights were. (0) | The agent testing section gives a reasonable account of how the assistant was tested and what the main findings were. (4) | There is a clear presentation of how the agent was tested throughout the project, with a clear set of findings and follow-up actions. (8) | | 8 |
User Study | It is unclear from the user study section how the evaluation was set up and/or how the agent performs. (0) | The user study section gives a reasonable account of how the evaluation was set up and what the main findings were. (4) | The agent and its extensions are evaluated and analyzed in a structured way and clearly presented in the user study section. (8) | | 8 |
Clarity and presentation | The writing style, structure and layout are messy and unclear. (0) | The writing style and presentation are up to standard, but parts are unclear. (4) | Very clear report in terms of writing style, structure and layout. (7) | | 7 |
Total points | | 34 |