[TBU] Assessment Rubric

Change to include pass/fail items: midway check in presentation evaluation, GitHub commits, rewrite report section.

Change to include: pass if the bot can do inclusion, 7.5 if it also does exclusion properly, and anything extra comes from extensions.

30 percent basic bot and intent classifier inclusion; 30 percent exclusion and extensions; 30 percent report.
Where to put testing?

Project Conversational Agents - Rubric
Basic agent
Criteria	Poor	Average	Good	Ratings	Max Pts
Dialogflow intents and entities	Poor or lacking implementation of intents and entities, preventing the agent from functioning. (0)	Not all required intents and entities were properly implemented, disabling certain functionalities. (3)	All required intents and entities were properly implemented. (5)		5
Robustness of intent recognition	Intents were trained with a minimum of training phrases, so the agent understands only very specific input. (0)	Only some of the intents were implemented with wide coverage of possible user expressions, or the intents were not made robust enough against being confused with one another. (4)	Intents were trained with wide coverage of possible user expressions and entities (if applicable), while the chance of confusion between intents was reduced to a minimum. (7)		7
Recipe filtering	Poor or lacking implementation of recipe filtering, preventing the agent from functioning. (0)	Not all required filtering functions were properly implemented, disabling certain functionalities. (4)	All required recipe filtering functions were properly implemented. (8)		8
Conversation patterns and agent responses	Poor or lacking implementation of conversational patterns and agent responses, preventing the agent from functioning. (0)	Not all required conversation patterns and agent responses were properly implemented, disabling certain functionalities. (4)	All required conversational patterns and agent responses were properly implemented. (8)		8
Visuals	Poor or lacking implementation of visuals, preventing the agent from functioning. (0)	Not all required visuals were properly implemented, making some of the pages unclear. (4)	All required pages were properly implemented, and their information is clear to the user. (8)		8
Total points					36
Agent extensions
Criteria	Poor	Average	Good	Ratings	Max Pts
Task fulfillment	Few or no extensions to improve the functionality of the agent have been made. Any extensions made hardly improve how users can find a recipe from the database. (0)	One or more extensions to improve the functionality of the agent have been made. They improve to some extent how users can find a recipe from the database, but were not very difficult to implement, are not all functioning well and/or are not very original. For example, extensions mostly copy a feature from the basic agent and replace one or two elements with something new. (4)	One or more extensions were made to the NLU, the filtering capacities of the agent and the way in which filtering outcomes are communicated visually and orally. These extensions were not easy to implement and improve the way in which users can find a recipe from the database. (8)		8
Conversational competence	Few or no conversational patterns and agent responses were added. The added patterns cover rare or non-relevant interaction scenarios and neither improve conversational repair quality nor add conversational freedom in the scenario of recipe selection. (0)	The added conversational patterns cover some additional directions that the conversations may take. Little is done to make conversational repair more effective. (3)	The added conversational patterns cover a proper variety of directions that the conversations may take, and the patterns and agent responses make for effective conversational repair strategies in case of misunderstanding (see the Extensions section for some pointers). (7)		7
Navigation and flexibility	Few or no navigational features have been added to the agent. The features that were added are not working properly. (0)	Only some navigational features to improve the flexibility of the agent are added (restarting, stopping or removing filters), or the added features do not all function or are unclear to the user. (3)	The agent enables the user to restart, stop and remove filters conversationally, and makes clear to the user what options they have at any point in the conversation, by means of visuals and agent utterances. (7)		7
Agent design	Few or no extensions to improve the design of the agent have been made to the visual support section or the conversation patterns and agent responses section. (0)	The agent has been given its own look and feel to some extent, but this is not very extensive and/or original, or is limited to only the visuals or utterances of the agent. (4)	The group went all out to add an engaging and consistent look and feel to the agent visuals, utterances and conversational style. The chosen design is instrumental to recipe selection with a specific target audience in mind. (8)		8
Total points					30
Written report
Criteria	Poor	Average	Good	Ratings	Max Pts
Agent overview	The description of how the agent interacts is unclear and does not support a test of the different agent features. (0)	Some, but not all, features of the agent are clear from the section on how the agent interacts, which makes it difficult to test some of its features. (2)	There is a complete and clear description of how the agent interacts, which supports proper testing of the different agent features. (4)		4
Design rationale	The design choices are difficult to understand from how they are presented. (0)	The design choices are written down in a mostly understandable way, while some of the choices are not well-motivated. (4)	The design choices are clearly described and motivated and written down in an understandable and convincing way. (7)		7
Agent Testing	It is unclear from the testing section what data was collected during agent testing and what the main insights were. (0)	The testing section gives a reasonable account of how the agent was tested during development and what the main findings were. (4)	There is a clear presentation of the results obtained about the agent’s performance by testing throughout the project, with a clear set of findings and insights. (8)		8
User Study	It is unclear what results were obtained from the pilot user study and how the agent performed. (0)	The pilot user study results and analysis are reasonably well presented and it is clear what the main findings are. (4)	Results from the pilot user study about how the agent and its extensions performed are thoroughly analyzed and clearly presented in a structured way. (8)		8
Clarity and presentation	The writing style, structure and layout are messy and unclear. (0)	The writing style and presentation are up to standard, but parts are unclear. (4)	Very clear report in terms of writing style, structure and layout. (7)		7
Total points					34

Written Report
Criteria	Poor	Average	Good	Excellent	Max Points
Title and Formatting	Title and formatting are unclear or incomplete, missing required elements (0).	Title and formatting are present but lack clarity or professionalism (0.5).	Clear title and formatting; minor improvements needed (1).	Concise and professional title, all required elements are clear and complete (1.5).	1.5
Introduction	Introduction lacks clarity and fails to define the project, goals, or key terms (0).	Basic introduction provided but lacks depth or proper framing of goals and context (1).	Clear and informative introduction with a good explanation of goals and context (2).	Strong, engaging introduction with a clear definition of project goals, context, and significance (3).	3
Pipeline Explanation	Pipeline description is vague and lacks detail; user interaction is unclear (0).	Basic pipeline description provided but missing depth or key details (1.5).	Well-detailed pipeline with clear functionality and conversational flow explained (3).	Comprehensive, clear, and visually aided pipeline description that is easy to understand (4).	4
Intent and Slot Classifier	Explanation of intent and slot classifier is missing or vague; no performance analysis (0).	Basic explanation of intent and slot classifier with minimal performance data (1.5).	Detailed explanation with good performance analysis and discussion of challenges (3).	Comprehensive explanation, with strong performance analysis, detailed metrics, and innovative extensions (4).	4
Exclusion Mechanism	Exclusion mechanism is unclear, with no testing or pros and cons analysis (0).	Basic explanation provided, but lacks clarity in implementation and testing (1.5).	Clear explanation with testing and pros/cons analysis, but room for improvement (3).	Thorough explanation of implementation, testing, pros/cons, and strong performance data (4).	4
Extensions to the Bot	Extensions are unclear or not described; impact is not evident (0).	Extensions described but lack depth or clear motivation (1).	Well-documented extensions with clear motivation and impact analysis (1.5).	Comprehensive description of innovative extensions with clear benefits and motivations (2).	2
Pilot User Study	User study setup and results are missing or unclear (0).	Basic user study presented with limited results and insights (2).	Well-structured user study with good results and analysis of findings (3).	Detailed, well-analyzed user study with strong quantitative and qualitative insights (4).	4
Conclusion	Conclusion is missing or vague, with no reflection or future suggestions (0).	Basic summary provided but lacks depth or critical reflection (1).	Clear conclusion with reflection and practical improvement suggestions (1.5).	Strong, insightful conclusion with critical reflection and actionable improvement ideas (2).	2
Clarity and Presentation	Writing is unclear and poorly structured; formatting is messy (0).	Writing is somewhat clear but lacks polish and structure (1).	Clear and well-structured writing with minor presentation issues (2).	Very clear, professional writing with excellent structure and layout (3).	3

Extensions and Exclusion
Criteria	Poor	Average	Good	Excellent	Max Points
Exclusion Implementation	Exclusion functionality is missing, incomplete, or unusable. Little collaboration or effort to integrate exclusion approaches is evident. (0)	Exclusion functionality includes basic capabilities such as excluding a single type of slot but lacks refinement. Limited approach or minimal testing and analysis. (4)	Exclusion functionality works well for at least 2 slots. Multiple changes in the pipeline are made and combined effectively. Testing and trade-offs are adequately considered. (7)	Exclusion is comprehensive and well-integrated, handling multiple slots. Can be combined effectively with inclusion. Approaches are thoughtfully combined and tested rigorously, and limitations are clearly minimized. Collaboration across team members and sections is evident. (10)	10
Extensions to Agent Functionality (Recipe Filtering)	Little to no extensions to improve agent functionality were implemented. Extensions do not significantly enhance user experience or recipe filtering. (0)	Extensions improve functionality but are minimal, lack originality, or are not fully operational. They may replicate basic agent features with minor variations. (3)	Extensions improve key aspects such as NLU, filtering, and visual/aural communication. They are moderately complex and enhance user interaction meaningfully. (5)	Extensions are innovative, complex, and significantly enhance agent functionality, filtering, and user interaction. They demonstrate originality and are well-implemented. (7)	7
Conversational Competence and Navigation (Extension to Dialogue Patterns)	Few or no conversational patterns, repair strategies, or navigational features are added. Added features (e.g., restarting, stopping filters) are non-functional or unclear. (0)	Basic conversational patterns and navigational features (e.g., restarting or stopping filters) are added, but they lack depth, are inconsistently functional, or fail to address misunderstandings effectively. (3)	Conversational patterns cover relevant scenarios, and navigational features (e.g., restarting, stopping, removing filters) work reliably. Basic repair strategies and improved conversational flow are present. (5)	Conversational patterns and navigational features are well-integrated and intuitive. Patterns handle misunderstandings effectively, and navigation options (e.g., restarting, stopping, removing filters) are user-friendly, enhancing flexibility and user experience. (7)	7
Design and User Engagement (Extension to Visual Support)	Little or no effort to improve agent design in terms of visuals, utterances, or conversational style. (0)	Basic design improvements are present but lack originality, cohesion, or a clear target audience focus. Limited to either visuals or utterances. (2)	Agent design is engaging and consistent, with clear visual and conversational improvements tailored to a general audience. (5)	Design is highly engaging, cohesive, and tailored to a specific target audience. Visuals, utterances, and conversational style enhance usability and align with recipe selection goals. (6)	6