Assessment Rubric

Basic Agent - Inclusion
Criteria	Poor	Average	Good	Excellent	Max Points
Intent and Slot Distribution Analysis	No analysis of dataset distribution or poorly implemented distribution function, making it impossible to understand dataset balance or coverage (0).	Distribution analysis is partially implemented, missing some intents or slots, leading to incomplete insights (1).	Distribution analysis is mostly complete but fails to provide insights into rare or underrepresented slots/intents (2).	A well-implemented distribution analysis identifies and interprets intent and slot frequencies, providing meaningful insights into dataset balance and potential issues (3).	3
Training Function Implementation	Training function is incomplete or incorrectly implemented, with critical issues such as missing loss functions, backward pass, or optimizer configuration (0).	Training function is partially implemented, with issues in loss calculations or optimizer setup, hindering effective training (1).	Training function is mostly implemented correctly, with minor errors in loss functions, gradient updates, or logging (2).	Training function is fully implemented, with well-defined loss functions, an effective optimizer, accurate loss combination, and proper gradient updates, ensuring the model learns effectively (3).	3
Robustness - Intent and Slot Classifier Evaluation Results	Fails to meet thresholds listed on the Evaluation Thresholds Page (0).	Meets 75% of the evaluation thresholds (2).	Meets all thresholds (5).	25% of the slots or intents exceed evaluation thresholds, demonstrating exceptional performance (7).	7
Conversation Patterns and Responses	Poor or lacking implementation of conversational patterns and agent responses, preventing the agent from functioning (0).	Not all instructed conversational patterns and agent responses are properly implemented, disabling certain functionalities (2).	Most conversational patterns and responses are implemented, but there are some minor issues in functionality or coverage (5).	All instructed conversational patterns and agent responses are properly implemented, ensuring smooth and natural interactions between the user and the agent (7).	7
Visuals	Poor or lacking implementation of visuals, preventing the agent from functioning (0).	Not all instructed visuals were properly implemented, making some of the pages unclear (1).	Most visuals are implemented correctly, but some may lack clarity or functionality (3).	All instructed pages are properly implemented, with clear information and a user-friendly design (5).	5
Recipe Filtering	Poor or lacking implementation of recipe filtering, preventing the agent from functioning (0).	Not all instructed filtering functions were properly implemented, disabling certain functionalities (1).	Most filtering functionalities are implemented, but there are occasional errors or missing edge cases (3).	All instructed recipe filtering functionalities are properly implemented, ensuring users can effectively narrow down recipes based on criteria (5).	5
Total Points					30

Written Report
Criteria	Poor	Average	Good	Excellent	Max Points
How does your agent work?	Description is vague and lacks detail on functionality, conversational flow, and examples (0).	Basic explanation provided but missing depth or clarity on key aspects such as flow or examples (1).	Clear explanation of functionality, conversational flow, and user examples, but missing exceptional detail (2).	Comprehensive, clear explanation of functionality, conversational flow, and varied, testable examples illustrating agent capabilities (4).	4
Intent and Slot Classifier	Explanation of intent and slot classifier is missing or vague; no performance analysis or metrics provided (0).	Basic explanation of intent and slot classifier with minimal performance data and limited analysis of challenges (1).	Detailed explanation with good performance analysis (e.g., accuracy, precision, recall, F1 score) and discussion of challenges (2).	Comprehensive explanation, including strong performance analysis (metrics, tables, or confusion matrices), discussion of challenges, and innovative model improvements (4).	4
Exclusion Mechanism	Exclusion mechanism is unclear, with no testing or pros and cons analysis (0).	Basic explanation provided, but lacks clarity in implementation and testing (1.5).	Clear explanation with testing and pros/cons analysis, but room for improvement (3).	Thorough explanation of implementation, testing, pros/cons, and strong performance data (4).	4
Extensions to the Bot	Extensions are unclear or not described; impact is not evident (0).	Extensions described but lack depth or clear motivation (1).	Well-documented extensions with clear motivation and impact analysis (1.5).	Comprehensive description of innovative extensions with clear benefits and motivations (2).	4
Pilot User Study	User study setup and results are missing or unclear (0).	Basic user study presented with limited results and insights (2).	Well-structured user study with good results and analysis of findings (3).	Detailed, well-analyzed user study with strong quantitative and qualitative insights (4).	4
Conclusion	Conclusion is missing or vague, with no reflection or future suggestions (0).	Basic summary provided but lacks depth or critical reflection (1).	Clear conclusion with reflection and practical improvement suggestions (1.5).	Strong, insightful conclusion with critical reflection and actionable improvement ideas (2).	4
Clarity and Presentation	Writing is unclear and poorly structured; formatting is messy (0).	Writing is somewhat clear but lacks polish and structure (1).	Clear and well-structured writing with minor presentation issues (2).	Very clear, professional writing with excellent structure and layout (3).	6
Total Points					30

Extensions and Exclusion (see Extensions for inspiration)
Criteria	Poor	Average	Good	Excellent	Max Points
Exclusion Implementation	Exclusion functionality is missing, incomplete, or cannot be used effectively. Little collaboration or effort to integrate exclusion approaches is evident. (0)	Exclusion functionality includes basic capabilities such as excluding a single type of slot but lacks refinement. Limited approach or minimal testing and analysis. (3)	Exclusion functionality works well for at least 2 slots. Multiple changes in the pipeline are made and combined effectively. Testing and trade-offs are adequately considered. (5)	Exclusion is comprehensive and well-integrated, handling multiple slots. Can be combined effectively with inclusion. Approaches are thoughtfully combined and tested rigorously, and limitations are clearly minimized. Collaboration across team members and sections is evident. (8)	8
Extensions to Agent Functionality (Recipe Filtering)	Little to no extensions to improve agent functionality were implemented. Extensions do not significantly enhance user experience or recipe filtering. (0)	Extensions improve functionality but are minimal, lack originality, or are not fully operational. They may replicate basic agent features with minor variations. (3)	Extensions improve key aspects such as NLU, filtering, and visual/aural communication. They are moderately complex and enhance user interaction meaningfully. (5)	Extensions are innovative, complex, and significantly enhance agent functionality, filtering, and user interaction. They demonstrate originality and are well-implemented. (8)	8
Conversational Competence and Navigation (Extension to Dialogue Patterns)	Few or no conversational patterns, repair strategies, or navigational features are added. Added features (e.g., restarting, stopping filters) are non-functional or unclear. (0)	Basic conversational patterns and navigational features (e.g., restarting or stopping filters) are added, but they lack depth, are inconsistently functional, or fail to address misunderstandings effectively. (3)	Conversational patterns cover relevant scenarios, and navigational features (e.g., restarting, stopping, removing filters) work reliably. Basic repair strategies and improved conversational flow are present. (5)	Conversational patterns and navigational features are well-integrated and intuitive. Patterns handle misunderstandings effectively, and navigation options (e.g., restarting, stopping, removing filters) are user-friendly, enhancing flexibility and user experience. (8)	8
Design and User Engagement (Extension to Visual Support)	Little or no effort to improve agent design in terms of visuals, utterances, or conversational style. (0)	Basic design improvements are present but lack originality, cohesion, or a clear target audience focus. Limited to either visuals or utterances. (2)	Agent design is engaging and consistent, with clear visual and conversational improvements tailored to a general audience. (4)	Design is highly engaging, cohesive, and tailored to a specific target audience. Visuals, utterances, and conversational style enhance usability and align with recipe selection goals. (6)	6
Total Points					30