Basic Agent - Inclusion | |||||
---|---|---|---|---|---|
Criteria | Poor | Average | Good | Excellent | Max Points |
Intent and Slot Distribution Analysis | No analysis of dataset distribution or poorly implemented distribution function, making it impossible to understand dataset balance or coverage (0). | Distribution analysis is partially implemented, missing some intents or slots, leading to incomplete insights (1). | Distribution analysis is mostly complete but fails to provide insights into rare or underrepresented slots/intents (2). | A well-implemented distribution analysis identifies and interprets intent and slot frequencies, providing meaningful insights into dataset balance and potential issues (3). | 3 |
Training Function Implementation | Training function is incomplete or incorrectly implemented, with critical issues such as missing loss functions, backward pass, or optimizer configuration (0). | Training function is partially implemented, with issues in loss calculations or optimizer setup, hindering effective training (1). | Training function is mostly implemented correctly, with minor errors in loss functions, gradient updates, or logging (2). | Training function is fully implemented, with well-defined loss functions, an effective optimizer, accurate loss combination, and proper gradient updates, ensuring the model learns effectively (3). | 3 |
Robustness - Intent and Slot Classifier Evaluation Results | Fails to meet thresholds listed on the Evaluation Thresholds Page (0). | Meets 75% of the evaluation thresholds (2). | Meets all thresholds (5). | 25% of the slots or intents exceed evaluation thresholds, demonstrating exceptional performance (7). | 7 |
Conversation Patterns and Responses | Poor or lacking implementation of conversational patterns and agent responses, preventing the agent from functioning (0). | Not all instructed conversational patterns and agent responses were properly implemented, disabling certain functionalities (2). | Most conversational patterns and responses are implemented, but there are some minor issues in functionality or coverage (5). | All instructed conversational patterns and agent responses are properly implemented, ensuring smooth and natural interactions between the user and the agent (7). | 7 |
Visuals | Poor or lacking implementation of visuals, preventing the agent from functioning (0). | Not all instructed visuals were properly implemented, making some of the pages unclear (1). | Most visuals are implemented correctly, but some may lack clarity or functionality (3). | All instructed pages are properly implemented, with clear information and a user-friendly design (5). | 5 |
Recipe Filtering | Poor or lacking implementation of recipe filtering, preventing the agent from functioning (0). | Not all instructed filtering functions were properly implemented, disabling certain functionalities (1). | Most filtering functionalities are implemented, but there are occasional errors or missing edge cases (3). | All instructed recipe filtering functionalities are properly implemented, ensuring users can effectively narrow down recipes based on criteria (5). | 5 |
Total Points | 30 |
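
As an illustration of the "Intent and Slot Distribution Analysis" criterion above, the following is a minimal sketch of counting intent and slot frequencies and flagging underrepresented labels. It is not the course's reference implementation: the function name `label_distribution`, the `rare_share` cutoff, the example dictionary layout, and the intent and slot names are all assumptions made for illustration.

```python
from collections import Counter


def label_distribution(examples, rare_share=0.05):
    """Count intents and slot types and flag rare labels.

    `examples` is assumed to be a list of dicts with an "intent" string
    and a "slots" list of slot-type strings (hypothetical layout).
    A label is flagged as rare when its share of all counted labels of
    that kind is below `rare_share`.
    """
    intent_counts = Counter(ex["intent"] for ex in examples)
    slot_counts = Counter(slot for ex in examples for slot in ex["slots"])

    def rare(counts):
        total = sum(counts.values())
        return sorted(label for label, n in counts.items() if n / total < rare_share)

    return intent_counts, slot_counts, rare(intent_counts), rare(slot_counts)


# Hypothetical toy dataset; real intent and slot names will differ.
examples = [
    {"intent": "request_recipe", "slots": ["ingredient", "cuisine"]},
    {"intent": "request_recipe", "slots": ["ingredient"]},
    {"intent": "request_recipe", "slots": ["ingredient"]},
    {"intent": "greeting", "slots": []},
]

intents, slots, rare_intents, rare_slots = label_distribution(examples, rare_share=0.3)
print("intent counts:", dict(intents))
print("slot counts:", dict(slots))
print("rare intents:", rare_intents)
print("rare slots:", rare_slots)
```

Reporting the flagged labels alongside the raw counts is one way to show the interpretation of rare or underrepresented intents and slots that the "Excellent" level asks for.
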
Written Report | |||||
---|---|---|---|---|---|
Criteria | Poor | Average | Good | Excellent | Max Points |
How does your agent work? | Description is vague and lacks detail on functionality, conversational flow, and examples (0). | Basic explanation provided but missing depth or clarity on key aspects such as flow or examples (1). | Clear explanation of functionality, conversational flow, and user examples, but missing exceptional detail (2). | Comprehensive, clear explanation of functionality and conversational flow, with varied, testable examples illustrating agent capabilities (4). | 4 |
Intent and Slot Classifier | Explanation of intent and slot classifier is missing or vague; no performance analysis or metrics provided (0). | Basic explanation of intent and slot classifier with minimal performance data and limited analysis of challenges (1). | Detailed explanation with good performance analysis (e.g., accuracy, precision, recall, F1 score) and discussion of challenges (2). | Comprehensive explanation, including strong performance analysis (metrics, tables, or confusion matrices), discussion of challenges, and innovative model improvements (4). | 4 |
Exclusion Mechanism | Exclusion mechanism is unclear, with no testing or pros and cons analysis (0). | Basic explanation provided, but lacks clarity in implementation and testing (1.5). | Clear explanation with testing and pros/cons analysis, but room for improvement (3). | Thorough explanation of implementation, testing, pros/cons, and strong performance data (4). | 4 |
Extensions to the Bot | Extensions are unclear or not described; impact is not evident (0). | Extensions described but lack depth or clear motivation (1). | Well-documented extensions with clear motivation and impact analysis (1.5). | Comprehensive description of innovative extensions with clear benefits and motivations (2). | 4 |
Pilot User Study | User study setup and results are missing or unclear (0). | Basic user study presented with limited results and insights (2). | Well-structured user study with good results and analysis of findings (3). | Detailed, well-analyzed user study with strong quantitative and qualitative insights (4). | 4 |
Conclusion | Conclusion is missing or vague, with no reflection or future suggestions (0). | Basic summary provided but lacks depth or critical reflection (1). | Clear conclusion with reflection and practical improvement suggestions (1.5). | Strong, insightful conclusion with critical reflection and actionable improvement ideas (2). | 4 |
Clarity and Presentation | Writing is unclear and poorly structured; formatting is messy (0). | Writing is somewhat clear but lacks polish and structure (1). | Clear and well-structured writing with minor presentation issues (2). | Very clear, professional writing with excellent structure and layout (3). | 6 |
Total Points | 30 |
Extensions and Exclusions (see Extensions for inspiration) | |||||
---|---|---|---|---|---|
Criteria | Poor | Average | Good | Excellent | Max Points |
Exclusion Implementation | Exclusion functionality is missing, incomplete, or unusable. Little collaboration or effort to integrate exclusion approaches is evident. (0) | Exclusion functionality includes basic capabilities such as excluding a single type of slot but lacks refinement. Limited approach or minimal testing and analysis. (3) | Exclusion functionality works well for at least 2 slots. Multiple changes in the pipeline are made and combined effectively. Testing and trade-offs are adequately considered. (5) | Exclusion is comprehensive and well-integrated, handling multiple slots. Can be combined effectively with inclusion. Approaches are thoughtfully combined and tested rigorously, and limitations are clearly minimized. Collaboration across team members and sections is evident. (8) | 8 |
Extensions to Agent Functionality (Recipe Filtering) | Little to no extensions to improve agent functionality were implemented. Extensions do not significantly enhance user experience or recipe filtering. (0) | Extensions improve functionality but are minimal, lack originality, or are not fully operational. They may replicate basic agent features with minor variations. (3) | Extensions improve key aspects such as NLU, filtering, and visual/aural communication. They are moderately complex and enhance user interaction meaningfully. (5) | Extensions are innovative, complex, and significantly enhance agent functionality, filtering, and user interaction. They demonstrate originality and are well-implemented. (8) | 8 |
Conversational Competence and Navigation (Extension to Dialogue Patterns) | Few or no conversational patterns, repair strategies, or navigational features are added. Added features (e.g., restarting, stopping filters) are non-functional or unclear. (0) | Basic conversational patterns and navigational features (e.g., restarting or stopping filters) are added, but they lack depth, are inconsistently functional, or fail to address misunderstandings effectively. (3) | Conversational patterns cover relevant scenarios, and navigational features (e.g., restarting, stopping, removing filters) work reliably. Basic repair strategies and improved conversational flow are present. (5) | Conversational patterns and navigational features are well-integrated and intuitive. Patterns handle misunderstandings effectively, and navigation options (e.g., restarting, stopping, removing filters) are user-friendly, enhancing flexibility and user experience. (8) | 8 |
Design and User Engagement (Extension to Visual Support) | Little or no effort to improve agent design in terms of visuals, utterances, or conversational style. (0) | Basic design improvements are present but lack originality, cohesion, or a clear target audience focus. Limited to either visuals or utterances. (2) | Agent design is engaging and consistent, with clear visual and conversational improvements tailored to a general audience. (4) | Design is highly engaging, cohesive, and tailored to a specific target audience. Visuals, utterances, and conversational style enhance usability and align with recipe selection goals. (6) | 6 |
Total Points | 30 |
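
To make the "Exclusion Implementation" criterion above concrete, here is a minimal sketch of a recipe filter that handles exclusion for more than one slot and combines it with inclusion. It is only one possible design under stated assumptions: the function `filter_recipes`, the recipe dictionary layout, and the slot names `ingredient` and `cuisine` are hypothetical and not taken from the course materials.

```python
def filter_recipes(recipes, include=None, exclude=None):
    """Keep recipes that match every inclusion and no exclusion.

    `include` and `exclude` map a slot name to a list of values
    (hypothetical layout). A recipe is kept when it contains all
    included values and none of the excluded values for each slot.
    """
    include = include or {}
    exclude = exclude or {}

    def values(recipe, slot):
        # Missing slots are treated as empty.
        return set(recipe.get(slot, []))

    kept = []
    for recipe in recipes:
        has_all = all(
            set(wanted) <= values(recipe, slot) for slot, wanted in include.items()
        )
        has_none = all(
            values(recipe, slot).isdisjoint(banned) for slot, banned in exclude.items()
        )
        if has_all and has_none:
            kept.append(recipe)
    return kept


# Hypothetical toy recipe data.
recipes = [
    {"name": "pad thai", "ingredient": ["noodles", "peanut", "egg"], "cuisine": ["thai"]},
    {"name": "carbonara", "ingredient": ["pasta", "egg", "bacon"], "cuisine": ["italian"]},
    {"name": "margherita", "ingredient": ["tomato", "mozzarella"], "cuisine": ["italian"]},
]

# "Something with egg, but no peanuts": inclusion and exclusion on the same slot.
with_egg_no_peanut = filter_recipes(
    recipes, include={"ingredient": ["egg"]}, exclude={"ingredient": ["peanut"]}
)
print([r["name"] for r in with_egg_no_peanut])

# "Italian, but no bacon": inclusion on one slot, exclusion on another.
italian_no_bacon = filter_recipes(
    recipes, include={"cuisine": ["italian"]}, exclude={"ingredient": ["bacon"]}
)
print([r["name"] for r in italian_no_bacon])
```

Keeping inclusions and exclusions as separate slot-to-values mappings is one way to let both kinds of filter coexist on the same slot, which the higher rubric levels describe.
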