
Welcome to this page with an overview of some of the required background information for this project. You have most likely already seen some of this material in other courses, but some of it will also be new to you. We expect each of you to have a decent understanding of the material listed below, as that is necessary to complete this project successfully. You will also need to complete a quiz; the material presented below can be used to prepare for it.


Git and GitHub

We use GitHub Classroom to provide you with the initial agent code. GitHub is a code hosting platform for version control and collaboration. You need to join the GitHub Classroom and use it for developing and sharing your code, and for storing and updating all the deliverables in this project. To help you with that, we list some basic readings and a tutorial below that introduce how to use Git. Git is a tool used by coding teams worldwide to develop code in parallel and keep everyone's changes in sync. Getting to know Git as part of this course will surely benefit you in the long term.

Git

Git is a tool used by developers to manage and track changes in their code or files. Think of it like a magical filing cabinet that remembers every version of a document or project you’ve ever worked on.

  • Imagine you’re writing a book. You make changes, but then decide you liked the way it was two days ago. Git lets you go back and see what it looked like back then.

  • It tracks what changes were made, who made them, and when.

  • It’s designed to help teams work together without accidentally overwriting each other’s work. It merges everyone’s contributions intelligently.

  • While Git works locally on your computer, it also pairs with tools like GitHub to store a backup of your work online.

Key Features:

  • Version Control: Keeps track of all the changes to a file or project.

  • Branching: You can create “branches” to work on different features or ideas without messing up the main version.

  • Undo Mistakes: If something breaks, you can roll back to a previous version.

GitHub

GitHub is like an online home for Git projects. If Git is your magical filing cabinet, GitHub is the cloud storage where you can share that cabinet with others.

  • It’s a website where people can store, share, and back up their Git projects online.

  • It ensures your work is safe even if something happens to your local files (like your computer crashing).

  • It makes it easy for teams to work together because everyone can see the latest version of the project and contribute their own changes.

  • GitHub also adds tools for collaboration like:

    • Issues: A way to track bugs or tasks.

    • Pull Requests: When someone suggests a change, the team can review it and decide whether to include it.

    • Actions: Automate tasks like testing your code every time there’s a change.

Git Commands

You can do just the basics reading, just the interactive tutorial, or both. The third link below also gives a more in-depth explanation of each command.

The absolute basics (reading): https://www.simplilearn.com/tutorials/git-tutorial/git-commands.

The basics (an interactive tutorial!): https://learngitbranching.js.org/.

If you want to know more (not required): Everything on Git.

Git Merging and Conflicts

https://www.simplilearn.com/tutorials/git-tutorial/merge-conflicts-in-git

Git Best Practices

One of the most important takeaways from the link below is that:

Commits are Supposed to Be Small and Frequent

Whenever you have made a single logical change, commit it. Frequent commits help you write commit messages that are short yet informative, and they make the history easier to follow for anyone reading through your code.

https://www.geeksforgeeks.org/best-git-practices-to-follow-in-teams/


For this project, commits must be made with certain specifications in mind; please read Commits for more information.

Training and Testing Machine Learning Models

Machine Learning (ML) is a crucial step toward achieving artificial intelligence (AI). It involves creating programs that analyze data and learn to predict outcomes. In ML, models are developed to forecast specific events, such as predicting a user’s intent based on their input.

To evaluate the effectiveness of a model, a method called Train/Test is commonly used. This approach involves splitting the dataset into two parts: a training set and a testing set.

  • Training the model means building it by learning patterns and parameters from the training dataset.

  • Testing the model involves assessing its performance (e.g., accuracy) using the test dataset.

This process helps determine if the model is reliable enough for real-world application.

https://www.w3schools.com/python/python_ml_train_test.asp.
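
As a concrete illustration, here is a minimal sketch of a train/test split using scikit-learn; the dataset and the classifier are placeholders chosen for illustration and are not part of the project code.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                       # placeholder dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)               # 80% for training, 20% for testing
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                             # train on the training set only
print(accuracy_score(y_test, model.predict(X_test)))    # evaluate on unseen test data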

A General Pipeline of Task-Oriented Spoken Dialogue Systems

Spoken dialogue systems are the core component of today’s virtual personal assistants, such as Microsoft’s Cortana, Apple’s Siri, Amazon Alexa, Google Assistant, and Facebook’s M.
A classical pipeline of a task-oriented spoken dialogue system includes the following key components:

  • Automatic Speech Recognition (ASR) - Converts spoken language into textual transcript.

  • Natural Language Understanding (NLU) - Interprets and extracts meaning from the transcript.

  • Dialogue Management (DM) - Manages the flow of conversation and determines the system’s response.

  • Natural Language Generation (NLG) - Constructs responses in natural language.

  • Text to Speech (TTS) - Converts the generated text into spoken output.

(Figure: the classical pipeline of a task-oriented spoken dialogue system, from ASR to TTS.)

In this project, we will focus on building a simple pipeline that integrates ASR followed by an NLU component. We will use an existing ASR model (e.g., Whisper) for inference/prediction only (no training), while improving the performance of the NLU model (e.g., BERT) by training it on conversational data collected in the previous course.

By the end of the project, you will learn how to:

  • Construct a basic dialogue pipeline.

  • Train and improve individual components, specifically the NLU model.

This hands-on approach will provide insight into developing and refining key elements of a dialogue system.

ASR with Whisper

The ASR component converts spoken language into text. It enables machines to interpret and transcribe human speech, allowing for seamless interaction between users and applications through voice commands.

Whisper is a commonly used general-purpose speech recognition model developed by OpenAI. It is trained on a large dataset of diverse audio and is also a multi-tasking model that can perform multi-lingual speech recognition, speech translation, and language identification.
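
As a brief example, the snippet below sketches how Whisper can be used for inference with the open-source openai-whisper package; the model size ("base") and the audio file name are placeholders.

import whisper

model = whisper.load_model("base")            # load a pre-trained Whisper checkpoint
result = model.transcribe("recording.wav")    # inference only; no training involved
print(result["text"])                         # the textual transcript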

NLU with BERT

Unlike open-domain dialogue (e.g., chitchat), task-oriented dialogue is restricted by a dialogue ontology, which defines all possible intents, slots, and their corresponding candidate values in specific domains. The NLU component maps a user’s utterance to a structured semantic representation, which includes the intent behind the utterance and a set of key-value pairs known as slots and values. This mapping enables dialogue systems to understand user needs and respond appropriately. For example, given the transcribed utterance “Recommend a restaurant at China Town”, the NLU model can identify the intent as “inform” and the value of the slot “destination” as “China Town”.
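
In code, such a structured semantic representation could be as simple as the dictionary sketched below; the field names are illustrative and not a fixed format prescribed by the project.

nlu_output = {
    "utterance": "Recommend a restaurant at China Town",
    "intent": "inform",
    "slots": {"destination": "China Town"},
}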

NLU task → Intent and Slot Classification

The NLU task can be approached as joint learning of intent classification (IC) and slot filling (SF), with the slot labels typically formatted in the widely used BIO format, as shown below. In general, jointly learning intent and slot classification is mutually beneficial for both tasks. https://arxiv.org/abs/2011.00564

Utterance | Recommend | a | restaurant | at | China         | Town
Slot      | O         | O | O          | O  | B-destination | I-destination
Intent    | Inform

Example of SF and IC output for an utterance. Slot labels are in BIO format: B indicates the start of a slot span, I the inside of a span while O denotes that the word does not belong to any slot.

The NLU architecture includes the following key parts (a minimal sketch in PyTorch follows this list):

  • Base Model: Pre-trained BertModel (e.g., bert-base-uncased) for generating contextual embeddings. It includes two main parts: the encoder and the attention mechanism. The encoder processes the input sequence and creates contextual embeddings for each token, while the attention mechanism helps capture dependencies between words, regardless of their position in the sequence.

  • Intent Classifier: A linear layer on top of the [CLS] token output for intent prediction. The final output of this layer is typically a softmax function, which predicts the probability distribution over a predefined set of possible intents.

  • Slot Classifier: A linear layer applied to the token-level embeddings for slot tagging. It assigns a label to each token, indicating whether it represents a particular entity (e.g., a destination, date, etc). This process is often referred to as token tagging. The output of this linear layer is typically a softmax layer that predicts slot labels for each token.

  • Joint Learning of the Two Classifiers: During training, the model minimizes a combined loss function, which includes separate losses for intent classification and slot filling. This ensures that the model not only accurately predicts the intent but also extracts the correct slots for each token in the sentence.
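
The snippet below gives a minimal sketch of this joint architecture in PyTorch with Hugging Face transformers; the class name, label counts, and the simple summed loss are illustrative assumptions rather than the exact implementation used in the project.

import torch.nn as nn
from transformers import BertModel

class JointIntentSlotModel(nn.Module):
    def __init__(self, num_intents, num_slot_labels):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")        # base model
        hidden_size = self.bert.config.hidden_size
        self.intent_classifier = nn.Linear(hidden_size, num_intents)      # on the [CLS] representation
        self.slot_classifier = nn.Linear(hidden_size, num_slot_labels)    # on every token representation

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_classifier(outputs.pooler_output)     # one prediction per utterance
        slot_logits = self.slot_classifier(outputs.last_hidden_state)     # one prediction per token
        return intent_logits, slot_logits

# Joint training minimizes a combined loss, e.g. the sum of two cross-entropies:
#   loss = ce(intent_logits, intent_labels) + ce(slot_logits.flatten(0, 1), slot_labels.flatten())

Applying a softmax to intent_logits and slot_logits then yields the probability distributions over intents and slot labels mentioned above.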

Pre-training and fine-tuning BERT

BERT (Bidirectional Encoder Representations from Transformers) is a widely used transformer-based language model designed for various natural language processing tasks, including classification. Its training consists of two procedures:

  • During pre-training, BERT is trained on a large corpus of English text in a self-supervised manner. This means it is trained on large-scale, raw, unlabeled text without human annotations, using an automatic process to generate input-output pairs from the text.

  • During fine-tuning, BERT is first initialized with its pre-trained parameters, and then all parameters are fine-tuned using labeled data from downstream tasks, allowing it to adapt to specific applications.

(Figure: the pre-training and fine-tuning procedures of BERT, from the paper below.)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
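
To make the fine-tuning procedure concrete, here is a minimal sketch assuming PyTorch and Hugging Face transformers; the sequence-classification head, the single placeholder label, and the hyperparameters are illustrative assumptions only.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)           # initialize from the pre-trained parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

batch = tokenizer(["Recommend a restaurant at China Town"], return_tensors="pt")
labels = torch.tensor([0])                       # placeholder label from a labeled downstream dataset

outputs = model(**batch, labels=labels)          # forward pass also computes the loss
outputs.loss.backward()                          # back-propagate through all parameters
optimizer.step()                                 # update (fine-tune) the pre-trained weights
optimizer.zero_grad()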

LLMs on Hugging Face

Hugging Face is an AI community and platform that offers an easy-to-use interface for accessing and utilizing pretrained large language models (LLMs) like BERT released by various organizations and researchers. Here is a simple example of how to use this model to get the features of a given text in PyTorch:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')  # load the tokenizer matching the model
model = BertModel.from_pretrained('bert-base-uncased')          # load the pre-trained weights
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')            # tokenize and return PyTorch tensors
output = model(**encoded_input)                                 # contextual embeddings for the text

https://huggingface.co/google-bert/bert-base-uncased

HTML and Bootstrap

You will be developing a few basic web pages to provide some visual support to a user while they are conversing with your agent. We assume you are familiar with basic forms of HTML, the markup language for developing a webpage. If not, please check out this HTML tutorial to get you started: https://www.w3schools.com/html/default.asp.

On top of HTML, we use Bootstrap 4 to facilitate the development of a webpage. The main purpose of this visual support is twofold: to provide (1) support to a user to reduce their cognitive load (the amount of information working memory needs to process at any given time) and (2) an indication of the progress that has been made on the task thus far. A user may not be able to remember all preferences and/or constraints on recipes they have selected thus far; a system that requires a user to do so would likely not be experienced as very user-friendly. It is also nice to show a user how much progress has been made in finding a recipe they like. A simple measure for our recipe recommendation agent to indicate progress is to show how many recipes still match the preferences of the user.

Bootstrap is a powerful, open-source front-end framework for web development. Many of the Bootstrap components can be used by a MARBEL agent to create and display webpages in a Chrome browser using the Social Interaction Cloud infrastructure. We first list a few of Bootstrap’s key features:

  1. Responsive Design: Bootstrap's grid system and pre-designed components enable easy creation of responsive websites.

  2. HTML, CSS, and JS Components: Offers a wide range of reusable components like buttons, and navigation bars.

  3. Customization: Allows for extensive customization.

  4. Community and Documentation: Backed by a strong community and comprehensive documentation.

  5. Mobile-First Approach: Prioritizes mobile devices in design strategies.

This framework simplifies web development, making it accessible for beginners while still providing a powerful tool for more experienced developers.

To gain an understanding of Bootstrap, this Bootstrap tutorial will be very useful: https://www.w3schools.com/bootstrap4/default.asp. To familiarize yourself with some of the basic components of Bootstrap, take a look at the first few items in the tutorial. We recommend you read at least up to the item on buttons: https://www.w3schools.com/bootstrap4/bootstrap_buttons.asp. The tutorial will also be useful for later reference, for example to look up how you can change colors or use a progress bar.

To understand how you can integrate Bootstrap components into your agent, you need to read our Visual Support Guide: https://socialrobotics.atlassian.net/wiki/pages/createpage.action?spaceKey=PM2&title=2025%20Visual%20Support%20Guide. There we explain in more detail how you can use Bootstrap components for creating a webpage in this project.

Prolog

You will develop your recipe recommendation agent using MARBEL and SWI Prolog. The MARBEL agent implements a dialog management engine that you will use. You do not need to change this agent. You are, however, allowed to modify it if you like. The focus will be mostly on using Prolog to provide the agent with the knowledge it needs and to make it smarter by providing it with some logic related to its recipe recommendation task.

Prolog is a rule-based programming language based on symbolic logic. It is commonly used in Artificial Intelligence and Computational Linguistics. To understand Prolog, you should familiarize yourself with its key concepts and structures using this book: https://www.let.rug.nl/bos/lpn//lpnpage.php?pageid=online. It covers fundamental topics like facts, rules, queries, unification, proof search, recursion, lists, arithmetic, definite clause grammars, and more, and it also delves into more advanced topics such as cuts and negation. We briefly summarize some of the core concepts here for your convenience.

  • Logic-Based Programming: Prolog is fundamentally different from procedural languages like C or Python. It is based on formal logic, making it well-suited for tasks that involve rules and constraints, such as solving puzzles or processing natural language.

  • Facts, Rules, and Recursion: The core of Prolog programming involves defining facts and rules. Facts are basic statements about objects and/or their relationships. Rules define relationships between facts using basic logical relations such as conjunction, disjunction, and negation. The fact that rules can be recursive is what gives Prolog its power as a programming language. Recursion can be used, for example, for iterating over frequently used data structures in Prolog such as lists.

  • Lists and Arithmetic: Lists are fundamental data structures in Prolog. Prolog offers a range of built-in predicates for list manipulation. It also provides built-in support for arithmetic operations. Because Prolog’s basic form of computation is based on term matching, which does not support efficiently doing math, care must be taken to use the right operators when handling numbers in Prolog.

  • Pattern Matching and Unification: Prolog’s core form of computation consists of pattern matching with the aim of unifying Prolog terms. Unification of two terms is a fundamental operation in Prolog which, if it succeeds, returns substitutions for Prolog variables. When these substitutions are applied to the terms (and the variables are instantiated), the result is two identical terms.

  • Backtracking: Prolog uses backtracking to evaluate the rules in a program to find solutions to problems. If one trace (part of a search tree) fails, Prolog automatically backtracks to find and try alternative options that have not yet been explored to continue searching for a solution.

  • Advanced Features: Prolog provides advanced features like the cut operator. This operator can be used for controlling the backtracking process, mainly to increase the efficiency of Prolog programs.

  • Definite Clause Grammars (DCGs): These are used in Prolog for parsing and generating natural language constructs, making them a powerful tool for language-related applications.

  • Applications: Prolog is widely used in AI for tasks such as expert systems, natural language processing, and theorem proving, owing to its ability to handle complex symbolic information and logical constructs efficiently.
