Welcome to this page, which gives an overview of the background knowledge required for this project. You have most likely already seen some of this material in other courses, but some of it will also be new to you. We expect each of you to have a decent understanding of the material listed below, as it is necessary to complete this project successfully. You will also need to complete a quiz, for which the material presented below can be used to prepare.


Git and GitHub

We use GitHub Classroom to provide you with the initial agent code. GitHub is a code hosting platform for version control and collaboration. You need to join the GitHub Classroom and use it for developing and sharing your code, and for storing and updating all the deliverables in this project. To help you with that, we provide some basic readings and a tutorial on how to use git. Git is a common tool used by many coding teams worldwide to develop code in parallel and keep everyone's changes aligned. Getting to know git as part of this course will surely benefit you in the long term.

Git

Git is a tool used by developers to manage and track changes in their code or files. Think of it like a magical filing cabinet that remembers every version of a document or project you’ve ever worked on.

Key Features:

  1. Version History: Git records every committed version of your files, so you can always go back to an earlier state.

  2. Branching and Merging: You can develop features on separate branches and merge them back together when they are ready.

  3. Distributed: Every developer has a full local copy of the project and its history, so you can work and commit offline.

GitHub

GitHub is like an online home for Git projects. If Git is your magical filing cabinet, GitHub is the cloud storage where you can share that cabinet with others.

Git Commands

You can do just the basics reading, just the interactive tutorial, or both. There is also a more in-depth explanation of each command on the third page.

The absolute basics (reading): https://www.simplilearn.com/tutorials/git-tutorial/git-commands

The basics (an interactive tutorial!): https://learngitbranching.js.org/

If you want to know more (not required): Everything on Git.

Git Merging and Conflicts

https://www.simplilearn.com/tutorials/git-tutorial/merge-conflicts-in-git

Git Best Practices

One of the most important takeaways from the link below is that:

Commits are Supposed to Be Small and Frequent

Commit whenever you have made a single logical change. Frequent commits help you write commit messages that are short yet informative, and they give anyone reading through your history a clear picture of how the code evolved.

https://www.geeksforgeeks.org/best-git-practices-to-follow-in-teams/
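
For example, a small, self-contained commit might look like this (the file name and commit message are hypothetical):

git add dialog/patterns.pl
git commit -m "Add response pattern for recipe filter requests"
git push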


For this project, commits must be made according to certain specifications; please read Commits for more information.


Machine Learning Basics Relevant to This Course

Note: We do not expect you to master all aspects of machine learning. Instead, focus on the following fundamental concepts that are directly related to this course.

What is Machine Learning?

Machine learning (ML) is a type of Artificial Intelligence (AI) that allows computers to learn and make decisions without being explicitly programmed. It involves feeding data into algorithms that can then identify patterns and make predictions on new data. In short, machines observe patterns in data and attempt to imitate them, either directly or indirectly.


Machine learning can be categorized into three main types: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is the most commonly used type of machine learning, and it is also the focus of this course. In supervised learning, the model is trained on labeled data, meaning each input (feature) has a corresponding output (label). The objective is for the model to learn the relationship between the input variables and their associated labels. Once trained, the model can make accurate predictions or inferences on new, unseen data by applying the patterns it has learned from the labeled dataset.

Here are some examples of ML problems (input in bold and output in italic):

Why do we need Machine Learning?

Machine learning can learn from data and solve or predict complex problems that cannot be handled with traditional programming. It enables better decision making and helps solve complex business problems in less time. Recent advancements in AI have been propelled by machine learning, particularly its subset, deep learning. Additionally, compared to black-box agents like Dialogflow, developing our own machine-learning models provides greater control, enabling continuous improvement over time.

How do we train and evaluate a model?

When developing a machine learning model, one of the fundamental steps is to split the data into different subsets: training, testing, and validation datasets.

Therefore, there are several stages during the development of a model:

  1. Training: fit the model's parameters on the training set.

  2. Validation: tune hyperparameters and select the best model based on its performance on the validation set.

  3. Testing: estimate the final performance of the selected model on the held-out test set.
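
As a concrete illustration, here is a minimal sketch of such a split using scikit-learn; the toy data and the 80/10/10 ratio are assumptions for illustration:

import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset (made up for illustration): 10 examples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# First split off 20% of the data, then divide it half-and-half
# into validation and test sets (80/10/10 overall)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)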

What is Classification in Machine Learning?

Classification is a supervised machine learning method where the model aims to predict the correct label or category for a given input. In classification, the model is trained using labeled training data, learns to identify patterns, and is then evaluated on test data to assess its performance. Once trained and evaluated, the model can be used to make predictions on new, unseen data.

For example, a classification algorithm (classifier) might learn to predict whether a given email is spam or not, as illustrated below. This is a binary classification task, where the goal is to categorize the input data into two mutually exclusive classes. In this case, the training data is labeled with binary categories, such as "spam" and "not spam," "true" and "false," or "positive" and "negative." These labels guide the model in learning the differences between the two categories, allowing it to make accurate predictions when exposed to new data.

[Figure: binary classification of emails into “spam” and “not spam”]
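
To make this concrete, here is a minimal sketch of a spam classifier using scikit-learn; the toy training examples are made up for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data (made up): 1 = spam, 0 = not spam
emails = ["win a free prize now", "meeting at noon tomorrow",
          "free money click here", "project deadline next week"]
labels = [1, 0, 1, 0]

# Bag-of-words features followed by a simple linear classifier
classifier = make_pipeline(CountVectorizer(), LogisticRegression())
classifier.fit(emails, labels)

print(classifier.predict(["claim your free prize"]))  # expected: [1] (spam)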

What is a deep neural network and how does it work?

A deep neural network (DNN) is a machine learning model that makes decisions in a manner similar to the human brain, using processes that mimic the way biological neurons work together to identify phenomena, weigh options, and arrive at conclusions.

Every neural network consists of layers of nodes, or artificial neurons—an input layer, one or more hidden layers, and an output layer. Each node connects to others, and has its own associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.

[Figure: a neural network with an input layer, hidden layers, and an output layer]

There are several key concerns during the development of a DNN model:

Defining a model architecture:

Let's define our first deep neural network (DNN) with a single-layer, fully connected neural network, and a 3-dimensional input. In a fully connected layer, each input is connected to every output, ensuring comprehensive interaction between neurons.

[Figure: a single-layer, fully connected neural network with a 3-dimensional input]
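
In PyTorch, such a network can be written as a single fully connected (linear) layer; the two output classes below are an assumption for illustration:

import torch
import torch.nn as nn

# One fully connected layer: 3 input features -> 2 output classes
# (the number of classes is a made-up example)
model = nn.Linear(in_features=3, out_features=2)

x = torch.randn(1, 3)   # one example with a 3-dimensional input
logits = model(x)       # raw, unnormalized scores for each class
print(logits)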

Defining a loss function: Loss functions are quantitative measures of how satisfactory the model predictions are (i.e., how “good” the model parameters are). We will use the cross-entropy (CE) loss, which is standard and common for classification.

Optimizing the loss function: During training, the goal is to find the “best” values of the model parameters (weights and biases) that minimize the loss function on the training dataset. This process is known as optimization.
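
Putting the loss and the optimizer together, here is a minimal sketch of a training loop in PyTorch; the toy data, learning rate, and number of epochs are assumptions for illustration:

import torch
import torch.nn as nn

# Toy dataset (made up): 8 examples, 3 features, 2 classes
X = torch.randn(8, 3)
y = torch.randint(0, 2, (8,))

model = nn.Linear(3, 2)
loss_fn = nn.CrossEntropyLoss()                            # standard CE loss for classification
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(20):
    optimizer.zero_grad()       # reset gradients from the previous step
    logits = model(X)           # forward pass
    loss = loss_fn(logits, y)   # how far off are the predictions?
    loss.backward()             # compute gradients of the loss
    optimizer.step()            # update the weights and biases
print(loss.item())              # final training loss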


A General Pipeline of Task-Oriented Spoken Dialogue Systems

Spoken dialogue systems (SDSs) are the most prominent component of today’s virtual personal assistants, such as Microsoft’s Cortana, Apple’s Siri, Amazon Alexa, Google Assistant, and Facebook’s M. Unlike chitchat systems, task-oriented SDSs aim to assist users with a specific goal, for example, recommending a recipe or booking a hotel.
A classical pipeline architecture of a task-oriented spoken dialogue system includes key components such as automatic speech recognition (ASR), natural language understanding (NLU), dialogue management, natural language generation (NLG), and speech synthesis (TTS):

[Figure: classical pipeline architecture of a task-oriented spoken dialogue system]

In this project, we will focus on building a simple pipeline that integrates ASR followed by an NLU component. We will use an existing ASR model (e.g., Whisper) for inference/prediction only (no training), while enhancing the performance of the NLU model (e.g., BERT) by training it on conversational data collected in the previous course.

By the end of the project, you will learn how to:

  1. Use a pretrained ASR model (e.g., Whisper) for inference on spoken input.

  2. Train and evaluate an NLU model (e.g., BERT) on conversational data.

  3. Integrate these components into a simple dialogue system pipeline.

This hands-on approach will provide insight into developing and refining key elements of a dialogue system.

ASR with Whisper

Automatic Speech Recognition (ASR) is a key component in the pipeline architecture, which converts spoken language into text. It enables machines to interpret and transcribe human speech, allowing for seamless interaction between users and applications through voice commands.

Whisper is a commonly used general-purpose speech recognition model developed by OpenAI. It is trained on a large dataset of diverse audio and is also a multi-tasking model that can perform multi-lingual speech recognition, speech translation, and language identification.
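
As a minimal sketch, this is how transcription looks with the openai-whisper Python package; the model size and audio file name below are assumptions for illustration:

import whisper

# Load a small pretrained model; larger models are more accurate but slower
model = whisper.load_model("base")

# Transcribe an audio file (hypothetical file name)
result = model.transcribe("recording.mp3")
print(result["text"])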

NLU with BERT

Unlike chitchat, task-oriented dialogue is restricted by a dialogue ontology, which defines all possible intents, slots, and their corresponding candidate values in specific domains. The NLU component maps a user’s utterance to a structured semantic representation, which includes the intent behind the utterance and a set of key-value pairs known as slots and values. This mapping enables dialogue systems to understand user needs and respond appropriately. For example, given the transcribed utterance “I want to cook Italian pizza”, the NLU model can identify the intent as “addFilter” and the value of the slot “ingredienttype” as “italian pizza”.

NLU task → Intent and Slot Classification

The NLU task can be approached as joint learning of intent classification (IC) and slot filling (SF), with the slot labels typically formatted in the widely used BIO format, as shown below. In general, joint learning of intent and slot classification is mutually beneficial. Here is an example of SF and IC output for an utterance. Slot labels are in BIO format: B indicates the start of a slot span, I the inside of a span, and O denotes that the word does not belong to any slot.

Utterance:  I    want    to    cook    Italian             pizza
Slot:       O    O       O     O       B-ingredienttype    I-ingredienttype
Intent:     addFilter

The NLU architecture includes the following key parts: a pretrained BERT encoder that produces contextual representations of the utterance, an intent classification head (typically applied to the sentence-level [CLS] representation), and a slot classification head that assigns a BIO label to each token; a sketch follows below.
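
Here is a minimal sketch of such a joint model in PyTorch, assuming the Hugging Face transformers library; the numbers of intent and slot labels are made-up examples:

import torch.nn as nn
from transformers import BertModel

class JointNLU(nn.Module):
    # num_intents and num_slot_labels are hypothetical sizes for illustration
    def __init__(self, num_intents=5, num_slot_labels=10):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)    # one intent per utterance
        self.slot_head = nn.Linear(hidden, num_slot_labels)  # one BIO label per token

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_head(out.pooler_output)   # sentence-level features
        slot_logits = self.slot_head(out.last_hidden_state)   # per-token features
        return intent_logits, slot_logits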

Pre-training and fine-tuning BERT

[Figure: pre-training and fine-tuning of BERT]

You can do just the basic reading of the above. There are also more in-depth explanations on the third page.


LLMs on Hugging Face

Hugging Face is an AI community and platform that offers an easy-to-use interface for accessing and utilizing pretrained large language models (LLMs) like BERT, released by various organizations and researchers. Here is a simple example of how to use BERT to extract the features of a given text in PyTorch:

from transformers import BertTokenizer, BertModel

# Load the pretrained tokenizer and model from the Hugging Face hub
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained("bert-base-uncased")

# Tokenize the text and return PyTorch tensors ('pt')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')

# Run the model; output.last_hidden_state holds the token-level features
output = model(**encoded_input)

HTML, JavaScript, and Bootstrap

You will be developing a few basic web pages to provide some visual support to a user while they are conversing with your agent. We assume you are familiar with the basics of HTML, the markup language for developing a webpage. If not, please check out this tutorial to get you started: https://www.w3schools.com/html/default.asp.

On top of HTML, we use Bootstrap 4 to facilitate the development of a webpage. The main purpose of this visual support is twofold: to provide (1) support to a user to reduce their cognitive load (the amount of information working memory needs to process at any given time) and (2) an indication of the progress made on the task thus far. A user may not be able to remember all the preferences and/or constraints on recipes they have selected thus far, and a system that required them to do so would likely not be experienced as very user-friendly. It is also nice to show a user how much progress has been made in finding a recipe they like. A simple measure for our recipe recommendation agent to indicate progress is to show how many recipes still match the preferences of the user.

Bootstrap is a powerful, open-source front-end framework for web development. Many of the Bootstrap components can be used by a MARBEL agent to create and display webpages in a Chrome browser using the Social Interaction Cloud infrastructure. We first list a few of Bootstrap’s key features:

  1. Responsive Design: Bootstrap's grid system and pre-designed components enable easy creation of responsive websites.

  2. HTML, CSS, and JS Components: Offers a wide range of reusable components like buttons and navigation bars.

  3. Customization: Allows for extensive customization.

  4. Community and Documentation: Backed by a strong community and comprehensive documentation.

  5. Mobile-First Approach: Prioritizes mobile devices in design strategies.

This framework simplifies web development, making it accessible for beginners while still providing a powerful tool for more experienced developers.

To gain an understanding of Bootstrap, this tutorial will be very useful: https://www.w3schools.com/bootstrap4/default.asp. To familiarize yourself with some of the basic components of Bootstrap, take a look at the first few items in the tutorial. We recommend you read at least up to the item on buttons: https://www.w3schools.com/bootstrap4/bootstrap_buttons.asp. The tutorial will also be useful as a later reference for looking up how to change colors or use, for example, a progress bar.
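
As an illustration, here is a minimal sketch of a Bootstrap 4 progress bar that could show how many recipes still match the user's preferences; the percentage and label are made up for this example:

<div class="progress">
  <!-- the width and label below are hypothetical values -->
  <div class="progress-bar" role="progressbar" style="width: 40%"
       aria-valuenow="40" aria-valuemin="0" aria-valuemax="100">
    40 matching recipes
  </div>
</div>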


Prolog

You will develop your recipe recommendation agent using MARBEL and SWI-Prolog. The MARBEL agent implements a dialog management engine that you will use. You do not need to change this agent, although you are allowed to modify it if you like. The focus will mostly be on using Prolog to provide the agent with the knowledge it needs and to make it smarter by giving it some logic related to its recipe recommendation task.

Prolog is a rule-based programming language based on symbolic logic. It is commonly used in Artificial Intelligence and Computational Linguistics. To understand Prolog, you should familiarize yourself with its key concepts and structures using the book Learn Prolog Now!: https://www.let.rug.nl/bos/lpn//lpnpage.php?pageid=online. This book covers fundamental topics like facts, rules, queries, unification, proof search, recursion, lists, arithmetic, definite clause grammars, and more. It also delves into more advanced topics such as cuts and negation. We briefly summarize some of the core concepts here for your convenience.
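
To give you a first flavor of Prolog, here is a tiny made-up knowledge base of facts and a rule, together with a query you could pose at the SWI-Prolog prompt:

% Facts (made up for illustration): recipes and their cuisines
cuisine(pizza_margherita, italian).
cuisine(pad_thai, thai).

% A fact about a (hypothetical) user's taste
likes(alice, italian).

% Rule: a recipe matches a user if the user likes its cuisine
matches(Recipe, User) :-
    likes(User, Cuisine),
    cuisine(Recipe, Cuisine).

% Example query and the answer Prolog derives:
% ?- matches(R, alice).
% R = pizza_margherita.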