...

Training and Testing Machine Learning Models

Machine Learning (ML) is one of the key techniques for achieving artificial intelligence (AI). It involves creating programs that analyze data and learn to predict outcomes. In ML, models are built to predict specific outcomes, such as a user’s intent based on their input.

To evaluate the effectiveness of a model, a method called Train/Test is commonly used. This approach involves splitting the dataset into two parts: a training set and a testing set.

  • Training the model means building it by learning patterns and parameters from the training dataset.

  • Testing the model involves assessing its performance (e.g., accuracy) using the test dataset.

This process helps determine if the model is reliable enough for real-world application.

https://www.w3schools.com/python/python_ml_train_test.asp.
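
To make the Train/Test idea concrete, here is a minimal sketch using scikit-learn with a toy dataset; both the library and the data are illustrative assumptions rather than part of the project setup.

    # Minimal train/test sketch: split the data, fit on the training set,
    # evaluate accuracy on the held-out test set.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Toy dataset standing in for real features and labels.
    X, y = make_classification(n_samples=500, n_features=10, random_state=42)

    # Hold out 20% of the examples for testing; train on the remaining 80%.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)  # training: learn parameters from the training set
    accuracy = accuracy_score(y_test, model.predict(X_test))  # testing: evaluate on unseen data
    print(f"Test accuracy: {accuracy:.2f}")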

A General Pipeline of Task-Oriented Spoken Dialogue Systems

Spoken dialogue systems are the central component of today’s virtual personal assistants, such as Microsoft’s Cortana, Apple’s Siri, Amazon Alexa, Google Assistant, and Facebook’s M.
A classical pipeline of a task-oriented spoken dialogue system includes the following key components (a code sketch of how they fit together appears after the list):

  • Automatic Speech Recognition (ASR) - Converts spoken language into a textual transcript.

  • Natural Language Understanding (NLU) - Interprets and extracts meaning from the transcript.

  • Dialogue Management (DM) - Manages the flow of conversation and determines the system’s response.

  • Natural Language Generation (NLG) - Constructs responses in natural language.

  • Text to Speech (TTS) - Converts the generated text into spoken output.

...

https://link.springer.com/article/10.1007/s10462-022-10248-8
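
The sketch below shows how these stages could be chained in code. The component classes and method names (transcribe, parse, next_action, realize, synthesize) are assumptions made for illustration, not an existing API.

    # Minimal sketch of a task-oriented dialogue pipeline: each stage consumes
    # the output of the previous one. The components are placeholders.
    class DialoguePipeline:
        def __init__(self, asr, nlu, dm, nlg, tts):
            self.asr, self.nlu, self.dm, self.nlg, self.tts = asr, nlu, dm, nlg, tts

        def respond(self, audio):
            transcript = self.asr.transcribe(audio)   # ASR: speech -> text
            semantics = self.nlu.parse(transcript)    # NLU: text -> intent + slots
            action = self.dm.next_action(semantics)   # DM: decide the system's next move
            text = self.nlg.realize(action)           # NLG: action -> natural language
            return self.tts.synthesize(text)          # TTS: text -> speech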

In this project, we will focus on building a simple pipeline that integrates ASR followed by an NLU component. We will use an existing ASR model (e.g., Whisper) for inference/prediction only (no training), while improving the performance of the NLU model (e.g., BERT) by training it on conversational data collected in the previous course.

By the end of the project, you will learn how to:

  • Construct a basic dialogue pipeline.

  • Train and improve individual components, specifically the NLU model.

This hands-on approach will provide insight into developing and refining key elements of a dialogue system.

ASR and WHISPER

The ASR component converts spoken language into text. It enables machines to interpret and transcribe human speech, allowing for seamless interaction between users and applications through voice commands.

Whisper is a commonly used general-purpose speech recognition model developed by OpenAI. It is trained on a large dataset of diverse audio and is also a multi-tasking model that can perform multi-lingual speech recognition, speech translation, and language identification.
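
As a minimal sketch, assuming the openai-whisper Python package is installed, transcription for inference only could look as follows (the checkpoint size and audio file name are placeholders):

    import whisper  # pip install openai-whisper

    model = whisper.load_model("base")       # other checkpoints: tiny, small, medium, large
    result = model.transcribe("audio.wav")   # inference only; no training involved
    print(result["text"])                    # the transcribed utterance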

NLU and BERT

The NLU component maps a user’s utterance to a structured semantic representation, which includes the intent behind the utterance and a set of key-value pairs known as slots and values. For example, given the transcribed utterance “Recommend a restaurant at China Town”, the NLU model can identify the intent as “inform” and the value of the slot “destination” as “China Town”. This mapping enables dialogue systems to understand user needs and respond appropriately. Unlike open-domain dialogue (e.g., chitchat), task-oriented dialogue is restricted by a dialogue ontology, which defines all possible intents, slots, and their corresponding candidate values.
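
Written out as a minimal Python structure, the semantic representation for this example could look as follows (the exact format used in the project may differ):

    # Structured NLU output for the example utterance above.
    utterance = "Recommend a restaurant at China Town"
    nlu_output = {
        "intent": "inform",
        "slots": {"destination": "China Town"},
    }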

NLU task → Intent and Slot Classification

  •  Explain what the task is generally
  •  Explain how it is done
  •  What is an ontology - importance (see the example below)
  •  intent and slot definition
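
As a hypothetical illustration of what such an ontology fragment might look like (the intents, slots, and values below are made up; the real ontology comes with the conversational data):

    # Hypothetical fragment of a dialogue ontology: the full set of intents,
    # slots, and candidate values the NLU model is allowed to predict.
    ONTOLOGY = {
        "intents": ["inform", "request", "confirm", "bye"],
        "slots": {
            "destination": ["China Town", "city centre", "north"],
            "food": ["chinese", "italian", "thai"],
            "price_range": ["cheap", "moderate", "expensive"],
        },
    }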

BERT

  •  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

LLMs and Hugging Face

  •  Explain how pretrained LLMs like BERT are used (see the sketch below)
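
Below is a minimal sketch of loading a pretrained BERT model through the Hugging Face Transformers library for intent classification. The checkpoint name (bert-base-uncased) is standard, but the number of labels is a placeholder that would come from the dialogue ontology.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    # The classification head is randomly initialised; it only becomes useful
    # after fine-tuning on the collected conversational data.
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=5)

    inputs = tokenizer("Recommend a restaurant at China Town", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_intent_id = logits.argmax(dim=-1).item()  # index into the ontology's intent list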

...

On top of HTML, we use Bootstrap 4 to facilitate the development of the webpage. The main purpose of this visual support is twofold: to provide (1) support that reduces a user's cognitive load (the amount of information working memory needs to process at any given time) and (2) an indication of the progress made on the task thus far. A user may not be able to remember all the preferences and/or constraints on recipes they have selected thus far, and a system that required them to do so would likely not be experienced as very user-friendly. It is also helpful to show a user how much progress has been made in finding a recipe they like. A simple measure for our recipe recommendation agent to indicate progress is to show how many recipes still match the preferences of the user.
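
A minimal sketch of how that progress measure could be computed (the recipe representation and the helper function are assumptions made for illustration):

    # Count how many candidate recipes still satisfy every stated preference;
    # this number is what the webpage would display as progress.
    def matching_recipes(recipes, preferences):
        return [
            recipe for recipe in recipes
            if all(recipe.get(key) == value for key, value in preferences.items())
        ]

    recipes = [
        {"cuisine": "italian", "vegetarian": True},
        {"cuisine": "thai", "vegetarian": False},
    ]
    preferences = {"vegetarian": True}
    print(len(matching_recipes(recipes, preferences)))  # e.g. "1 recipe still matches"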

...