Training Your Model

There are function(s) to be completed in this section!

The intent and slot classifier is part of the social-interaction-cloud Python package, and the imports resolve against the installed copy of that package. Whenever you make changes, reinstall the social interaction cloud by running `pip install .` so that your changes take effect.

In this part of the assignment, you’ll complete the train_model function in train.py, which trains a BERT-based model for intent classification and slot filling. This task will deepen your understanding of the training process, how loss functions work for multi-task learning, and how to use backpropagation to optimize a model.

Inside the utils folder, create a folder called checkpoints to store your model checkpoints.

Do not add your model checkpoint to GitHub and do not try to push it. It is too big. Add it to your .gitignore file or skip it when committing. At the end of the course we will tell you how to upload everything.

Understanding Training

For more general and preliminary information check out: Preliminaries and Quiz Materials.

What is Training?
- Training is the process of optimizing a model to make better predictions by minimizing a loss function.
- For your BERT-based model:
  - Intent Classification: Predict the intent of a user query (e.g., addFilter or recipeRequest).
  - Slot Filling: Assign BIO tags to each token in the query (e.g., B-shortTimeKeyWord for "fast").
How Does the Training Loop Work?
- Forward Pass: The input data flows through the model to generate predictions.
- Loss Calculation: The predictions are compared with ground truth labels to calculate the loss.
- Backward Pass (Backpropagation): Gradients of the loss are computed with respect to the model's weights, and the optimizer uses them to adjust the weights so that predictions improve in the next iteration.
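
To make the three steps concrete, here is a minimal sketch of a training loop in plain Python. It is not the train_model function: the one-weight model, the data, and the learning rate are made up for illustration, and the gradient is worked out by hand instead of by PyTorch.

```python
# Toy example: fit y = w * x with a single weight and plain gradient descent.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up (input, target) pairs; true w is 2
w = 0.0                                       # the only "model weight"
learning_rate = 0.05                          # illustrative value

for epoch in range(200):
    grad = 0.0
    loss = 0.0
    for x, y in data:
        prediction = w * x          # forward pass
        error = prediction - y
        loss += error ** 2          # loss calculation (squared error)
        grad += 2 * error * x       # backward pass: d(loss)/dw for this example
    w -= learning_rate * grad / len(data)   # weight update using the mean gradient

print(round(w, 3))
```

The same forward, loss, backward, update rhythm is what you will write in train_model, except that PyTorch computes the gradients for you.
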
What Are We Optimizing?
- Intent Loss: Measures how well the model predicts the intent.
- Slot Loss: Measures how well the model predicts slot tags for each token.
Why Do We Use Two Loss Functions?
- Your model performs two tasks simultaneously, so you need separate losses for each task. These are combined to train the model in a balanced way.

Steps to Complete the `train_model` Function

The train_model function is incomplete, and your task is to fill in the missing pieces. This function is the backbone of the training process for a dual-task model that handles intent classification and slot filling. Through this exercise, you’ll learn how to integrate loss functions, perform forward passes, and optimize a model’s weights effectively.

Understanding the Missing Pieces

Loss Functions
- Two loss functions are needed:
  - Intent Loss: Measures how well the model predicts the intent.
  - Slot Loss: Measures how well the model tags tokens for slots using BIO tags.
- Hints:
  - Think about which loss functions are commonly used for classification tasks.
  - Both tasks deal with categorical data, so the loss functions should compare predicted probabilities with actual labels.
  - Look into PyTorch’s Loss Functions documentation for insights. (Super Hint: our preferred loss function is among the first five loss functions listed in that documentation.)
Questions to Consider:
- How do loss functions compute the difference between predicted and true labels?
- Can the same type of loss function work for both intents and slots? Why?
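
As a starting point for the first question, the sketch below computes cross-entropy, one common classification loss, by hand in plain Python. It is an illustration of the idea only, not necessarily the loss that train.py expects, and the logits are made-up numbers.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_index):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(softmax(logits)[true_index])

# The model strongly favours class 0 in both cases.
confident_right = cross_entropy([4.0, 0.0, 0.0], true_index=0)  # true class is 0
confident_wrong = cross_entropy([4.0, 0.0, 0.0], true_index=1)  # true class is 1
print(confident_right, confident_wrong)
```

The loss is small when the correct class receives a high probability and large when it does not. Nothing in the computation depends on whether the classes are intents or BIO tags, which is worth keeping in mind for the second question.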

Optimizer
- The optimizer updates the model’s weights to minimize the loss.
- Hints:
  - The optimizer needs access to the model’s parameters.
  - The learning rate is crucial: too high a value can cause erratic updates, while too low a value can slow down training.
  - Explore PyTorch’s Optimization documentation to understand popular optimizers.
Questions to Consider:
- Why do we use an optimizer instead of manually adjusting weights?
- How does the learning rate impact training?
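
To get a feel for the learning-rate question, the following sketch runs plain gradient descent on the toy function f(w) = w**2 with three different learning rates. The function, the starting point, and the rates are illustrative assumptions; real optimizers such as those in PyTorch are more sophisticated, but the trade-off is the same.

```python
def minimize(learning_rate, steps=20, start=5.0):
    """Run gradient descent on f(w) = w**2 and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2 * w                  # derivative of w**2
        w -= learning_rate * grad     # gradient descent update
    return w

too_low = minimize(0.001)     # barely moves away from the starting point
reasonable = minimize(0.1)    # ends up close to the minimum at w = 0
too_high = minimize(1.1)      # overshoots further on every step and diverges

print(too_low, reasonable, too_high)
```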

Forward Pass
- The forward pass feeds the input data into the model to generate predictions:
  - intent_logits: Predicted logits for intent classification.
  - slot_logits: Predicted logits for slot tagging.
- Hints:
  - The model expects tokenized inputs and an attention mask.
  - Think about what the model outputs when given inputs for two tasks.
Questions to Consider:
- What are logits, and how are they different from probabilities?
- How does the model simultaneously handle intent and slot predictions?
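
The sketch below shows the shape of a forward pass for a dual-task model. ToyJointModel is a hypothetical stand-in, not the BERT-based model from this assignment: an embedding layer replaces BERT, and the layer sizes and the forward signature are assumptions made for illustration.

```python
import torch
from torch import nn

class ToyJointModel(nn.Module):
    """Hypothetical stand-in: one shared encoder feeding two output heads."""

    def __init__(self, vocab_size=100, hidden=16, num_intents=3, num_slot_tags=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)       # stands in for BERT
        self.intent_head = nn.Linear(hidden, num_intents)   # one prediction per query
        self.slot_head = nn.Linear(hidden, num_slot_tags)   # one prediction per token

    def forward(self, input_ids, attention_mask):
        token_states = self.embed(input_ids)                 # (batch, seq_len, hidden)
        mask = attention_mask.unsqueeze(-1).float()          # (batch, seq_len, 1)
        pooled = (token_states * mask).sum(1) / mask.sum(1)  # mean over real tokens
        return self.intent_head(pooled), self.slot_head(token_states)

input_ids = torch.randint(0, 100, (2, 6))              # 2 queries, 6 tokens each
attention_mask = torch.ones(2, 6, dtype=torch.long)    # no padding in this toy batch
intent_logits, slot_logits = ToyJointModel()(input_ids, attention_mask)

# Logits are unnormalized scores; softmax turns them into probabilities.
intent_probs = torch.softmax(intent_logits, dim=-1)
print(intent_logits.shape, slot_logits.shape)
```

Note the extra sequence dimension on slot_logits: there is one intent prediction per query but one slot prediction per token.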

Loss Calculation
- After the forward pass, compute the losses:
  - Intent Loss: Compare intent_logits with intent_labels.
  - Slot Loss: Compare slot_logits with slot_labels.
- Hints:
  - slot_logits and slot_labels may need reshaping to align their dimensions for the loss function.
  - Combine both losses into a single loss for backpropagation. Think about how best to combine them; the relative weighting of the two losses is arguably a hyperparameter.
Questions to Consider:
- Why might slot logits need to be flattened before calculating the loss?
- How does combining intent and slot losses impact model training?
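
The reshaping hint can be illustrated on random tensors. This sketch assumes nn.CrossEntropyLoss as the loss for both tasks and a simple weighted sum to combine them; the tensor sizes and the slot_weight name are made up, so treat it as one possible approach and not the reference solution.

```python
import torch
from torch import nn

batch, seq_len, num_intents, num_slot_tags = 2, 6, 3, 5
intent_logits = torch.randn(batch, num_intents)
slot_logits = torch.randn(batch, seq_len, num_slot_tags)
intent_labels = torch.randint(0, num_intents, (batch,))
slot_labels = torch.randint(0, num_slot_tags, (batch, seq_len))

criterion = nn.CrossEntropyLoss()   # assumed loss; takes (N, classes) logits and (N,) labels

intent_loss = criterion(intent_logits, intent_labels)

# Flatten so that every token becomes its own classification example.
flat_slot_logits = slot_logits.reshape(-1, num_slot_tags)   # (batch * seq_len, num_slot_tags)
flat_slot_labels = slot_labels.reshape(-1)                  # (batch * seq_len,)
slot_loss = criterion(flat_slot_logits, flat_slot_labels)

slot_weight = 1.0   # hypothetical weighting; tune it like any other hyperparameter
total_loss = intent_loss + slot_weight * slot_loss
print(intent_loss.item(), slot_loss.item(), total_loss.item())
```

If padded positions in your data carry a special label, the ignore_index argument of nn.CrossEntropyLoss can exclude them from the slot loss.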

Backpropagation
- Backpropagation computes the gradient of the loss with respect to each model weight; the optimizer then uses these gradients to update the weights.
- Steps:
  - Zero the gradients using optimizer.zero_grad() to prevent accumulation.
  - Use loss.backward() to compute gradients.
  - Apply the gradients to update the weights with optimizer.step().
Questions to Consider:
- Why do we need to zero the gradients before backpropagation?
- What happens if you skip the optimizer.step() step?
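
Both questions can be explored on a single weight. The sketch below uses torch.optim.SGD and a trivial loss of 3 * w purely for illustration; neither is meant as the choice for the assignment.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(3 * w).sum().backward()
first_grad = w.grad.clone()          # tensor([3.])

(3 * w).sum().backward()             # no zero_grad() in between
accumulated_grad = w.grad.clone()    # tensor([6.]): the new gradient was added to the old one

w_before_step = w.detach().clone()   # still 1.0: backward() alone never changes the weight

optimizer.zero_grad()                # reset before the next batch
(3 * w).sum().backward()
optimizer.step()                     # w <- w - lr * grad = 1.0 - 0.1 * 3.0
print(first_grad, accumulated_grad, w_before_step, w)
```

Calling backward() only fills in the gradients. Without optimizer.step() the weights stay exactly where they were, and without optimizer.zero_grad() each batch's gradient is mixed with those of earlier batches.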

Loss Logging
- After updating the weights, print the loss for each epoch or batch to monitor training progress.
- Hints:
  - Loss values should decrease over epochs if the model is learning effectively.
  - Keep track of both intent and slot losses separately to ensure balanced training.
Questions to Consider:
- What does it mean if the loss doesn’t decrease?
- Why is it helpful to log intent and slot losses separately?
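
One possible way to organize the logging is to collect the two losses per batch and report their averages per epoch. The helper below is a hypothetical sketch in plain Python; summarize_epoch is not part of train.py, and the loss values are made up, not real training results.

```python
def summarize_epoch(epoch, batch_losses):
    """Average per-batch (intent_loss, slot_loss) pairs and format one log line."""
    n = len(batch_losses)
    avg_intent = sum(intent for intent, _ in batch_losses) / n
    avg_slot = sum(slot for _, slot in batch_losses) / n
    line = (f"epoch {epoch}: intent_loss={avg_intent:.4f} "
            f"slot_loss={avg_slot:.4f} total={avg_intent + avg_slot:.4f}")
    return avg_intent, avg_slot, line

# Made-up numbers purely for illustration.
avg_intent, avg_slot, line = summarize_epoch(1, [(1.2, 2.0), (0.8, 1.6)])
print(line)
```

In a PyTorch loop, calling .item() on a scalar loss tensor gives the plain float you would append to batch_losses. Seeing the two averages side by side makes it easier to notice when one task improves while the other stalls.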

Hints for Success

- Loss Functions: Look into loss functions designed for classification tasks.
- Reshaping Slot Logits: Flatten the logits and labels to make them compatible with the loss function.
- Optimizer: The Adam optimizer is a common choice for deep learning models.

By completing this assignment, you’ll gain hands-on experience with key components of model training, preparing you for more advanced tasks. Take your time, think critically, and ask for clarification if needed!

Reflection Questions: Check out the Questions to Consider in each section.