
There is a function to be completed in this section!

In this part of the assignment, you’ll complete the train_model function in train.py, which trains a BERT-based model for intent classification and slot filling. This task will deepen your understanding of the training process, how loss functions work for multi-task learning, and how to use backpropagation to optimize a model.


Understanding Training

For more general and preliminary information, see [TBC] Preliminaries and Quiz Materials.

  1. What is Training?

    • Training is the process of optimizing a model to make better predictions by minimizing a loss function.

    • For your BERT-based model:

      • Intent Classification: Predict the intent of a user query (e.g., addFilter or recipeRequest).

      • Slot Filling: Assign BIO tags to each token in the query (e.g., B-shortTimeKeyWord for "fast").

  2. How Does the Training Loop Work?

    • Forward Pass: The input data flows through the model to generate predictions.

    • Loss Calculation: The predictions are compared with ground truth labels to calculate the loss.

    • Backward Pass (Backpropagation): The model adjusts its weights based on the loss to improve predictions in the next iteration. (A minimal code sketch of this loop follows this list.)

  3. What Are We Optimizing?

    • Intent Loss: Measures how well the model predicts the intent.

    • Slot Loss: Measures how well the model predicts slot tags for each token.

  4. Why Do We Use Two Loss Functions?

    • Your model performs two tasks simultaneously, so you need separate losses for each task. These are combined to train the model in a balanced way.
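
Here is a minimal sketch of one pass through this loop in PyTorch. Every name (model, dataloader, intent_loss_fn, slot_loss_fn, num_slot_labels, optimizer) is a placeholder for whatever train.py actually defines, and each step is discussed in detail in the assignment sections below:

  for batch in dataloader:
      # Forward pass: run the tokenized inputs through the model.
      intent_logits, slot_logits = model(batch["input_ids"], batch["attention_mask"])

      # Loss calculation: compare predictions with the ground-truth labels.
      intent_loss = intent_loss_fn(intent_logits, batch["intent_labels"])
      slot_loss = slot_loss_fn(slot_logits.view(-1, num_slot_labels),
                               batch["slot_labels"].view(-1))
      loss = intent_loss + slot_loss

      # Backward pass: compute gradients, then update the weights.
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()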



Assignment: Completing the Training Function


Objective

The train_model function is incomplete, and your task is to fill in the missing pieces. This function is the backbone of the training process for a dual-task model that handles intent classification and slot filling. Through this exercise, you’ll learn how to integrate loss functions, perform forward passes, and optimize a model’s weights effectively.


Understanding the Missing Pieces

  1. Loss Functions

    • Two loss functions are needed:

      • Intent Loss: Measures how well the model predicts the intent.

      • Slot Loss: Measures how well the model tags tokens for slots using BIO tags.

    • Hints:

      • Think about which loss functions are commonly used for classification tasks.

      • Both tasks deal with categorical data, so the loss functions should compare predicted probabilities with actual labels.

      • Look into PyTorch’s Loss Functions documentation for insights. (Super hint: our preferred loss function is among the first five loss functions on that documentation page.) A hedged sketch of one common choice appears after this item.

    Questions to Consider:

    • How do loss functions compute the difference between predicted and true labels?

    • Can the same type of loss function work for both intents and slots? Why?
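
    For illustration only, here is a minimal sketch of what the two loss functions could look like, assuming a cross-entropy-style loss for both tasks and the -100 padding convention for slot labels (both are assumptions to verify against your own setup):

      import torch.nn as nn

      # One common choice for single-label classification over categorical data.
      intent_loss_fn = nn.CrossEntropyLoss()

      # The same family works for slot tagging, since each token gets exactly
      # one BIO tag. ignore_index skips padding positions, assuming padded
      # slot labels are set to -100 (a common convention, not a given here).
      slot_loss_fn = nn.CrossEntropyLoss(ignore_index=-100)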


  2. Optimizer

    • The optimizer updates the model’s weights to minimize the loss.

    • Hints:

      • The optimizer needs access to the model’s parameters.

      • The learning rate is crucial: too high can cause erratic updates; too low can make training very slow.

      • Explore PyTorch’s Optimization documentation to understand popular optimizers. (A sketch of one common setup follows this item.)

    Questions to Consider:

    • Why do we use an optimizer instead of manually adjusting weights?

    • How does the learning rate impact training?
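
    A minimal sketch of one common setup, assuming the Adam optimizer mentioned under Hints for Success and a BERT-typical learning rate (both are starting points, not requirements):

      import torch.optim as optim

      # The optimizer is handed the model's parameters plus a learning rate.
      # 2e-5 is a typical starting point for fine-tuning BERT, not a fixed rule.
      optimizer = optim.Adam(model.parameters(), lr=2e-5)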


  3. Forward Pass

    • The forward pass feeds the input data into the model to generate predictions:

      • intent_logits: Predicted logits for intent classification.

      • slot_logits: Predicted logits for slot tagging.

    • Hints:

      • The model expects tokenized inputs and an attention mask.

      • Think about what the model outputs when given inputs for two tasks. (The expected shapes are sketched after this item.)

    Questions to Consider:

    • What are logits, and how are they different from probabilities?

    • How does the model simultaneously handle intent and slot predictions?
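
    A sketch of what the forward pass might look like; the keyword argument names and output order are assumptions that depend on how the model class in train.py defines its forward method:

      # Keyword names are assumptions; match them to your model's forward().
      intent_logits, slot_logits = model(
          input_ids=batch["input_ids"],            # tokenized input IDs
          attention_mask=batch["attention_mask"],  # 1 = real token, 0 = padding
      )
      # intent_logits: (batch_size, num_intents)        - one score per intent
      # slot_logits:   (batch_size, seq_len, num_slots) - one score per token per tag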


  4. Loss Calculation

    • After the forward pass, compute the losses:

      • Intent Loss: Compare intent_logits with intent_labels.

      • Slot Loss: Compare slot_logits with slot_labels.

    • Hints:

      • slot_logits and slot_labels may need reshaping to align their dimensions for the loss function.

      • Combine both losses into a single loss for backpropagation. Think about how best to combine them; the weighting is arguably a hyperparameter. (A sketch follows this item.)

    Questions to Consider:

    • Why might slot logits need to be flattened before calculating the loss?

    • How does combining intent and slot losses impact model training?
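
    A sketch of the loss calculation, reusing the placeholder loss functions from the Loss Functions item; the shapes in the comments assume the output shapes sketched under Forward Pass:

      # Intent logits are already (batch_size, num_intents), so they can be
      # compared with intent_labels directly.
      intent_loss = intent_loss_fn(intent_logits, intent_labels)

      # Slot logits are (batch_size, seq_len, num_slots), but a classification
      # loss expects (N, num_classes) against (N,), so flatten the batch and
      # sequence dimensions together first.
      slot_loss = slot_loss_fn(
          slot_logits.view(-1, slot_logits.size(-1)),  # (batch*seq_len, num_slots)
          slot_labels.view(-1),                        # (batch*seq_len,)
      )

      # An unweighted sum is the simplest combination; the relative weighting
      # is effectively a hyperparameter you can tune.
      loss = intent_loss + slot_loss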


  5. Backpropagation

    • Backpropagation updates the model weights based on the calculated loss.

    • Steps:

      • Zero the gradients using optimizer.zero_grad() to prevent accumulation.

      • Use loss.backward() to compute gradients.

      • Apply the gradients to update the weights with optimizer.step(). (These three calls are shown together after this item.)

    Questions to Consider:

    • Why do we need to zero the gradients before backpropagation?

    • What happens if you skip the optimizer.step() call?
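
    Put together, the three steps above look like this:

      optimizer.zero_grad()  # clear gradients from the previous batch so they don't accumulate
      loss.backward()        # compute gradients of the loss w.r.t. every model weight
      optimizer.step()       # apply the gradients to update the weights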


  6. Loss Logging

    • After updating the weights, print the loss for each epoch or batch to monitor training progress.

    • Hints:

      • Loss values should decrease over epochs if the model is learning effectively.

      • Keep track of intent and slot losses separately to ensure balanced training. (One way to do this is sketched after this item.)

    Questions to Consider:

    • What does it mean if the loss doesn’t decrease?

    • Why is it helpful to log intent and slot losses separately?
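
    One way to track the two losses separately, with illustrative accumulator names (total_intent_loss, total_slot_loss, and num_batches are not defined for you in train.py):

      total_intent_loss += intent_loss.item()  # .item() extracts a plain Python float
      total_slot_loss += slot_loss.item()
      num_batches += 1

      # After each epoch, report the averages; both should trend downward.
      print(f"Epoch {epoch + 1}: "
            f"intent loss = {total_intent_loss / num_batches:.4f}, "
            f"slot loss = {total_slot_loss / num_batches:.4f}")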


Hints for Success

  • Loss Functions: Look into loss functions designed for classification tasks.

  • Reshaping Slot Logits: Flatten the logits and labels to make them compatible with the loss function.

  • Optimizer: The Adam optimizer is a common choice for deep learning models.


By completing this assignment, you’ll gain hands-on experience with key components of model training, preparing you for more advanced tasks. Take your time, think critically, and ask for clarification if needed!


Reflection Questions: Check out the Questions to Consider in each section.

Done? Proceed with Model Evaluation.
