
There is a function to be completed in this section!

In this part of the assignment, you’ll complete the train_model function in train.py, which trains a BERT-based model for intent classification and slot filling. This task will deepen your understanding of the training process, how loss functions work for multi-task learning, and how to use backpropagation to optimize a model.


Understanding Training

For more general and preliminary information, see [TBC] Preliminaries and Quiz Materials.

  1. What is Training?

    • Training is the process of optimizing a model to make better predictions by minimizing a loss function.

    • For your BERT-based model:

      • Intent Classification: Predict the intent of a user query (e.g., addFilter or recipeRequest).

      • Slot Filling: Assign BIO tags to each token in the query (e.g., B-shortTimeKeyWord for "fast").

  2. How Does the Training Loop Work?

    • Forward Pass: The input data flows through the model to generate predictions.

    • Loss Calculation: The predictions are compared with ground truth labels to calculate the loss.

    • Backward Pass (Backpropagation): The model adjusts its weights based on the loss to improve predictions in the next iteration. (A minimal code sketch of this loop follows this list.)

  3. What Are We Optimizing?

    • Intent Loss: Measures how well the model predicts the intent.

    • Slot Loss: Measures how well the model predicts slot tags for each token.

  4. Why Do We Use Two Loss Functions?

    • Your model performs two tasks simultaneously, so you need separate losses for each task. These are combined to train the model in a balanced way.
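
Here is a minimal sketch of one pass through this loop in PyTorch. Every name (model, dataloader, intent_loss_fn, slot_loss_fn, num_slot_labels, optimizer) is a placeholder for whatever train.py actually defines, and each step is discussed in detail in the assignment sections below:

  for batch in dataloader:
      # Forward pass: run the tokenized inputs through the model.
      intent_logits, slot_logits = model(batch["input_ids"], batch["attention_mask"])

      # Loss calculation: compare predictions with the ground-truth labels.
      intent_loss = intent_loss_fn(intent_logits, batch["intent_labels"])
      slot_loss = slot_loss_fn(slot_logits.view(-1, num_slot_labels),
                               batch["slot_labels"].view(-1))
      loss = intent_loss + slot_loss

      # Backward pass: compute gradients, then update the weights.
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()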



Assignment: Completing the Training Function


Objective

The train_model function is incomplete, and your task is to fill in the missing pieces. This function is the backbone of the training process for a dual-task model that handles intent classification and slot filling. Through this exercise, you’ll learn how to integrate loss functions, perform forward passes, and optimize a model’s weights effectively.


Understanding the Missing Pieces

  1. Loss Functions

    • Two loss functions are needed:

      • Intent Loss: Measures how well the model predicts the intent.

      • Slot Loss: Measures how well the model tags tokens for slots using BIO tags.

    • Hints:

      • Think about which loss functions are commonly used for classification tasks.

      • Both tasks deal with categorical data, so the loss functions should compare predicted probabilities with actual labels.

      • Look into PyTorch’s Loss Functions documentation for insights. (Super hint: our preferred loss function is among the first five loss functions on that documentation page.) A hedged sketch of one common choice appears after this item.

    Questions to Consider:

    • How do loss functions compute the difference between predicted and true labels?

    • Can the same type of loss function work for both intents and slots? Why?
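
    For illustration only, here is a minimal sketch of what the two loss functions could look like, assuming a cross-entropy-style loss for both tasks and the -100 padding convention for slot labels (both are assumptions to verify against your own setup):

      import torch.nn as nn

      # One common choice for single-label classification over categorical data.
      intent_loss_fn = nn.CrossEntropyLoss()

      # The same family works for slot tagging, since each token gets exactly
      # one BIO tag. ignore_index skips padding positions, assuming padded
      # slot labels are set to -100 (a common convention, not a given here).
      slot_loss_fn = nn.CrossEntropyLoss(ignore_index=-100)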


  2. Optimizer

    • The optimizer updates the model’s weights to minimize the loss.

    • Hints:

      • The optimizer needs access to the model’s parameters.

      • The learning rate is crucial: too high can cause erratic updates; too low can make training very slow.

      • Explore PyTorch’s Optimization documentation to understand popular optimizers. (A sketch of one common setup follows this item.)

    Questions to Consider:

    • Why do we use an optimizer instead of manually adjusting weights?

    • How does the learning rate impact training?
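
    A minimal sketch of one common setup, assuming the Adam optimizer mentioned under Hints for Success and a BERT-typical learning rate (both are starting points, not requirements):

      import torch.optim as optim

      # The optimizer is handed the model's parameters plus a learning rate.
      # 2e-5 is a typical starting point for fine-tuning BERT, not a fixed rule.
      optimizer = optim.Adam(model.parameters(), lr=2e-5)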


  3. Forward Pass

    • The forward pass feeds the input data into the model to generate predictions:

      • intent_logits: Predicted logits for intent classification.

      • slot_logits: Predicted logits for slot tagging.

    • Hints:

      • The model expects tokenized inputs and an attention mask.

      • Think about what the model outputs when given inputs for two tasks. (The expected shapes are sketched after this item.)

    Questions to Consider:

    • What are logits, and how are they different from probabilities?

    • How does the model simultaneously handle intent and slot predictions?
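
    A sketch of what the forward pass might look like; the keyword argument names and output order are assumptions that depend on how the model class in train.py defines its forward method:

      # Keyword names are assumptions; match them to your model's forward().
      intent_logits, slot_logits = model(
          input_ids=batch["input_ids"],            # tokenized input IDs
          attention_mask=batch["attention_mask"],  # 1 = real token, 0 = padding
      )
      # intent_logits: (batch_size, num_intents)        - one score per intent
      # slot_logits:   (batch_size, seq_len, num_slots) - one score per token per tag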


  4. Loss Calculation

    • After the forward pass, compute the losses:

      • Intent Loss: Compare intent_logits with intent_labels.

      • Slot Loss: Compare slot_logits with slot_labels.

    • Hints:

      • slot_logits and slot_labels may need reshaping to align their dimensions for the loss function.

      • Combine both losses into a single loss for backpropagation. Think about how best to combine them; the weighting is arguably a hyperparameter. (A sketch follows this item.)

    Questions to Consider:

    • Why might slot logits need to be flattened before calculating the loss?

    • How does combining intent and slot losses impact model training?
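
    A sketch of the loss calculation, reusing the placeholder loss functions from the Loss Functions item; the shapes in the comments assume the output shapes sketched under Forward Pass:

      # Intent logits are already (batch_size, num_intents), so they can be
      # compared with intent_labels directly.
      intent_loss = intent_loss_fn(intent_logits, intent_labels)

      # Slot logits are (batch_size, seq_len, num_slots), but a classification
      # loss expects (N, num_classes) against (N,), so flatten the batch and
      # sequence dimensions together first.
      slot_loss = slot_loss_fn(
          slot_logits.view(-1, slot_logits.size(-1)),  # (batch*seq_len, num_slots)
          slot_labels.view(-1),                        # (batch*seq_len,)
      )

      # An unweighted sum is the simplest combination; the relative weighting
      # is effectively a hyperparameter you can tune.
      loss = intent_loss + slot_loss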


  5. Backpropagation

    • Backpropagation updates the model weights based on the calculated loss.

    • Steps:

      • Zero the gradients using optimizer.zero_grad() to prevent accumulation.

      • Use loss.backward() to compute gradients.

      • Apply the gradients to update the weights with optimizer.step(). (These three calls are shown together after this item.)

    Questions to Consider:

    • Why do we need to zero the gradients before backpropagation?

    • What happens if you skip the optimizer.step() call?
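
    Put together, the three steps above look like this:

      optimizer.zero_grad()  # clear gradients from the previous batch so they don't accumulate
      loss.backward()        # compute gradients of the loss w.r.t. every model weight
      optimizer.step()       # apply the gradients to update the weights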


  6. Loss Logging

    • After updating the weights, print the loss for each epoch or batch to monitor training progress.

    • Hints:

      • Loss values should decrease over epochs if the model is learning effectively.

      • Keep track of intent and slot losses separately to ensure balanced training. (One way to do this is sketched after this item.)

    Questions to Consider:

    • What does it mean if the loss doesn’t decrease?

    • Why is it helpful to log intent and slot losses separately?
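
    One way to track the two losses separately, with illustrative accumulator names (total_intent_loss, total_slot_loss, and num_batches are not defined for you in train.py):

      total_intent_loss += intent_loss.item()  # .item() extracts a plain Python float
      total_slot_loss += slot_loss.item()
      num_batches += 1

      # After each epoch, report the averages; both should trend downward.
      print(f"Epoch {epoch + 1}: "
            f"intent loss = {total_intent_loss / num_batches:.4f}, "
            f"slot loss = {total_slot_loss / num_batches:.4f}")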


Hints for Success

  • Loss Functions: Look into loss functions designed for classification tasks.

  • Reshaping Slot Logits: Flatten the logits and labels to make them compatible with the loss function.

  • Optimizer: The Adam optimizer is a common choice for deep learning models.


By completing this assignment, you’ll gain hands-on experience with key components of model training, preparing you for more advanced tasks. Take your time, think critically, and ask for clarification if needed!


Reflection Questions: Check out the Questions to Consider in each section.

Done? Proceed with Model Evaluation.
