The NLU task can be approached as joint learning of intent classification (IC) and slot filling (SF), with slot labels typically expressed in the widely used BIO format, as shown below. In general, learning the two tasks jointly is mutually beneficial, since intent and slot predictions provide training signal for each other.

Here is an example of SF and IC output for an utterance. Slot labels are in BIO format: B indicates the start of a slot span, I a token inside the span, and O denotes that the token does not belong to any slot.

Utterance:  I    want  to    cook  Italian           pizza
Slot:       O    O     O     O     B-ingredienttype  I-ingredienttype
Intent:     addFilter
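
For concreteness, the same example can be written as a single training instance. Below is a minimal sketch in Python; the field names are illustrative rather than a prescribed schema.

```python
# One BIO-labelled training example (field names are illustrative).
example = {
    "tokens": ["I", "want", "to", "cook", "Italian", "pizza"],
    "slots":  ["O", "O", "O", "O", "B-ingredienttype", "I-ingredienttype"],
    "intent": "addFilter",
}
# BIO labelling is per token, so both sequences must have the same length.
assert len(example["tokens"]) == len(example["slots"])
```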

The NLU architecture includes the following key parts (a minimal code sketch of the full model follows the list):

  • Base Model: A pre-trained BertModel (e.g., bert-base-uncased) that generates contextual embeddings. Its encoder processes the input sequence through stacked Transformer layers whose self-attention mechanism captures dependencies between words regardless of their position, producing a contextual embedding for each token.

  • Intent Classifier: A linear layer on top of the [CLS] token output for intent prediction. A softmax over this layer's logits yields a probability distribution over the predefined set of possible intents.

  • Slot Classifier: A linear layer applied to the token-level embeddings for slot tagging. It assigns a label to each token, indicating whether the token is part of a particular slot (e.g., a destination or a date); this is often referred to as token tagging. A softmax over the per-token logits predicts a slot label for each token.

  • Joint Learning of the Two Classifiers: During training, the model minimizes a combined loss function, typically the sum of a cross-entropy loss for intent classification and a per-token cross-entropy loss for slot filling. This ensures that the model not only predicts the intent accurately but also extracts the correct slots for each token in the sentence.
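
The sketch below ties the four parts together, assuming PyTorch and the Hugging Face transformers library. The label counts and the equal weighting of the two losses are illustrative assumptions, not requirements from the description above.

```python
# A minimal sketch of a joint intent/slot model, assuming PyTorch and the
# Hugging Face `transformers` library.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast


class JointNLUModel(nn.Module):
    def __init__(self, num_intents: int, num_slot_labels: int,
                 model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        # Intent head: linear layer over the [CLS] token representation.
        self.intent_classifier = nn.Linear(hidden, num_intents)
        # Slot head: linear layer applied to every token embedding.
        self.slot_classifier = nn.Linear(hidden, num_slot_labels)

    def forward(self, input_ids, attention_mask,
                intent_labels=None, slot_labels=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Token-level contextual embeddings: (batch, seq_len, hidden)
        sequence_output = outputs.last_hidden_state
        # [CLS] embedding summarizing the utterance: (batch, hidden)
        cls_output = sequence_output[:, 0, :]

        intent_logits = self.intent_classifier(cls_output)
        slot_logits = self.slot_classifier(sequence_output)

        loss = None
        if intent_labels is not None and slot_labels is not None:
            # ignore_index=-100 masks padding/subword positions in slot labels.
            ce = nn.CrossEntropyLoss(ignore_index=-100)
            intent_loss = ce(intent_logits, intent_labels)
            slot_loss = ce(slot_logits.view(-1, slot_logits.size(-1)),
                           slot_labels.view(-1))
            # Combined joint objective (equal weighting assumed here).
            loss = intent_loss + slot_loss
        return loss, intent_logits, slot_logits
```

At inference time, softmax over the two heads gives the intent distribution and per-token slot distributions; the label counts below are placeholders.

```python
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = JointNLUModel(num_intents=5, num_slot_labels=7)  # illustrative sizes
enc = tokenizer("I want to cook Italian pizza", return_tensors="pt")
_, intent_logits, slot_logits = model(enc["input_ids"], enc["attention_mask"])
intent_probs = intent_logits.softmax(dim=-1)   # distribution over intents
slot_preds = slot_logits.argmax(dim=-1)        # one slot label id per token
```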

...