Fitting Label Encoders

What Are Label Encoders?

Label encoders translate text-based labels (like intents and slot types) into numerical values that the model can work with. This process standardizes the inputs and outputs, ensuring consistency across training, evaluation, and inference.

  1. Intent Label Encoder:

    • Maps each intent from the ontology (e.g., recipeRequest, greeting) to a unique number.

    • Used for intent classification.

  2. Slot Label Encoder:

    • Converts slot types and their corresponding BIO-format tags (B-slot, I-slot, O) into numbers.

    • Used for slot filling at the token level.
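As a concrete illustration, here is how an intent label encoder behaves, using scikit-learn's LabelEncoder (the intents listed here are assumptions for the example; the real list comes from the ontology):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical intents -- in practice these are read from the ontology file.
intents = ["greeting", "recipeRequest", "goodbye"]

intent_label_encoder = LabelEncoder()
intent_label_encoder.fit(intents)

# Text labels become integers (LabelEncoder assigns IDs in sorted order)...
encoded = intent_label_encoder.transform(["recipeRequest", "greeting"])

# ...and integers can be decoded back to labels for inference output.
decoded = intent_label_encoder.inverse_transform(encoded)
```

The round trip through transform and inverse_transform is what guarantees consistent label handling across training, evaluation, and inference.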

Steps in the Encoding Process

Take a close look at the fit_encoders function in dataset.py. It performs the following steps:

  1. Load the Ontology:

    • The function reads the ontology file to extract the list of intents and the slot types.

  2. Fit the Intent Label Encoder:

    • The intent encoder assigns a unique numerical label to each intent in the ontology:

      intent_label_encoder.fit(intents)

    • Key Insight: This step ensures that intent classification produces outputs in a consistent format.

  3. Generate BIO Tags for Slots:

    • Slot tags are converted into BIO format:

      • B-{slot}: Beginning of a slot entity.

      • I-{slot}: Inside a slot entity.

      • O: Outside of any slot entity.

    • All slot tags are compiled into a single list:

      all_slot_tags = ['O'] + [f'B-{slot}' for slot in slots.keys()] + [f'I-{slot}' for slot in slots.keys()]


    • These tags are then fitted to the slot encoder (named slot_label_encoder here, mirroring the intent encoder above):

      slot_label_encoder.fit(all_slot_tags)

    • Why BIO Format? This labeling scheme marks the boundaries of multi-token slot entities, distinguishing where one entity ends and the next begins.

      • Think about why this could be important in our context and which slots could specifically benefit.
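Putting the steps together, fit_encoders can be sketched roughly as follows. This is an illustrative sketch, not the actual dataset.py code: the JSON ontology structure and the "intents"/"slots" field names are assumptions.

```python
import json

from sklearn.preprocessing import LabelEncoder


def fit_encoders(ontology_path):
    # Step 1: load the ontology (assumed to be JSON with
    # "intents" as a list and "slots" as a mapping).
    with open(ontology_path) as f:
        ontology = json.load(f)
    intents = ontology["intents"]
    slots = ontology["slots"]

    # Step 2: fit the intent encoder so each intent gets a stable integer ID.
    intent_label_encoder = LabelEncoder()
    intent_label_encoder.fit(intents)

    # Step 3: build the full BIO tag set and fit the slot encoder on it.
    all_slot_tags = ['O'] \
        + [f'B-{slot}' for slot in slots.keys()] \
        + [f'I-{slot}' for slot in slots.keys()]
    slot_label_encoder = LabelEncoder()
    slot_label_encoder.fit(all_slot_tags)

    return intent_label_encoder, slot_label_encoder
```

For intuition on the BIO scheme: if the ontology had a hypothetical recipeName slot, the three tokens of "chicken tikka masala" would be tagged B-recipeName I-recipeName I-recipeName, so the model can recover the full multi-token span rather than three separate entities.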

Done? Proceed with Preparing the Dataset.