
Getting Started

  • Go to main.py.

Ontology

What is an Ontology?

For general information, see Preliminaries and Quiz Materials.

For this project:

  1. Intents represent the user's high-level actions or goals.

    • Example: When a user says, "Can you recommend a recipe?", the intent could be requestRecommendation.

  2. Slots define specific pieces of information extracted from the user’s input.

    • Example: In the query "Add garlic and chicken thighs to the recipe filter," garlic and chicken thighs are slot values of the ingredient slot type.

The ontology file is where all the possible intents and slots for the system are defined.
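
To make the two sections concrete, the sketch below models an ontology as a plain Python dictionary and summarizes it. The key names `intents` and `slots`, the helper `summarize_ontology`, and every value shown are assumptions pieced together from the examples on this page; the real ontology.json may be laid out differently.

```python
import json

# Hypothetical ontology content: key names and values are assumptions based on
# the examples on this page, not a copy of the real ontology.json.
EXAMPLE_ONTOLOGY = {
    "intents": ["greeting", "farewell", "recipeRequest", "requestRecommendation"],
    "slots": {
        "ingredient": ["garlic", "chicken thighs"],
        "cuisine": ["italian"],  # made-up value, for illustration only
    },
}


def summarize_ontology(ontology):
    """Return the number of intents and the number of values per slot type."""
    return {
        "num_intents": len(ontology["intents"]),
        "values_per_slot": {
            slot: len(values) for slot, values in ontology["slots"].items()
        },
    }


# Round-trip through JSON text, which yields the same structure that
# json.load would return when reading a file with this content.
ontology = json.loads(json.dumps(EXAMPLE_ONTOLOGY))
summary = summarize_ontology(ontology)
print(summary)
```

Running the same summary on the real file (via `json.load`) is a quick way to see how many labels the model has to choose between.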


Steps to Analyze the Ontology

  1. Open the Ontology File

    • Locate the ontology file (ontology.json) in your project. This file contains two key sections:

      • Intents: A list of all possible intents your system can predict.

      • Slots: A dictionary where the keys represent slot types (e.g., ingredient) and the values are lists of possible slot values (e.g., garlic, chicken thighs).

  2. Review the Intents

    • Look at the intents section in the file. Each intent represents a unique user goal or action.

    • Reflect on the variety of intents. For example:

      • What do intents like greeting or farewell imply about the system's capabilities?

      • How does the system distinguish between recipeRequest and requestRecommendation?

  3. Explore the Slots

    • Examine the slots section. This is a dictionary of slot types and their potential values.

    • Key questions to consider:

      • How many slot types are defined? Examples might include ingredient, cuisine, or recipe.

      • Are there any patterns in the slot values?

      • How might these slots connect to our MARBEL agent?

  4. Think About Model Outputs

    • Your model will predict one intent per input (intent classification) and assign a slot label to each token in the input (slot filling).

    • Understanding the ontology helps you map these predictions to actionable outputs.
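
As an illustration of turning token-level predictions into something actionable, the sketch below groups BIO-tagged tokens back into slot values. The function name `extract_slots` and the tag sequence are hypothetical; the tags are written by hand for the example query used earlier on this page, not produced by the project's model.

```python
def extract_slots(tokens, tags):
    """Group BIO-tagged tokens into {slot_type: [value, ...]}."""
    slots = {}
    entity_type, entity_tokens = None, []
    # A trailing "O" sentinel flushes an entity that ends the sentence.
    for token, tag in zip(list(tokens) + [""], list(tags) + ["O"]):
        if tag.startswith("I-") and tag[2:] == entity_type:
            # Continuation of the entity that is currently open.
            entity_tokens.append(token)
            continue
        if entity_type is not None:
            # The open entity ends here, so store it.
            slots.setdefault(entity_type, []).append(" ".join(entity_tokens))
        if tag.startswith("B-"):
            entity_type, entity_tokens = tag[2:], [token]
        else:
            # "O" tags and stray "I-" tags are treated as outside any entity.
            entity_type, entity_tokens = None, []
    return slots


tokens = "Add garlic and chicken thighs to the recipe filter".split()
# Hand-written tags, standing in for the model's per-token predictions.
tags = ["O", "B-ingredient", "O", "B-ingredient", "I-ingredient",
        "O", "O", "O", "O"]

result = extract_slots(tokens, tags)
print(result)  # {'ingredient': ['garlic', 'chicken thighs']}
```

The predicted intent would be passed along unchanged next to this dictionary, so a downstream component receives one goal plus the slot values that parameterize it.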

Stuff to Think About

  1. Study the Ontology File

    • Open ontology.json and carefully review the intents and slots.

    • Make notes on any patterns, ambiguities, or gaps you observe.

  2. Answer the Following Questions

    • What are the most common intents in the file? Are there any that seem rarely used or overly specific?

    • What slot types and values do you think will be the most challenging for the model to predict? Why?

    • How does the structure of the ontology affect how you might design a dataset or interpret the model’s outputs?

Fitting Encoders

What Are Encoders?

Encoders translate text-based labels (like intents and slot types) into numerical values that the model can work with. This process standardizes the inputs and outputs, ensuring consistency across training, evaluation, and inference.

  1. Intent Label Encoder:

    • Maps each intent from the ontology (e.g., recipeRequest, greeting) to a unique number.

    • Used for intent classification.

  2. Slot Label Encoder:

    • Converts the BIO-format tags derived from the slot types (B-slot, I-slot, O) into numbers.

    • Used for slot filling at the token level.
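
The idea behind a label encoder can be shown with a small pure-Python stand-in. This is a sketch only: `SimpleLabelEncoder` is a hypothetical class written for this page, and the encoder actually used in dataset.py may behave differently (for example in how it orders the labels).

```python
class SimpleLabelEncoder:
    """Minimal stand-in for a label encoder: labels <-> integer ids."""

    def fit(self, labels):
        # Sort the unique labels so the mapping is deterministic.
        self.classes_ = sorted(set(labels))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        # Text labels -> integer ids (raises KeyError for unseen labels).
        return [self._index[label] for label in labels]

    def inverse_transform(self, ids):
        # Integer ids -> text labels.
        return [self.classes_[i] for i in ids]


# Intent names taken from the examples on this page.
intents = ["recipeRequest", "greeting", "farewell", "requestRecommendation"]
intent_label_encoder = SimpleLabelEncoder().fit(intents)

encoded = intent_label_encoder.transform(["greeting", "recipeRequest"])
decoded = intent_label_encoder.inverse_transform(encoded)
print(encoded)  # [1, 2]
print(decoded)  # ['greeting', 'recipeRequest']
```

Because the mapping is fixed once at fit time, the same id always refers to the same intent during training, evaluation, and inference.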

Steps in the Encoding Process

Take a close look at the fit_encoders function in dataset.py. It performs the following steps:

  1. Load the Ontology:

    • The function reads the ontology file to extract the list of intents and slot types.

  2. Fit the Intent Label Encoder:

    • The intent encoder assigns a unique numerical label to each intent in the ontology:

      intent_label_encoder.fit(intents)

    • Key Insight: This step ensures that intent classification produces outputs in a consistent format.

  3. Generate BIO Tags for Slots:

    • Slot tags are converted into BIO format:

      • B-{slot}: Beginning of a slot entity.

      • I-{slot}: Inside a slot entity.

      • O: Outside of any slot entity.

    • All slot tags are compiled into a single list:

      all_slot_tags = (['O'] + [f'B-{slot}' for slot in slots.keys()]
                       + [f'I-{slot}' for slot in slots.keys()])

    • These tags are then fitted to the slot encoder.

    • Why BIO Format? This labeling scheme helps identify the boundaries of multi-token slot entities.

      • Think about why this could be important in our context and what slots could specifically benefit.
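
The tag-generation step can be tried in isolation with the sketch below. It assumes the slots section is a dictionary keyed by slot type, as described above; the helper `build_slot_tags`, the sample slot values, and the sorted tag-to-id mapping are illustrative assumptions, not the actual contents of fit_encoders.

```python
def build_slot_tags(slots):
    """Return 'O' plus one B- and one I- tag for every slot type."""
    return (
        ["O"]
        + [f"B-{slot}" for slot in slots]
        + [f"I-{slot}" for slot in slots]
    )


# Hypothetical slots section; only the keys (slot types) matter here.
slots = {"ingredient": ["garlic", "chicken thighs"], "cuisine": ["italian"]}

all_slot_tags = build_slot_tags(slots)
# Each slot type contributes two tags, plus the single shared 'O' tag.
assert len(all_slot_tags) == 2 * len(slots) + 1

# Stand-in for fitting the slot encoder: give every tag a stable integer id.
tag_to_id = {tag: i for i, tag in enumerate(sorted(all_slot_tags))}

# Encode a hand-written tag sequence for "Add garlic and chicken thighs".
example_tags = ["O", "B-ingredient", "O", "B-ingredient", "I-ingredient"]
encoded_tags = [tag_to_id[tag] for tag in example_tags]
print(all_slot_tags)
print(encoded_tags)  # [4, 1, 4, 1, 3]
```

Note how "chicken thighs" needs both a B- and an I- tag: without the distinction, two adjacent ingredients could not be told apart from one two-word ingredient.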

Dataset

Preprocessing Distribution Dataset

Train

Evaluate

Inference
