...
Load the Ontology:
The function reads the ontology file to extract the list of intents and slot types:
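As a minimal sketch of this step, assuming the ontology is stored as JSON with an "intents" list and a "slots" mapping (the actual file layout is not shown in this section, so the structure, the load_ontology name, and the sample values below are all hypothetical):

```python
import json

# Hypothetical ontology content; the real file's layout may differ.
ontology_text = """
{
  "intents": ["book_flight", "cancel_booking", "check_weather"],
  "slots": {"city": "text", "date": "date"}
}
"""

def load_ontology(source):
    """Return (intents, slots) parsed from a JSON ontology string."""
    ontology = json.loads(source)
    intents = ontology["intents"]  # list of intent names
    slots = ontology["slots"]      # dict: slot name -> slot type
    return intents, slots

intents, slots = load_ontology(ontology_text)
print(intents)
print(list(slots.keys()))
```

When reading from disk, the same parsing applies to the text read from the ontology file instead of the inline string.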
Fit the Intent Label Encoder:
The intent encoder assigns a unique numerical label to each intent in the ontology:
intent_label_encoder.fit(intents)
Key Insight: This step fixes a stable mapping between intent names and integer labels, so the classifier's outputs can always be decoded back to the same intents.
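To make the mapping concrete, here is a small stand-in encoder written in plain Python. It is only an illustration: the section does not say which encoder class is used, so SimpleLabelEncoder and the sample intents are hypothetical, and the sorted-order indexing is an assumption modelled on how scikit-learn's LabelEncoder behaves.

```python
class SimpleLabelEncoder:
    """Minimal stand-in: maps each distinct label to an integer index."""

    def fit(self, labels):
        # Sort the unique labels so the mapping does not depend on input order.
        self.classes_ = sorted(set(labels))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        # Convert label names to their integer indices.
        return [self._index[label] for label in labels]

    def inverse_transform(self, indices):
        # Convert integer indices back to label names.
        return [self.classes_[i] for i in indices]

intents = ["check_weather", "book_flight", "cancel_booking"]  # hypothetical
intent_label_encoder = SimpleLabelEncoder().fit(intents)

encoded = intent_label_encoder.transform(["book_flight", "check_weather"])
decoded = intent_label_encoder.inverse_transform(encoded)
print(encoded, decoded)
```

Because the mapping is derived from the ontology rather than from whichever intents happen to appear in a batch, the same intent always receives the same index.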
Generate BIO Tags for Slots:
Slot tags are converted into BIO format:
B-{slot}: Beginning of a slot entity.
I-{slot}: Inside a slot entity.
O: Outside of any slot entity.
All slot tags are compiled into a single list:
all_slot_tags = ['O'] + [f'B-{slot}' for slot in slots.keys()] + [f'I-{slot}' for slot in slots.keys()]
These tags are then fitted to the slot encoder:
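A rough sketch of what fitting produces, assuming the slot encoder indexes tags by the sorted order of the unique labels (as scikit-learn's LabelEncoder does; the section does not name the encoder, and the slots dictionary here is hypothetical):

```python
slots = {"city": "text", "date": "date"}  # hypothetical slot types

# Same construction as in the tutorial: one 'O' tag plus B-/I- tags per slot.
all_slot_tags = ['O'] + [f'B-{s}' for s in slots] + [f'I-{s}' for s in slots]

# Assumed encoder behaviour: indices follow the sorted order of unique tags.
tag_to_id = {tag: i for i, tag in enumerate(sorted(set(all_slot_tags)))}

for tag, idx in tag_to_id.items():
    print(idx, tag)
```

Under that assumption, 'O' does not receive index 0 even though it comes first in all_slot_tags, because uppercase 'B' and 'I' sort before 'O'. Code that needs the index of 'O' (for example, to pad sequences) should look it up through the encoder rather than hard-coding 0.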
Why BIO Format?: This labeling scheme helps identify the boundaries of multi-token slot entities.
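The boundary idea can be illustrated with a short helper that turns token spans into BIO tags. The bio_tags function, the span format, and the example utterance are hypothetical and are not taken from the tutorial's code.

```python
def bio_tags(tokens, spans):
    """Tag tokens in BIO format.

    spans: list of (start, end_exclusive, slot) tuples over token positions.
    """
    tags = ["O"] * len(tokens)
    for start, end, slot in spans:
        tags[start] = f"B-{slot}"       # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{slot}"       # remaining tokens of the same entity
    return tags

tokens = ["fly", "to", "new", "york", "tomorrow"]
spans = [(2, 4, "city"), (4, 5, "date")]  # "new york" is a two-token entity

tags = bio_tags(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

The B-/I- distinction is what lets a decoder recover "new york" as one entity and also separate two adjacent entities of the same slot type, which a plain per-token slot label could not do.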
Think about why this could be important in our context, and which slots in particular would benefit.
Done? Proceed with Our Dataset.