Note |
---|
Evaluation statistics must meet the thresholds described here. If they do not, the automated checks will not pass. |
Objective
In `evaluate.py` you will find an `evaluate` function. The evaluation step helps us understand how well the trained model performs on the test dataset. It provides insight into the model's strengths and weaknesses by calculating key metrics for the intent classification and slot-filling tasks. These metrics are critical for ensuring the model meets performance expectations.
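As a rough illustration, a minimal sketch of what such an `evaluate` function could look like is shown below, assuming scikit-learn metrics and per-utterance lists of gold and predicted labels. The names `gold_intents`, `pred_intents`, `gold_slots`, and `pred_slots` are illustrative assumptions, not the actual implementation in `evaluate.py`:

```python
from sklearn.metrics import accuracy_score, f1_score, classification_report

def evaluate(gold_intents, pred_intents, gold_slots, pred_slots):
    """Illustrative sketch of the evaluation step, not the actual evaluate.py."""
    # Intent classification: one label per utterance.
    intent_accuracy = accuracy_score(gold_intents, pred_intents)

    # Slot filling: one label per token, so flatten the per-utterance sequences first.
    flat_gold = [tag for seq in gold_slots for tag in seq]
    flat_pred = [tag for seq in pred_slots for tag in seq]
    slot_weighted_f1 = f1_score(flat_gold, flat_pred, average="weighted", zero_division=0)

    # Detailed per-label breakdowns for error analysis.
    print(classification_report(gold_intents, pred_intents, zero_division=0))
    print(classification_report(flat_gold, flat_pred, zero_division=0))

    return {"intent_accuracy": intent_accuracy, "slot_weighted_f1": slot_weighted_f1}
```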
...
- Measure Performance:
  - The metrics highlight whether the model performs well enough to be deployed for practical use.
  - The intent accuracy and slot weighted F1-score provide a quick snapshot of overall model performance.
- Identify Weaknesses:
  - The detailed classification reports help pinpoint specific intents or slots where the model struggles.
  - This information can guide further training, such as focusing on underperforming labels or collecting more data for rare cases (see the sketch after this list).
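As a hedged example of how a classification report can be mined for underperforming labels, the helper below collects every label whose F1-score falls under a cutoff. The function name `weak_labels` and the 0.5 cutoff are arbitrary choices for illustration:

```python
from sklearn.metrics import classification_report

def weak_labels(gold, pred, min_f1=0.5):
    """Return labels whose F1-score falls below min_f1 (illustrative cutoff)."""
    report = classification_report(gold, pred, output_dict=True, zero_division=0)
    return {
        label: round(scores["f1-score"], 3)
        for label, scores in report.items()
        # Skip the aggregate entries and the plain 'accuracy' float.
        if isinstance(scores, dict)
        and label not in ("micro avg", "macro avg", "weighted avg")
        and scores["f1-score"] < min_f1
    }
```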
The returned metrics (`intent_accuracy` and `slot_weighted_f1`) allow for automated testing against performance thresholds in systems like GitHub Classroom. We consider _ and _ acceptable minimum scores for your model; these are what a base, untuned, unrefined model achieved for us with little training.
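A minimal sketch of such a threshold check is given below. Since the minimum scores are left blank above, the values here are placeholders, and the helper name `check_thresholds` is an assumption rather than part of the grading setup:

```python
def check_thresholds(metrics, min_intent_accuracy, min_slot_weighted_f1):
    """Raise AssertionError if the model misses either threshold."""
    assert metrics["intent_accuracy"] >= min_intent_accuracy, (
        f"intent_accuracy {metrics['intent_accuracy']:.3f} is below {min_intent_accuracy}"
    )
    assert metrics["slot_weighted_f1"] >= min_slot_weighted_f1, (
        f"slot_weighted_f1 {metrics['slot_weighted_f1']:.3f} is below {min_slot_weighted_f1}"
    )

# Usage (placeholder thresholds; substitute the minimum scores stated above):
# check_thresholds(metrics, min_intent_accuracy=0.7, min_slot_weighted_f1=0.7)
```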
...
How to Interpret the Results
...
Info |
---|
Done? Proceed with [TODO] Connecting Your Classifier with the Pipeline (WHISPER). |