...
The repository is structured as follows:

```
intent_slot_classification_model/
├── checkpoints/            # Directory for saving trained model checkpoints
├── data/                   # Directory containing data files (ontology, train/test datasets)
│   ├── ontology.json       # Ontology file containing intents, slots, and synonyms
│   ├── train.json          # Training dataset
│   ├── test.json           # Test dataset
│   └── synonyms.json       # Synonyms for slot normalization
├── data_processing.py      # Utilities for additional data preprocessing (if needed)
├── dataset.py              # Dataset preparation and preprocessing module
├── evaluation.py           # Model evaluation and metrics generation
├── run_train_test.py       # Main script to run training, evaluation, and inference
├── model.py                # Defines the BERT-based model architecture
├── predict.py              # Inference module for predicting intents and slots
├── requirements.txt        # Python dependencies for the project
├── train.py                # Training module for the intent-slot classifier
└── utils.py                # Helper functions for argument parsing, slot normalization, and synonym resolution
```
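The synonym-resolution helper in `utils.py` is not shown here, but its role can be illustrated with a minimal sketch. The `synonyms.json` structure and the names `load_synonym_map` / `normalize_slot_value` below are assumptions for illustration, not the project's actual API:

```python
import json

def load_synonym_map(path="data/synonyms.json"):
    """Build a lookup from each synonym to its canonical slot value.

    Assumes synonyms.json maps canonical values to synonym lists,
    e.g. {"new york": ["nyc", "the big apple"]}.
    """
    with open(path) as f:
        canonical_to_synonyms = json.load(f)
    return {
        syn.lower(): canonical
        for canonical, syns in canonical_to_synonyms.items()
        for syn in syns
    }

def normalize_slot_value(value, synonym_map):
    # Fall back to the raw value when no synonym entry exists.
    return synonym_map.get(value.lower(), value)
```

Normalizing predicted slot values this way lets evaluation compare against the canonical values defined in the ontology rather than surface variants.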
...
Explanation of Key Modules
...
run_train_test.py
This is the central script for orchestrating the entire pipeline. It integrates data preparation, training, evaluation, and inference for the intent and slot classifiers, all controlled via command-line arguments.
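As an illustration, a command-line interface like this is typically wired up with `argparse`. This is a sketch only: apart from `--prep_data` and `--show_dist`, which appear in the usage examples, the flag names and defaults below are assumptions, not the script's actual interface:

```python
import argparse

def build_parser():
    """Sketch of an argument parser; names and defaults are illustrative."""
    parser = argparse.ArgumentParser(
        description="Train, evaluate, and run the intent/slot classifier")
    parser.add_argument("--ontology", default="data/ontology.json",
                        help="Path to the ontology JSON file")
    parser.add_argument("--train", action="store_true",
                        help="Train the model when this flag is set")
    parser.add_argument("--evaluate", action="store_true",
                        help="Evaluate the model on the test dataset")
    parser.add_argument("--epochs", type=int, default=3,
                        help="Number of epochs for training")
    parser.add_argument("--prep_data", action="store_true",
                        help="Prepare data for training")
    parser.add_argument("--show_dist", action="store_true",
                        help="Show the intent and slot distribution")
    return parser
```

Boolean flags use `action="store_true"`, so they default to `False` and flip to `True` when passed on the command line.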
...
| Argument | Type | Default | Description |
|---|---|---|---|
| | | | Path to the ontology JSON file. |
| | | | Path to the training dataset. |
| | | | Path to the test dataset. |
| | | | Path to save/load the trained model weights. |
| | | | Train the model when this flag is set. |
| | | | Evaluate the model on the test dataset when this flag is set. |
| | | | Number of epochs for training. |
| | | | Batch size for training. |
| | | | Learning rate for the optimizer. |
| | | | Maximum sequence length for tokenization. |
| | | | Random seed for reproducibility. |
| | | | Text input for running inference. |
| `--show_dist` | | | Show the intent and slot distribution in the dataset. |
| `--prep_data` | | | Prepare data for training. |
...
Usage
1. Preparing Data
To preprocess and prepare the training and test data for the model:
...
...
```shell
python run_train_test.py --prep_data
```
2. Viewing Dataset Distribution
To analyze the distribution of intents and slots in the dataset:
```shell
python run_train_test.py --show_dist
```
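Under the hood, computing the distribution amounts to counting intent and slot labels. A minimal sketch, assuming each example is a dict with an `intent` key and a list of `slots` (the `label_distribution` name and data layout are assumptions):

```python
from collections import Counter

def label_distribution(examples):
    """Count intent labels and slot names across a list of examples.

    Assumes each example looks like
    {"intent": "book_flight", "slots": [{"slot": "city", "value": "nyc"}]}.
    """
    intents = Counter(ex["intent"] for ex in examples)
    slots = Counter(s["slot"] for ex in examples for s in ex.get("slots", []))
    return intents, slots
```

Inspecting these counts before training helps spot class imbalance in the intents and rarely occurring slots.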
3. Training the Model
...