...

The repository is structured as follows:

Code Block
intent_slot_classification_model/
├── checkpoints/          # Directory for saving trained model checkpoints
├── data/                 # Directory containing data files (ontology, train/test datasets)
│   ├── ontology.json     # Ontology file containing intents, slots, and synonyms
│   ├── train.json        # Training dataset
│   ├── test.json         # Test dataset
│   └── synonyms.json     # Synonyms for slot normalization
├── data_processing.py    # Utilities for additional data preprocessing (if needed)
├── dataset.py            # Dataset preparation and preprocessing module
├── evaluation.py         # Model evaluation and metrics generation
├── run_train_test.py     # Main script to run training, evaluation, and inference
├── model.py              # Defines the BERT-based model architecture
├── predict.py            # Inference module for predicting intents and slots
├── requirements.txt      # Python dependencies for the project
├── train.py              # Training module for the intent-slot classifier
└── utils.py              # Helper functions for argument parsing, slot normalization, and synonym resolution
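
For orientation, the following is a minimal sketch of the kind of architecture model.py describes: a BERT encoder with an utterance-level intent head and a token-level slot head. The class name, attribute names, and the joint two-head layout are assumptions for illustration only; the repository may instead implement the intent and slot classifiers as separate models.

Code Block
import torch
import torch.nn as nn
from transformers import BertModel

class JointIntentSlotModel(nn.Module):
    """Illustrative sketch only; not the repository's actual code.

    A shared BERT encoder feeds two linear heads:
      - intent head: classifies the whole utterance (pooled output)
      - slot head:   classifies every token (sequence output)
    """
    def __init__(self, num_intents, num_slots, pretrained="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)  # one label per utterance
        self.slot_head = nn.Linear(hidden, num_slots)       # one label per token

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_head(out.pooler_output)      # (batch, num_intents)
        slot_logits = self.slot_head(out.last_hidden_state)      # (batch, seq_len, num_slots)
        return intent_logits, slot_logits

The sketch shows why a single forward pass can serve both tasks; the actual number of intent and slot labels comes from the ontology file in data/.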

...

Explanation of Key Modules

...

run_train_test.py

This is the central script for orchestrating the entire intent and slot classification pipeline. It integrates data preparation, training, evaluation, and inference, all controlled via command-line arguments.

...

| Argument | Type | Default | Description |
|---|---|---|---|
| --ontology_path | str | ./data/ontology.json | Path to the ontology JSON file. |
| --train_data | str | ./data/train.json | Path to the training dataset. |
| --test_data | str | ./data/test.json | Path to the test dataset. |
| --model_save_path | str | checkpoints/model_checkpoint.pt | Path to save/load the trained model weights. |
| --train_model | bool | False | Train the model when this flag is set. |
| --evaluate | bool | False | Evaluate the model on the test dataset when this flag is set. |
| --num_epochs | int | 2 | Number of epochs for training. |
| --batch_size | int | 16 | Batch size for training. |
| --learning_rate | float | 5e-5 | Learning rate for the optimizer. |
| --max_length | int | 16 | Maximum sequence length for tokenization. |
| --seed | int | 42 | Random seed for reproducibility. |
| --inference_text | str | None | Text input for running inference. |
| --show_dist | bool | False | Show the intent and slot distribution in the dataset. |
| --prep_data | bool | False | Prepare data for training. |
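
As a reference for how these flags could be declared, here is a minimal argparse sketch that mirrors the table above. It is illustrative only; the repository's actual parser (argument parsing lives in utils.py per the tree above) may differ in naming and structure.

Code Block
import argparse

def build_arg_parser():
    # Illustrative sketch mirroring the argument table; not the repo's actual parser.
    parser = argparse.ArgumentParser(description="Intent and slot classification")
    parser.add_argument("--ontology_path", type=str, default="./data/ontology.json")
    parser.add_argument("--train_data", type=str, default="./data/train.json")
    parser.add_argument("--test_data", type=str, default="./data/test.json")
    parser.add_argument("--model_save_path", type=str, default="checkpoints/model_checkpoint.pt")
    parser.add_argument("--train_model", action="store_true")   # boolean flags default to False
    parser.add_argument("--evaluate", action="store_true")
    parser.add_argument("--num_epochs", type=int, default=2)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--max_length", type=int, default=16)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--inference_text", type=str, default=None)
    parser.add_argument("--show_dist", action="store_true")
    parser.add_argument("--prep_data", action="store_true")
    return parser

args = build_arg_parser().parse_args()

In a layout like this, the parsed flags (e.g. args.train_model, args.evaluate) would decide which stage of the pipeline runs, as shown in the usage examples below.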

...

Usage

1. Preparing Data

To preprocess and prepare the training and test data for the model:

...

Code Block
python run_train_test.py --prep_data

2. Viewing Dataset Distribution

To analyze the distribution of intents and slots in the dataset:

Code Block
python run_train_test.py --show_dist

3. Training the Model

...