...

The repository is structured as follows:

Code Block
intent_slot_classification_model/
├── checkpoints/          # Directory for saving trained model checkpoints
├── data/                 # Directory containing data files (ontology, train/test datasets)
│   ├── ontology.json     # Ontology file containing intents, slots, and synonyms
│   ├── train.json        # Training dataset
│   ├── test.json         # Test dataset
│   └── synonyms.json     # Synonyms for slot normalization
├── data_processing.py    # Utilities for additional data preprocessing (if needed)
├── dataset.py            # Dataset preparation and preprocessing module
├── evaluation.py         # Model evaluation and metrics generation
├── run_train_test.py     # Main script to run training, evaluation, and inference
├── model.py              # Defines the BERT-based model architecture
├── predict.py            # Inference module for predicting intents and slots
├── requirements.txt      # Python dependencies for the project
├── train.py              # Training module for the intent-slot classifier
└── utils.py              # Helper functions for argument parsing, slot normalization, and synonym resolution
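
For orientation, the following is a minimal sketch of the kind of architecture model.py describes: a BERT encoder with an utterance-level intent head and a token-level slot head. The class name, attribute names, and the joint two-head layout are assumptions for illustration only; the repository may instead implement the intent and slot classifiers as separate models.

Code Block
import torch
import torch.nn as nn
from transformers import BertModel

class JointIntentSlotModel(nn.Module):
    """Illustrative sketch only; not the repository's actual code.

    A shared BERT encoder feeds two linear heads:
      - intent head: classifies the whole utterance (pooled output)
      - slot head:   classifies every token (sequence output)
    """
    def __init__(self, num_intents, num_slots, pretrained="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)  # one label per utterance
        self.slot_head = nn.Linear(hidden, num_slots)       # one label per token

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_head(out.pooler_output)      # (batch, num_intents)
        slot_logits = self.slot_head(out.last_hidden_state)      # (batch, seq_len, num_slots)
        return intent_logits, slot_logits

The sketch shows why a single forward pass can serve both tasks; the actual number of intent and slot labels comes from the ontology file in data/.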

...

Explanation of Key Modules

...

run_train_test.py

This is the central script for orchestrating the entire intent and slot classification pipeline. It integrates data preparation, training, evaluation, and inference, all controlled via command-line arguments.

...

| Argument | Type | Default | Description |
|---|---|---|---|
| --ontology_path | str | ./data/ontology.json | Path to the ontology JSON file. |
| --train_data | str | ./data/train.json | Path to the training dataset. |
| --test_data | str | ./data/test.json | Path to the test dataset. |
| --model_save_path | str | checkpoints/model_checkpoint.pt | Path to save/load the trained model weights. |
| --train_model | bool | False | Train the model when this flag is set. |
| --evaluate | bool | False | Evaluate the model on the test dataset when this flag is set. |
| --num_epochs | int | 2 | Number of epochs for training. |
| --batch_size | int | 16 | Batch size for training. |
| --learning_rate | float | 5e-5 | Learning rate for the optimizer. |
| --max_length | int | 16 | Maximum sequence length for tokenization. |
| --seed | int | 42 | Random seed for reproducibility. |
| --inference_text | str | None | Text input for running inference. |
| --show_dist | bool | False | Show the intent and slot distribution in the dataset. |
| --prep_data | bool | False | Prepare data for training. |
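
As a reference for how these flags could be declared, here is a minimal argparse sketch that mirrors the table above. It is illustrative only; the repository's actual parser (argument parsing lives in utils.py per the tree above) may differ in naming and structure.

Code Block
import argparse

def build_arg_parser():
    # Illustrative sketch mirroring the argument table; not the repo's actual parser.
    parser = argparse.ArgumentParser(description="Intent and slot classification")
    parser.add_argument("--ontology_path", type=str, default="./data/ontology.json")
    parser.add_argument("--train_data", type=str, default="./data/train.json")
    parser.add_argument("--test_data", type=str, default="./data/test.json")
    parser.add_argument("--model_save_path", type=str, default="checkpoints/model_checkpoint.pt")
    parser.add_argument("--train_model", action="store_true")   # boolean flags default to False
    parser.add_argument("--evaluate", action="store_true")
    parser.add_argument("--num_epochs", type=int, default=2)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--max_length", type=int, default=16)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--inference_text", type=str, default=None)
    parser.add_argument("--show_dist", action="store_true")
    parser.add_argument("--prep_data", action="store_true")
    return parser

args = build_arg_parser().parse_args()

In a layout like this, the parsed flags (e.g. args.train_model, args.evaluate) would decide which stage of the pipeline runs, as shown in the usage examples below.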

...

Usage

1. Preparing Data

To preprocess and prepare the training and test data for the model:

...

Code Block
python run_train_test.py --prep_data

2. Viewing Dataset Distribution

To analyze the distribution of intents and slots in the dataset:

Code Block
python run_train_test.py --show_dist

3. Training the Model

...