- Confidence-based Acquisition Model for Efficient Self-supervised Active Learning
- Overview
- Installation Guide
- 1. Set up the directory structure
- 2. Install ConvLab-3
- 3. Install CAMELL
- Dataset Requirements
- Active Learning for Dialogue State Tracking (Ensemble Setup)
- Step 1: Dataset Preparation
- Step 2: Initialize Active Learning Process
- Step 3: Train Ensemble Models
- Step 4: Perform Inference
- Step 5: Training the Prediction Confidence Estimator
- Step 6: Active Learning Update Step
- Step 6a: Select Data Points for Labeling
- Step 6b: Update the Dataloaders
- Step 7: Repeat the Process
- Active Learning with Label Validation for Dialogue State Tracking (Ensemble)
- Step 1: Noisy Label Generation
- Step 2: Train the Ensemble Models
- Step 3: Perform Inference
- Step 4: Training the Label Confidence Estimator
- Step 5: Identifying Noisy Labels in the Pool
- Baselines
- 1. Random Sampling
- 2. Bayesian Active Learning by Disagreement (BALD)
- 3. Diversity-based Active Learning (Diversity)
- Label Correction for Dialogue State Tracking using the Label Confidence Model
- Step 1: Train the SetSUMBT Ensemble
- Step 2: Generate Noisy Labels and Train the Noisy Model
- Step 3: Perform Inference with Both Base and Noisy Models
- Step 4: Train the Label Confidence Estimator
- Step 5: Identify Noisy Labels in the Training Data
- Step 6: Correct the Labels
- Citation and Acknowledgments
Confidence-based Acquisition Model for Efficient Self-supervised Active Learning
Overview
Confidence-based Acquisition Model for Efficient Self-supervised Active Learning (CAMELL) is a pool-based active learning framework for dialogue state tracking, using the SetSUMBT model implementation from ConvLab-3. This repository facilitates efficient model training with minimal human-labeled data through active learning and confidence-based selection.
Installation Guide
To set up CAMELL, follow these steps to install the required dependencies and prepare the active learning ensemble.
1. Set up the directory structure
Create the directory structure for the project, with separate directories for ConvLab-3, CAMELL, and experiment outputs.
export PROJECT_DIR=/path/to/your/directory/for/this/project
mkdir -p ${PROJECT_DIR}
cd ${PROJECT_DIR}
mkdir experiments
2. Install ConvLab-3
CAMELL relies on the SetSUMBT model from ConvLab-3. To set it up, clone the ConvLab-3 repository and install the required dependencies:
cd ${PROJECT_DIR}
git clone https://github.com/ConvLab/ConvLab-3.git ConvLab3
cd ConvLab3
uv sync
3. Install CAMELL
Next, clone the CAMELL repository and install the required dependencies using uv. Ensure you have the uv package manager installed (we recommend Python 3.12 for compatibility).
cd ${PROJECT_DIR}
git clone TODO CAMELL
cd CAMELL
uv sync
Dataset Requirements
CAMELL has been tested on the MultiWOZ dataset, commonly used for dialogue state tracking, and on the WMT17 German-to-English translation dataset. To use CAMELL with other datasets, ensure the data is formatted to match the data formats expected by the CAMELL code base.
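The MultiWOZ pipeline builds on ConvLab-3's unified data format. As a quick sanity check of the data (a minimal sketch, assuming the convlab package installed above is importable in the ConvLab-3 environment), you can inspect the splits and a sample turn:
cd ${PROJECT_DIR}/ConvLab3
uv run python -c "
from convlab.util import load_dataset  # ConvLab-3 unified data loader

dataset = load_dataset('multiwoz21')
print(dataset.keys())                   # expected splits: train, validation, test
print(dataset['train'][0]['turns'][0])  # first turn of the first training dialogue
"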
Active Learning for Dialogue State Tracking (Ensemble Setup)
Step 1: Dataset Preparation
Create the dataset object required for training the SetSUMBT model. This step is crucial for setting up your data before beginning the active learning process.
cd ${PROJECT_DIR}/CAMELL
uv run python ../ConvLab3/convlab/dst/setsumbt/run.py \
--run_config_name setsumbt_multiwoz21 \
--do_train \
--do_test \
--num_train_epochs 0 \
--output_dir ${PROJECT_DIR}/experiments/seed_ensemble
Step 2: Initialize Active Learning Process
Before starting active learning, initialize the ensemble and prepare the model directory. Start by renaming the full training dataloader; it will serve as the pool from which the seed set and subsequent batches are drawn.
mv ${PROJECT_DIR}/experiments/seed_ensemble/dataloaders/train.dataloader ${PROJECT_DIR}/experiments/seed_ensemble/dataloaders/train_full.dataloader
Then, initialize the active learning ensemble by selecting a random seed set from the training data.
cd ${PROJECT_DIR}/CAMELL
uv run active-learning \
--initialise_active_learning_ensemble \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble \
--seed 20211202 \
--seed_size 420
Step 3: Train Ensemble Models
Train each individual model in the ensemble.
cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
for i in {0..4}; do
uv run python run.py \
--run_config_name setsumbt_multiwoz21 \
--do_train \
--output_dir ${PROJECT_DIR}/experiments/seed_ensemble/ens-$i
done
Step 4: Perform Inference
After training the individual models, run inference on:
- The test set for evaluation.
- The unlabelled pool for the active learning selection process.
- The training dataset (for CAMELL) to train the confidence estimator.
cd ${PROJECT_DIR}/CAMELL
uv run combine-loaders \
--ensemble_loaders \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble
uv run python ../ConvLab3/convlab/dst/setsumbt/run.py \
--run_config_name ensemble_setsumbt_multiwoz21 \
--do_test \
--do_eval \
--do_eval_trainset \
--output_dir ${PROJECT_DIR}/experiments/seed_ensemble
mv ${PROJECT_DIR}/experiments/seed_ensemble/predictions/train.data ${PROJECT_DIR}/experiments/seed_ensemble/predictions/train_labelled.data
uv run active-learning \
--create_pool_loader \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble
uv run python ../ConvLab3/convlab/dst/setsumbt/run.py \
--run_config_name ensemble_setsumbt_multiwoz21 \
--do_eval_trainset \
--output_dir ${PROJECT_DIR}/experiments/seed_ensemble
mv ${PROJECT_DIR}/experiments/seed_ensemble/predictions/train.data ${PROJECT_DIR}/experiments/seed_ensemble/predictions/pool.data
mv ${PROJECT_DIR}/experiments/seed_ensemble/predictions/train_labelled.data ${PROJECT_DIR}/experiments/seed_ensemble/predictions/train.data
Step 5: Training the Prediction Confidence Estimator
Train the confidence estimator using the prediction confidence model. This is an essential part of CAMELL, enabling efficient selection of data points for human labeling.
cd ${PROJECT_DIR}/CAMELL
uv run confidence-selection --train_confidence_model --prediction_confidence --model_path ${PROJECT_DIR}/experiments/seed_ensemble
Step 6: Active Learning Update Step
Step 6a: Select Data Points for Labeling
Use the confidence estimator to select the dialogue-turn-slot pairs for labeling. In this example, we set the step size to 420, approximately 5% of the MultiWOZ training dataset.
cd ${PROJECT_DIR}/CAMELL
uv run confidence-selection \
--select_from_pool \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble \
--eval_data_set pool \
--step_size 420
This command will generate a selected_points.json file containing the dialogue-turn-slot pairs that should be annotated.
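The exact location and schema of this file may vary between CAMELL versions; assuming it is written to the model directory, a quick way to inspect it is:
# Locate the selection file in case it is written elsewhere:
find ${PROJECT_DIR}/experiments/seed_ensemble -name "selected_points.json"
# Pretty-print the first few entries:
python -m json.tool ${PROJECT_DIR}/experiments/seed_ensemble/selected_points.json | head -n 20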
Step 6b: Update the Dataloaders
Run the update step to rebuild the dataloaders of the ensemble based on the newly selected data points.
cd ${PROJECT_DIR}/CAMELL
uv run update-step \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble \
--ensemble_size 5
Step 7: Repeat the Process
Repeat the training, inference, and confidence-selection steps (Steps 3-6) until the desired number of dialogues has been selected and annotated.
We suggest creating a copy of the current model directory to use for the next iteration of the process.
cp -r ${PROJECT_DIR}/experiments/seed_ensemble ${PROJECT_DIR}/experiments/step-1
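Putting it together, one iteration of the loop looks roughly like the sketch below, which condenses the commands from Steps 3-6 above. The STEP variable and the abbreviation of the Step 4 inference sequence are our own shorthand, not part of the CAMELL CLI:
STEP=1
MODEL=${PROJECT_DIR}/experiments/step-${STEP}
cp -r ${PROJECT_DIR}/experiments/seed_ensemble ${MODEL}

# Step 3: retrain each ensemble member on the enlarged labelled set
cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
for i in {0..4}; do
uv run python run.py \
--run_config_name setsumbt_multiwoz21 \
--do_train \
--output_dir ${MODEL}/ens-$i
done

# Step 4: run the full inference sequence documented above against ${MODEL}

# Step 5: retrain the prediction confidence estimator
cd ${PROJECT_DIR}/CAMELL
uv run confidence-selection --train_confidence_model --prediction_confidence --model_path ${MODEL}

# Step 6: select the next batch and update the dataloaders
uv run confidence-selection \
--select_from_pool \
--model_path ${MODEL} \
--eval_data_set pool \
--step_size 420
uv run update-step \
--model_path ${MODEL} \
--ensemble_size 5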
Active Learning with Label Validation for Dialogue State Tracking (Ensemble)
In addition to the active learning process described above, CAMELL also supports label validation to identify and reject noisy labels. This process involves generating noisy labels, training the ensemble models, and refining the predictions using the noisy data. Before updating the dataloaders (Step 6b), the label confidence estimation model is trained and used to identify noisy labels as outlined below.
Step 1: Noisy Label Generation
In this step, we generate a noisy dataset whose labels are produced by models trained on partially incorrect labels. This helps the ensemble identify label noise and refine its predictions later in the process.
cd ${PROJECT_DIR}/experiments
cp -r seed_ensemble seed_ensemble_noisy
cd ${PROJECT_DIR}/CAMELL
uv run generate-noisy-labels \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble_noisy \
--ensemble_size 5
Step 2: Train the Ensemble Models
Train the individual models in the ensemble, using both clean and noisy data. This step is critical for learning from noisy labels and creating robust predictions in later stages.
cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
for i in {0..4}; do
uv run python run.py \
--run_config_name setsumbt_multiwoz21 \
--do_train \
--output_dir ${PROJECT_DIR}/experiments/seed_ensemble_noisy/ens-$i
done
Step 3: Perform Inference
Perform inference using both the base and noisy versions of the model. First, create combined dataloaders containing both noisy and clean data.
cd ${PROJECT_DIR}/CAMELL
uv run combine-loaders \
--ensemble_loaders \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble_noisy
uv run combine-loaders \
--noisy_loaders \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble
After training the individual models, run inference on:
- The test set for evaluation.
- The unlabelled pool for the active learning selection process.
- The training dataset (for CAMELL) to train the confidence estimator.
cd ${PROJECT_DIR}/CAMELL
uv run python ../ConvLab3/convlab/dst/setsumbt/run.py \
--run_config_name ensemble_setsumbt_multiwoz21 \
--do_test \
--do_eval \
--do_eval_trainset \
--output_dir ${PROJECT_DIR}/experiments/seed_ensemble_noisy
mv ${PROJECT_DIR}/experiments/seed_ensemble_noisy/predictions/train.data ${PROJECT_DIR}/experiments/seed_ensemble_noisy/predictions/train_labelled.data
uv run active-learning \
--create_pool_loader \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble_noisy
uv run python ../ConvLab3/convlab/dst/setsumbt/run.py \
--run_config_name ensemble_setsumbt_multiwoz21 \
--do_eval_trainset \
--output_dir ${PROJECT_DIR}/experiments/seed_ensemble_noisy
mv ${PROJECT_DIR}/experiments/seed_ensemble_noisy/predictions/train.data ${PROJECT_DIR}/experiments/seed_ensemble_noisy/predictions/pool.data
mv ${PROJECT_DIR}/experiments/seed_ensemble_noisy/predictions/train_labelled.data ${PROJECT_DIR}/experiments/seed_ensemble_noisy/predictions/train.data
Step 4: Training the Label Confidence Estimator
Train the confidence estimator using the label confidence model to identify noisy labels.
uv run confidence-selection --train_confidence_model --label_confidence --model_path ${PROJECT_DIR}/experiments/seed_ensemble
Step 5: Identifying Noisy Labels in the Pool
Use the confidence estimator to identify noisy labels in the pool data.
cd ${PROJECT_DIR}/CAMELL
uv run confidence-selection \
--select_for_correction \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble \
--eval_data_set pool \
--eval_threshold 0.8
This will create a file called points_for_correction.json containing the dialogue-turn-slot pairs with unreliable labels. Now the dataloader update step from above (Step 6b) can be run.
Baselines
To benchmark CAMELL, we implement three baseline approaches: Random Sampling, Bayesian Active Learning by Disagreement (BALD), and Diversity-based Active Learning. Each baseline applies a different acquisition strategy for selecting data points for human labeling, enabling comparisons of efficiency and effectiveness.
1. Random Sampling
In Random Sampling, data points are selected purely at random, bypassing Steps 4, 5, and 6a in the Active Learning Process outlined above. This approach serves as a neutral baseline against which more sophisticated acquisition functions can be compared.
cd ${PROJECT_DIR}/CAMELL
uv run active-learning \
--select_from_pool \
--acquisition_function random \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble \
--step_size 420
This command generates selected_points.json, containing dialogue-turn-slot pairs for annotation, which can then be used to update the dataloaders in Step 6b.
2. Bayesian Active Learning by Disagreement (BALD)
The BALD baseline employs Bayesian uncertainty sampling, selecting points with high knowledge uncertainty as candidates for human labeling. This baseline provides a measure of how uncertainty-based selection compares to CAMELL’s confidence-based approach.
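Concretely, for an ensemble of M models with parameters θ_1, ..., θ_M, the BALD score of a candidate x is the estimated mutual information between the predicted label y and the model parameters:
I(y; θ | x) ≈ H( (1/M) Σ_m p(y | x, θ_m) ) − (1/M) Σ_m H( p(y | x, θ_m) )
where H denotes entropy. Points on which the ensemble members disagree most receive the highest scores.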
cd ${PROJECT_DIR}/CAMELL
uv run bald-with-ss \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble \
--step_size 420
As with Random Sampling, this command outputs selected_points.json for use in Step 6b.
3. Diversity-based Active Learning (Diversity)
Diversity-based Active Learning focuses on maximizing the representational diversity of the selected samples. Here, diversity is measured using embeddings from a language model, ensuring varied examples in the training set.
cd ${PROJECT_DIR}/CAMELL
uv run combine-loaders \
--ensemble_loaders \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble
uv run diversity-al \
--get_labelled_centroids \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble \
--encoder_name_or_path "roberta-base"
uv run active-learning \
--create_pool_loader \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble
uv run diversity-al \
--select_from_pool \
--model_path ${PROJECT_DIR}/experiments/seed_ensemble \
--encoder_name_or_path "roberta-base" \
--step_size 420
This produces selected_points.json, which can be used to update the dataloaders in Step 6b.
Label Correction for Dialogue State Tracking using the Label Confidence Model
Label correction is a crucial step in CAMELL, enhancing model robustness by identifying and correcting noisy labels. A trained SetSUMBT model is required, which can be obtained using CAMELL’s active learning approach or another baseline. Follow these steps to prepare a SetSUMBT ensemble with label correction.
Step 1: Train the SetSUMBT Ensemble
Prepare and train an ensemble of SetSUMBT models on the dataset, creating dataloaders for ensemble-based active learning and label correction.
cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
uv run python run.py \
--run_config_name ensemble_setsumbt_multiwoz21 \
--do_ensemble_setup \
--output_dir ${PROJECT_DIR}/experiments/full_ensemble
Train each model in the ensemble:
cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
for i in {0..4}; do
uv run python run.py \
--run_config_name setsumbt_multiwoz21 \
--do_train \
--output_dir ${PROJECT_DIR}/experiments/full_ensemble/ens-$i
done
Step 2: Generate Noisy Labels and Train the Noisy Model
Generate a noisy dataset by training models with a portion of incorrect labels, allowing the ensemble to identify label noise and improve robustness.
cd ${PROJECT_DIR}/CAMELL
cp -r ${PROJECT_DIR}/experiments/full_ensemble ${PROJECT_DIR}/experiments/full_ensemble_noisy
uv run generate-noisy-labels \
--model_path ${PROJECT_DIR}/experiments/full_ensemble_noisy \
--ensemble_size 5
Train each model in the ensemble:
cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
for i in {0..4}; do
uv run python run.py \
--run_config_name setsumbt_multiwoz21 \
--do_train \
--output_dir ${PROJECT_DIR}/experiments/full_ensemble_noisy/ens-$i
done
Step 3: Perform Inference with Both Base and Noisy Models
Run inference using both base and noisy models to assess predictions on noisy labels and clean data, establishing a basis for label confidence estimation.
cd ${PROJECT_DIR}/CAMELL
uv run combine-loaders \
--ensemble_loaders \
--model_path ${PROJECT_DIR}/experiments/full_ensemble_noisy
uv run combine-loaders \
--ensemble_loaders \
--model_path ${PROJECT_DIR}/experiments/full_ensemble
uv run combine-loaders \
--noisy_loaders \
--model_path ${PROJECT_DIR}/experiments/full_ensemble
cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
uv run python run.py \
--run_config_name ensemble_setsumbt_multiwoz21 \
--do_eval \
--do_eval_trainset \
--output_dir ${PROJECT_DIR}/experiments/full_ensemble_noisy
uv run python run.py \
--run_config_name ensemble_setsumbt_multiwoz21 \
--do_eval \
--do_eval_trainset \
--output_dir ${PROJECT_DIR}/experiments/full_ensemble
Step 4: Train the Label Confidence Estimator
Train a label confidence estimator, enabling identification and correction of noisy labels.
cd ${PROJECT_DIR}/CAMELL
uv run confidence-selection --train_confidence_model --label_confidence --model_path ${PROJECT_DIR}/experiments/full_ensemble
Step 5: Identify Noisy Labels in the Training Data
Using the label confidence estimator, identify noisy labels in the training data by setting an evaluation threshold.
cd ${PROJECT_DIR}/CAMELL
uv run confidence-selection \
--select_for_correction \
--model_path ${PROJECT_DIR}/experiments/full_ensemble \
--eval_data_set train \
--eval_threshold 0.8
This generates points_for_correction.json, listing dialogue-turn-slot pairs with noisy labels.
Step 6: Correct the Labels
Run label correction on identified noisy labels, with the option to create updated dataloaders for retraining.
cd ${PROJECT_DIR}/CAMELL
uv run label-correction \
--correct_labels \
--model_path ${PROJECT_DIR}/experiments/full_ensemble
To correct labels directly in the dataloaders for retraining a new SetSUMBT model, use the --create_dataloaders flag.
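For example, a correction run that also writes updated dataloaders might look as follows (a sketch; we assume the flag combines with the options shown above):
cd ${PROJECT_DIR}/CAMELL
uv run label-correction \
--correct_labels \
--create_dataloaders \
--model_path ${PROJECT_DIR}/experiments/full_ensemble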
Citation and Acknowledgments
If you use CAMELL in your research, please cite the associated paper as follows:
@article{vanniekerk2025camell,
  title     = {A confidence-based acquisition model for self-supervised active learning and label correction},
  author    = {van Niekerk, Carel and Geishauser, Christian and Heck, Michael and Feng, Shutong and Lin, Hsien-chin and Lubis, Nurul and Ruppik, Benjamin and Vukovic, Renato and Ga{\v{s}}i{\'c}, Milica},
  journal   = {Transactions of the Association for Computational Linguistics},
  volume    = {13},
  pages     = {167--187},
  year      = {2025},
  publisher = {MIT Press}
}
CAMELL builds upon multiple open-source tools, including:
- ConvLab-3 for dialogue model implementations
- MultiWOZ dataset as the primary dataset for the dialogue state tracking experiments
- WMT17 German-to-English translation dataset for the translation experiments
We thank the authors of these tools and datasets for enabling the development of this framework.