    Confidence-based Acquisition Model for Efficient Self-supervised Active Learning

    Overview

    Confidence-based Acquisition Model for Efficient Self-supervised Active Learning (CAMELL) is a pool-based active learning framework for dialogue state tracking, using the SetSUMBT model implementation from ConvLab-3. This repository facilitates efficient model training with minimal human-labeled data through active learning and confidence-based selection.

    Installation Guide

    To set up CAMELL, follow these steps to install the required dependencies and prepare the active learning ensemble.

    1. Set up the directory structure

    Create a directory structure for the project, with separate directories for ConvLab-3, CAMELL, and model training data.

    export PROJECT_DIR=/path/to/your/directory/for/this/project
    mkdir -p ${PROJECT_DIR}
    
    cd ${PROJECT_DIR}
    mkdir experiments
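
    After completing steps 2 and 3 below, the project directory should look roughly as follows (the contents of experiments/ are created as you train models):

    ${PROJECT_DIR}/
    ├── ConvLab3/       # ConvLab-3 clone (provides the SetSUMBT model)
    ├── CAMELL/         # this repository
    └── experiments/    # model checkpoints, dataloaders and predictions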

    2. Install ConvLab-3

    CAMELL relies on the SetSUMBT model from ConvLab-3. To set it up, clone the ConvLab-3 repository and install the required dependencies:

    cd ${PROJECT_DIR}
    git clone https://github.com/ConvLab/ConvLab-3.git ConvLab3
    cd ConvLab3
    uv sync

    3. Install CAMELL

    Next, clone the CAMELL repository and install the required dependencies using uv. Ensure you have the uv package manager installed (we recommend Python 3.12 for compatibility).

    git clone TODO CAMELL
    cd CAMELL
    uv sync

    Dataset Requirements

    CAMELL has been tested with the MultiWOZ dataset, which is widely used for dialogue state tracking, and with the WMT17 German-to-English translation dataset. To use CAMELL with other datasets, ensure the data is formatted to match the data formats expected by the CAMELL code base.

    Active Learning for Dialogue State Tracking (Ensemble Setup)

    Step 1: Dataset Preparation

    Create the dataset object required for training the SetSUMBT model. This step is crucial for setting up your data before beginning the active learning process.

    cd ${PROJECT_DIR}/CAMELL
    uv run python ../ConvLab3/convlab/dst/setsumbt/run.py \
        --run_config_name setsumbt_multiwoz21 \
        --do_train \
        --do_test \
        --num_train_epochs 0 \
        --output_dir ${PROJECT_DIR}/experiments/seed_ensemble

    Step 2: Initialize Active Learning Process

    Before starting active learning, initialize the ensemble and prepare the model directory. Start by renaming the full training dataloader so that it is preserved as the complete training set.

    mv ${PROJECT_DIR}/experiments/seed_ensemble/dataloaders/train.dataloader ${PROJECT_DIR}/experiments/seed_ensemble/dataloaders/train_full.dataloader

    Then, initialize the active learning ensemble by selecting a random seed set from the training data.

    cd ${PROJECT_DIR}/CAMELL
    uv run active-learning \
        --initialise_active_learning_ensemble \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble \
        --seed 20211202 \
        --seed_size 420

    Step 3: Train Ensemble Models

    Train each individual model in the ensemble.

    cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
    for i in {0..4}; do
        uv run python run.py \
            --run_config_name setsumbt_multiwoz21 \
            --do_train \
            --output_dir ${PROJECT_DIR}/experiments/seed_ensemble/ens-$i
    done
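
    If multiple GPUs are available, the ensemble members can also be trained in parallel. The following is a minimal sketch, assuming one GPU per ensemble member and that the training script honours the standard CUDA_VISIBLE_DEVICES environment variable:

    cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
    for i in {0..4}; do
        # Pin each ensemble member to its own GPU and run it in the background
        CUDA_VISIBLE_DEVICES=$i uv run python run.py \
            --run_config_name setsumbt_multiwoz21 \
            --do_train \
            --output_dir ${PROJECT_DIR}/experiments/seed_ensemble/ens-$i &
    done
    wait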

    Step 4: Perform Inference

    After training the individual models, run inference on:

    1. The test set for evaluation.
    2. The unlabelled pool for the active learning election process.
    3. The training dataset (for CAMELL) to train the confidence estimator.

    cd ${PROJECT_DIR}/CAMELL
    uv run combine-loaders \
        --ensemble_loaders \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble
    
    uv run python ../ConvLab3/convlab/dst/setsumbt/run.py \
        --run_config_name ensemble_setsumbt_multiwoz21 \
        --do_test \
        --do_eval \
        --do_eval_trainset \
        --output_dir ${PROJECT_DIR}/experiments/seed_ensemble
    
    mv ${PROJECT_DIR}/experiments/seed_ensemble/predictions/train.data ${PROJECT_DIR}/experiments/seed_ensemble/predictions/train_labelled.data
    
    uv run active-learning \
        --create_pool_loader \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble
    
    uv run python ../ConvLab3/convlab/dst/setsumbt/run.py \
        --run_config_name ensemble_setsumbt_multiwoz21 \
        --do_eval_trainset \
        --output_dir ${PROJECT_DIR}/experiments/seed_ensemble
    
    mv ${PROJECT_DIR}/experiments/seed_ensemble/predictions/train.data ${PROJECT_DIR}/experiments/seed_ensemble/predictions/pool.data
    mv ${PROJECT_DIR}/experiments/seed_ensemble/predictions/train_labelled.data ${PROJECT_DIR}/experiments/seed_ensemble/predictions/train.data
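
    As an optional sanity check, the predictions directory should now contain both the labelled training data (train.data) and the unlabelled pool (pool.data):

    ls ${PROJECT_DIR}/experiments/seed_ensemble/predictions/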

    Step 5: Training the Prediction Confidence Estimator

    Train the confidence estimator using the prediction confidence model. This is an essential part of CAMELL, enabling efficient selection of data points for human labeling.

    cd ${PROJECT_DIR}/CAMELL
    uv run confidence-selection --train_confidence_model --prediction_confidence --model_path ${PROJECT_DIR}/experiments/seed_ensemble

    Step 6: Active Learning Update Step

    Step 6a: Select Data Points for Labeling

    Use the confidence estimator to select the dialogue-turn-slot pairs for labeling. In this example, we set the step size to 420, approximately 5% of the MultiWOZ training dataset.

    cd ${PROJECT_DIR}/CAMELL
    uv run confidence-selection \
        --select_from_pool \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble \
        --eval_data_set pool \
        --step_size 420

    This command will generate a selected_points.json file, containing the dialogue-turn-slot pairs that should be annotated.
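
    The exact location of selected_points.json may depend on your configuration; assuming it is written somewhere under the model directory passed via --model_path, you can locate it with:

    find ${PROJECT_DIR}/experiments/seed_ensemble -name selected_points.json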

    Step 6b: Update the Dataloaders

    Run the update step to update the dataloaders of the ensemble based on the newly selected data points.

    cd ${PROJECT_DIR}/CAMELL
    uv run update-step \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble \
        --ensemble_size 5

    Step 7: Repeat the Process

    Repeat the training, inference, and confidence selection steps (Steps 3-6) until the desired number of dialogues has been selected and annotated.

    We suggest creating a copy of the current model directory to use for the next iteration of the process.

    cp -r ${PROJECT_DIR}/experiments/seed_ensemble ${PROJECT_DIR}/experiments/step-1
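
    For reference, one full iteration (Steps 3-6) can be scripted end to end. The following is a minimal sketch, assuming the previous iteration's model lives in seed_ensemble and the new iteration's copy in step-1; it simply chains the commands shown above:

    PREV_DIR=${PROJECT_DIR}/experiments/seed_ensemble
    STEP_DIR=${PROJECT_DIR}/experiments/step-1
    cp -r ${PREV_DIR} ${STEP_DIR}

    # Step 3: train each ensemble member on the updated dataloaders
    cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
    for i in {0..4}; do
        uv run python run.py \
            --run_config_name setsumbt_multiwoz21 \
            --do_train \
            --output_dir ${STEP_DIR}/ens-$i
    done

    # Step 4: inference on the test set, the labelled training data and the unlabelled pool
    cd ${PROJECT_DIR}/CAMELL
    uv run combine-loaders --ensemble_loaders --model_path ${STEP_DIR}
    uv run python ../ConvLab3/convlab/dst/setsumbt/run.py \
        --run_config_name ensemble_setsumbt_multiwoz21 \
        --do_test --do_eval --do_eval_trainset \
        --output_dir ${STEP_DIR}
    mv ${STEP_DIR}/predictions/train.data ${STEP_DIR}/predictions/train_labelled.data
    uv run active-learning --create_pool_loader --model_path ${STEP_DIR}
    uv run python ../ConvLab3/convlab/dst/setsumbt/run.py \
        --run_config_name ensemble_setsumbt_multiwoz21 \
        --do_eval_trainset \
        --output_dir ${STEP_DIR}
    mv ${STEP_DIR}/predictions/train.data ${STEP_DIR}/predictions/pool.data
    mv ${STEP_DIR}/predictions/train_labelled.data ${STEP_DIR}/predictions/train.data

    # Step 5: retrain the prediction confidence estimator
    uv run confidence-selection --train_confidence_model --prediction_confidence --model_path ${STEP_DIR}

    # Step 6: select new points for labelling and update the ensemble dataloaders
    uv run confidence-selection --select_from_pool --model_path ${STEP_DIR} --eval_data_set pool --step_size 420
    uv run update-step --model_path ${STEP_DIR} --ensemble_size 5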

    Active Learning with Label Validation for Dialogue State Tracking (Ensemble)

    In addition to the active learning process described above, CAMELL also supports label validation to identify and reject noisy labels. This process involves generating noisy labels, training the ensemble models, and refining the predictions using the noisy data. Before updating the dataloaders (Step 6b), the label confidence estimation model is trained and used to identify noisy labels as outlined below.

    Step 1: Noisy Label Generation

    In this step, we generate a noisy dataset by creating labels from models that are trained on a portion of incorrect labels. This helps the ensemble identify label noise and refine its predictions later in the process.

    cd ${PROJECT_DIR}/experiments
    cp -r seed_ensemble seed_ensemble_noisy
    
    cd ${PROJECT_DIR}/CAMELL
    uv run generate-noisy-labels \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble_noisy \
        --ensemble_size 5

    Step 2: Train the Ensemble Models

    Train the individual models in the ensemble, using both clean and noisy data. This step is critical for learning from noisy labels and creating robust predictions in later stages.

    cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
    for i in {0..4}; do
        uv run python run.py \
            --run_config_name setsumbt_multiwoz21 \
            --do_train \
            --output_dir ${PROJECT_DIR}/experiments/seed_ensemble_noisy/ens-$i
    done

    Step 3: Perform Inference

    Perform inference using both the base and noisy versions of the model. First, create the combined dataloaders containing both noisy and clean data.

    cd ${PROJECT_DIR}/CAMELL
    uv run combine-loaders \
        --ensemble_loaders \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble_noisy
    
    uv run combine-loaders \
        --noisy_loaders \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble

    After training the individual models, run inference on:

    1. The test set for evaluation.
    2. The unlabelled pool for the active learning election process.
    3. The training dataset (for CAMELL) to train the confidence estimator.

    cd ${PROJECT_DIR}/CAMELL
    uv run python ../ConvLab3/convlab/dst/setsumbt/run.py \
        --run_config_name ensemble_setsumbt_multiwoz21 \
        --do_test \
        --do_eval \
        --do_eval_trainset \
        --output_dir ${PROJECT_DIR}/experiments/seed_ensemble_noisy
    
    mv ${PROJECT_DIR}/experiments/seed_ensemble_noisy/predictions/train.data ${PROJECT_DIR}/experiments/seed_ensemble_noisy/predictions/train_labelled.data
    
    uv run active-learning \
        --create_pool_loader \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble_noisy
    
    uv run python ../ConvLab3/convlab/dst/setsumbt/run.py \
        --run_config_name ensemble_setsumbt_multiwoz21 \
        --do_eval_trainset \
        --output_dir ${PROJECT_DIR}/experiments/seed_ensemble_noisy
    
    mv ${PROJECT_DIR}/experiments/seed_ensemble_noisy/predictions/train.data ${PROJECT_DIR}/experiments/seed_ensemble_noisy/predictions/pool.data
    mv ${PROJECT_DIR}/experiments/seed_ensemble_noisy/predictions/train_labelled.data ${PROJECT_DIR}/experiments/seed_ensemble_noisy/predictions/train.data

    Step 4: Training the Label Confidence Estimator

    Train the confidence estimator using the label confidence model to identify noisy labels.

    cd ${PROJECT_DIR}/CAMELL
    uv run confidence-selection --train_confidence_model --label_confidence --model_path ${PROJECT_DIR}/experiments/seed_ensemble

    Step 5: Identifying Noisy Labels in the Pool

    Use the confidence estimator to identify noisy labels in the pool data.

    cd ${PROJECT_DIR}/CAMELL
    uv run confidence-selection \
        --select_for_correction \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble \
        --eval_data_set pool \
        --eval_threshold 0.8

    This will create a file called points_for_correction.json containing the dialogue-turn-slot pairs with unreliable labels. Now, the dataloader update step from above (Step 6b) can be run.

    Baselines

    To benchmark CAMELL, we implement three baseline approaches: Random Sampling, Bayesian Active Learning by Disagreement (BALD), and Diversity-based Active Learning. Each baseline applies a different acquisition strategy for selecting data points for human labeling, enabling comparisons of efficiency and effectiveness.

    1. Random Sampling

    In Random Sampling, data points are selected purely at random, bypassing Steps 4, 5, and 6a in the Active Learning Process outlined above. This approach serves as a neutral baseline against which more sophisticated acquisition functions can be compared.

    cd ${PROJECT_DIR}/CAMELL
    uv run active-learning \
        --select_from_pool \
        --acquisition_function random \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble \
        --step_size 420

    This command generates selected_points.json, containing dialogue-turn-slot pairs for annotation, which can then be used to update the dataloaders in Step 6b.

    2. Bayesian Active Learning by Disagreement (BALD)

    The BALD baseline employs Bayesian uncertainty sampling, selecting points with high knowledge uncertainty as candidates for human labeling. This baseline provides a measure of how uncertainty-based selection compares to CAMELL’s confidence-based approach.

    cd ${PROJECT_DIR}/CAMELL
    uv run bald-with-ss \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble \
        --step_size 420

    As with Random Sampling, this command outputs selected_points.json for use in Step 6b.

    3. Diversity-based Active Learning (Diversity)

    Diversity-based Active Learning focuses on maximizing representational diversity in selected samples, based on data features. Here, diversity is measured using embeddings from a language model to ensure varied examples in the training set.

    cd ${PROJECT_DIR}/CAMELL
    uv run combine-loaders \
        --ensemble_loaders \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble
    
    uv run diversity-al \
        --get_labelled_centroids \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble \
        --encoder_name_or_path "roberta-base"
    
    uv run active-learning \
        --create_pool_loader \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble
    
    uv run diversity-al \
        --select_from_pool \
        --model_path ${PROJECT_DIR}/experiments/seed_ensemble \
        --encoder_name_or_path "roberta-base" \
        --step_size 420

    This produces selected_points.json, which can be used to update the dataloaders in Step 6b.

    Label Correction for Dialogue State Tracking using the Label Confidence Model

    Label correction is a crucial step in CAMELL, enhancing model robustness by identifying and correcting noisy labels. A trained SetSUMBT model is required, which can be obtained using CAMELL’s active learning approach or another baseline. Follow these steps to prepare a SetSUMBT ensemble with label correction.

    Step 1: Train the SetSUMBT Ensemble

    Prepare and train an ensemble of SetSUMBT models on the dataset, creating dataloaders for ensemble-based active learning and label correction.

    cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
    uv run python run.py \
        --run_config_name ensemble_setsumbt_multiwoz21 \
        --do_ensemble_setup \
        --output_dir ${PROJECT_DIR}/experiments/full_ensemble

    Train each model in the ensemble:

    cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
    for i in {0..4}; do
        uv run python run.py \
            --run_config_name setsumbt_multiwoz21 \
            --do_train \
            --output_dir ${PROJECT_DIR}/experiments/full_ensemble/ens-$i
    done

    Step 2: Generate Noisy Labels and Train the Noisy Model

    Generate a noisy dataset by training models with a portion of incorrect labels, allowing the ensemble to identify label noise and improve robustness.

    cd ${PROJECT_DIR}/CAMELL
    
    cp -r ${PROJECT_DIR}/experiments/full_ensemble ${PROJECT_DIR}/experiments/full_ensemble_noisy
    
    uv run generate-noisy-labels \
        --model_path ${PROJECT_DIR}/experiments/full_ensemble_noisy \
        --ensemble_size 5

    Train each model in the ensemble:

    cd ${PROJECT_DIR}/ConvLab3/convlab/dst/setsumbt
    for i in {0..4}; do
        uv run python run.py \
            --run_config_name setsumbt_multiwoz21 \
            --do_train \
            --output_dir ${PROJECT_DIR}/experiments/full_ensemble_noisy/ens-$i
    done

    Step 3: Perform Inference with Both Base and Noisy Models

    Run inference using both base and noisy models to assess predictions on noisy labels and clean data, establishing a basis for label confidence estimation.

    cd ${PROJECT_DIR}/CAMELL
    uv run combine-loaders \
        --ensemble_loaders \
        --model_path ${PROJECT_DIR}/experiments/full_ensemble_noisy
    
    uv run combine-loaders \
        --ensemble_loaders \
        --model_path ${PROJECT_DIR}/experiments/full_ensemble
    
    uv run combine-loaders \
        --noisy_loaders \
        --model_path ${PROJECT_DIR}/experiments/full_ensemble
    
    cd ../ConvLab3/convlab/dst/setsumbt
    uv run python run.py \
        --run_config_name ensemble_setsumbt_multiwoz21 \
        --do_eval \
        --do_eval_trainset \
        --output_dir ${PROJECT_DIR}/experiments/full_ensemble_noisy
    
    uv run python run.py \
        --run_config_name ensemble_setsumbt_multiwoz21 \
        --do_eval \
        --do_eval_trainset \
        --output_dir ${PROJECT_DIR}/experiments/full_ensemble

    Step 4: Train the Label Confidence Estimator

    Train a label confidence estimator, enabling identification and correction of noisy labels.

    cd ${PROJECT_DIR}/CAMELL
    uv run confidence-selection --train_confidence_model --label_confidence --model_path ${PROJECT_DIR}/experiments/full_ensemble

    Step 5: Identify Noisy Labels in the Pool

    Using the label confidence estimator, identify noisy labels in the pool data by setting an evaluation threshold.

    cd ${PROJECT_DIR}/CAMELL
    uv run confidence-selection \
        --select_for_correction \
        --model_path ${PROJECT_DIR}/experiments/full_ensemble \
        --eval_data_set train \
        --eval_threshold 0.8

    This generates points_for_correction.json, listing dialogue-turn-slot pairs with noisy labels.

    Step 6: Correct the Labels

    Run label correction on identified noisy labels, with the option to create updated dataloaders for retraining.

    cd ${PROJECT_DIR}/CAMELL
    uv run label-correction \
        --correct_labels \
        --model_path ${PROJECT_DIR}/experiments/full_ensemble

    To correct labels directly in the dataloaders for retraining a new SetSUMBT model, use --create_dataloaders.
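
    For example, a minimal sketch (we assume --create_dataloaders can simply be passed alongside the flags shown above):

    cd ${PROJECT_DIR}/CAMELL
    uv run label-correction \
        --correct_labels \
        --create_dataloaders \
        --model_path ${PROJECT_DIR}/experiments/full_ensemble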

    Citation and Acknowledgments

    If you use CAMELL in your research, please cite the associated paper as follows:

    @article{vanniekerk2025camell,
      title={A confidence-based acquisition model for self-supervised active learning and label correction},
      author={van Niekerk, Carel and Geishauser, Christian and Heck, Michael and Feng, Shutong and Lin, Hsien-chin and Lubis, Nurul and Ruppik, Benjamin and Vukovic, Renato and Ga{\v{s}}i{\'c}, Milica},
      journal={Transactions of the Association for Computational Linguistics},
      volume={13},
      pages={167--187},
      year={2025},
      publisher={MIT Press 255 Main Street, 9th Floor, Cambridge, Massachusetts 02142, USA~…}
    }

    CAMELL builds upon multiple open-source tools, including:

    • ConvLab-3 for dialogue model implementations
    • The MultiWOZ dataset as the primary dataset for testing on the dialogue state tracking task
    • The WMT17 German-to-English translation dataset for testing on the translation task

    We thank the authors of these tools and datasets for enabling the development of this framework.