    Introduction

Generalising dialogue state tracking (DST) to new data is especially challenging due to the strong reliance on abundant and fine-grained supervision during training. Sample sparsity, distributional shift and the occurrence of new concepts and topics frequently lead to severe performance degradation during inference. TripPy-R (pronounced "Trippier"), a robust triple copy strategy DST, uses a training strategy that builds extractive DST models without the need for fine-grained manual span labels ("spanless training"). Further, two novel input-level dropout methods mitigate the negative impact of sample sparsity. TripPy-R uses a new model architecture with a unified encoder that supports value as well as slot independence by leveraging the attention mechanism, making it zero-shot capable. The framework combines the strengths of triple copy strategy DST and value matching to benefit from complementary predictions without violating the principle of ontology independence. In our paper we demonstrate that an extractive DST model can be trained without manual span labels. Our architecture and training strategies improve robustness towards sample sparsity, new concepts and new topics, leading to state-of-the-art performance on a range of benchmarks.

    Recent updates

• 2024.09.17: Added SGD support
    • 2023.08.08: Initial commit

    How to run

Two example scripts show how to use TripPy-R.

DO.example will train and evaluate a model with recommended settings using the default supervised training strategy.

DO.example.spanless will train and evaluate a model with recommended settings using the novel spanless training strategy. The training consists of three steps: 1) training a proto-DST that learns to tag the positions of queried subsequences in an input sequence; 2) applying the proto-DST to tag the positions of slot-value occurrences in the training data; 3) training the DST on the automatic labels produced by the previous step.
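As a minimal sketch, both scripts can be launched from the repository root; this assumes they are self-contained bash scripts configured via variables at the top, so check and adapt the paths in the script headers first:

```bash
# Supervised training and evaluation with recommended settings:
bash DO.example

# Spanless training: proto-DST training, automatic tagging, DST training:
bash DO.example.spanless
```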

See the table below for expected performance per dataset and training strategy. Our scripts use the parameters that were used for the experiments in our paper "Robust Dialogue State Tracking with Weak Supervision and Sparse Data"; performance should therefore be similar to the reported numbers. For more challenging datasets with longer dialogues, better performance may be achieved by using the maximum sequence length of 512.
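As a sketch of switching to the longer sequence length, assuming the training call inside the example script passes a literal --max_seq_length=180 argument (the flag name follows the original TripPy; verify it in the script):

```bash
# Assumption: DO.example contains a literal --max_seq_length=180 argument.
sed -i 's/--max_seq_length=180/--max_seq_length=512/' DO.example
bash DO.example
```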

    Trouble-shooting

When conducting spanless training, the training of the proto-DST (step 1 of 3, see above) is rather sensitive to training hyperparameters such as the learning rate, warm-up ratio and maximum number of epochs, as well as to the random model initialization. We recommend the hyperparameters as listed in the example script above. If the proto-DST's tagging performance (step 2 of 3) remains below expectations for one or more slots, rerun the training with a different random initialization, i.e., pick a different random seed while keeping the recommended hyperparameters.
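One way to do this is to sweep a few seeds; the sketch below assumes the training call inside the script passes a --seed flag, as in the original TripPy, and simply reruns the whole pipeline per seed:

```bash
# Retry spanless training with different random seeds until the proto-DST
# tagging quality (step 2) looks reasonable for all slots.
# Assumption: DO.example.spanless contains a literal --seed=<number> argument.
for seed in 40 41 42 43; do
    sed -i "s/--seed=[0-9]\+/--seed=${seed}/" DO.example.spanless
    bash DO.example.spanless
done
```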

    Datasets

Supported datasets are:

• sim-M
• sim-R
• WOZ 2.0
• MultiWOZ 2.0-2.4
• SGD and other datasets in ConvLab-3's unified data format

See the README file in 'data/' in the original TripPy repo for more details on how to obtain and prepare the datasets for use in TripPy-R.

The corresponding --task_name values are:

    • 'sim-m', for sim-M
    • 'sim-r', for sim-R
    • 'woz2', for WOZ 2.0
    • 'multiwoz21', for MultiWOZ 2.0-2.4
    • 'multiwoz21_legacy', for MultiWOZ 2.1 legacy version
    • 'unified', for ConvLab-3's unified data format
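
For illustration, a direct call might look like the following sketch; the entry point run_dst.py, the flag names and the data path mirror the original TripPy and are assumptions here, so take the exact call from the example scripts:

```bash
# Hypothetical direct invocation; the DO.* scripts wrap a call like this.
# run_dst.py, the flag names and the data path are assumptions borrowed
# from the original TripPy -- verify them in DO.example.
python3 run_dst.py \
    --task_name=multiwoz21 \
    --data_dir=data/MULTIWOZ2.1 \
    --do_train \
    --do_eval
```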

With a sequence length of 180, you should expect the following average joint goal accuracy (JGA):

| Dataset             | Normal training | Spanless training |
|---------------------|-----------------|-------------------|
| MultiWOZ 2.0        | 51%             | tbd               |
| MultiWOZ 2.1        | 56%             | 55%               |
| MultiWOZ 2.1 legacy | 56%             | 55%               |
| MultiWOZ 2.2        | 56%             | tbd               |
| MultiWOZ 2.3        | 62%             | tbd               |
| MultiWOZ 2.4        | 69%             | tbd               |
| sim-M               | 95%             | 95%               |
| sim-R               | 92%             | 92%               |
| WOZ 2.0             | 92%             | 91%               |

    Requirements

    • torch (tested: 1.12.1)
    • transformers (tested: 4.18.0)
    • tensorboardX (tested: 2.5.1)
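
For a quick setup, the tested versions can be installed directly from PyPI (assuming a Python 3 environment compatible with torch 1.12):

```bash
pip install torch==1.12.1 transformers==4.18.0 tensorboardX==2.5.1
```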

    Citation

This work is published as "Robust Dialogue State Tracking with Weak Supervision and Sparse Data" in the Transactions of the Association for Computational Linguistics (2022).

    If you use TripPy-R in your own work, please cite our work as follows:

    @article{heck-etal-2022-robust,
        title = "Robust Dialogue State Tracking with Weak Supervision and Sparse Data",
        author = "Heck, Michael and Lubis, Nurul and van Niekerk, Carel and
                  Feng, Shutong and Geishauser, Christian and Lin, Hsien-Chin and Ga{\v{s}}i{\'c}, Milica",
        journal = "Transactions of the Association for Computational Linguistics",
        volume = "10",
        year = "2022",
        address = "Cambridge, MA",
        publisher = "MIT Press",
        url = "https://aclanthology.org/2022.tacl-1.68",
        doi = "10.1162/tacl_a_00513",
        pages = "1175--1192",
    }