    Dataset Card for MultiWOZ 2.1

    Dataset Summary

    MultiWOZ 2.1 fixed the noise in state annotations and dialogue utterances. It also includes user dialogue acts from ConvLab (Lee et al., 2019) as well as multiple slot descriptions per dialogue state slot.

    • How to get the transformed data from original data:
    • Main changes of the transformation:
      • Create a new ontology in the unified format, taking slot descriptions from MultiWOZ 2.2.
      • Correct some grammar errors in the text, mainly following tokenization.md in MultiWOZ_2.1.
      • Normalize slot names and values. See the normalize_domain_slot_value function in preprocess.py.
      • Correct the values of some non-categorical slots and provide character-level span annotations.
      • Concatenate multiple values in user goal & state using |.
    • Annotations:
      • user goal, dialogue acts, state.
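
The last two transformation steps (character-level spans and `|`-concatenated values) can be illustrated with a minimal sketch. The helper names `join_values`, `split_values`, and `find_span` are hypothetical and are not taken from preprocess.py; the case-insensitive search is likewise an assumption about how a span might be located.

```python
def join_values(values):
    """Concatenate several alternative values into one string using '|'."""
    return "|".join(values)


def split_values(value):
    """Recover the individual values from a '|'-concatenated string."""
    return value.split("|") if value else []


def find_span(utterance, value):
    """Return (start, end) character offsets of `value` in `utterance`.

    Hypothetical helper: uses a case-insensitive search and returns None
    when the value does not occur verbatim in the utterance.
    """
    start = utterance.lower().find(value.lower())
    if start == -1:
        return None
    return start, start + len(value)


# Usage: a slot that accepts two alternative values.
price = join_values(["cheap", "moderate"])
alternatives = split_values(price)

# Usage: a character-level span satisfies utterance[start:end] == value.
utterance = "I need a train leaving after 10:15."
span = find_span(utterance, "10:15")
```

Consumers of the unified format would presumably split on `|` before comparing a predicted value against the annotation, so that any one of the alternatives counts as a match.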

    Supported Tasks and Leaderboards

    NLU, DST, Policy, NLG, E2E, User simulator

    Languages

    English

    Data Splits

    | split | dialogues | utterances | avg_utt | avg_tokens | avg_domains | cat slot match (state) | cat slot match (goal) | cat slot match (dialogue act) | non-cat slot span (dialogue act) |
    |---|---|---|---|---|---|---|---|---|---|
    | train | 8438 | 113556 | 13.46 | 13.23 | 3.39 | 98.84 | 99.48 | 86.39 | 98.22 |
    | validation | 1000 | 14748 | 14.75 | 13.5 | 3.64 | 98.84 | 99.46 | 86.59 | 98.17 |
    | test | 1000 | 14744 | 14.74 | 13.5 | 3.59 | 99.21 | 99.32 | 85.83 | 98.58 |
    | all | 10438 | 143048 | 13.7 | 13.28 | 3.44 | 98.88 | 99.47 | 86.36 | 98.25 |

    9 domains: ['attraction', 'hotel', 'taxi', 'restaurant', 'train', 'police', 'hospital', 'booking', 'general']

    • cat slot match: the percentage of categorical slot values that appear among the possible values listed in the ontology.
    • non-cat slot span: the percentage of non-categorical slot values that have a span annotation.
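
As a rough sketch of how these two statistics could be computed, the snippet below treats both as percentages over a flat list of annotations. The input shapes (a list of `(slot, value)` pairs, an ontology dict mapping each slot to its possible values, and annotation dicts with optional `start`/`end` keys) are assumptions made for illustration and may differ from what preprocess.py actually uses.

```python
def cat_slot_match(slot_values, ontology):
    """Percentage of categorical (slot, value) pairs whose value is listed
    among the slot's possible values in the ontology (assumed dict layout)."""
    total = len(slot_values)
    if total == 0:
        return 0.0
    matched = sum(1 for slot, value in slot_values
                  if value in ontology.get(slot, ()))
    return 100.0 * matched / total


def non_cat_slot_span(annotations):
    """Percentage of non-categorical annotations that carry both a
    'start' and an 'end' character offset (assumed key names)."""
    total = len(annotations)
    if total == 0:
        return 0.0
    with_span = sum(1 for a in annotations if "start" in a and "end" in a)
    return 100.0 * with_span / total


# Usage with toy data (not taken from the dataset).
toy_ontology = {"hotel-parking": ["yes", "no"]}
toy_pairs = [("hotel-parking", "yes"), ("hotel-parking", "free")]
match_rate = cat_slot_match(toy_pairs, toy_ontology)

toy_annotations = [
    {"value": "10:15", "start": 29, "end": 34},
    {"value": "cambridge"},  # no span found for this value
]
span_rate = non_cat_slot_span(toy_annotations)
```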

    Citation

    @inproceedings{eric-etal-2020-multiwoz,
        title = "{M}ulti{WOZ} 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines",
        author = "Eric, Mihail and Goel, Rahul and Paul, Shachi and Sethi, Abhishek and Agarwal, Sanchit and Gao, Shuyang and Kumar, Adarsh and Goyal, Anuj and Ku, Peter and Hakkani-Tur, Dilek",
        booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
        month = may,
        year = "2020",
        address = "Marseille, France",
        publisher = "European Language Resources Association",
        url = "https://aclanthology.org/2020.lrec-1.53",
        pages = "422--428",
        ISBN = "979-10-95546-34-4",
    }

    Licensing Information

    Apache License, Version 2.0