
emowoz-public

    baselines
    data
    mturk
    .gitignore
    LICENSE
    README.md

    EmoWOZ

    This is the codebase for EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion Recognition in Task-Oriented Dialogue Systems.

    06/2023 Update: EmoWOZ is now available in the Hugging Face Datasets repository. You can load EmoWOZ using the following code:

    from datasets import load_dataset
    
    # The second argument defaults to 'emowoz', which loads all dialogues from both subsets.
    # Use 'multiwoz' or 'dialmage' to load a specific subset.
    dataset = load_dataset("hhu-dsml/emowoz", 'emowoz')

    Data

    The dataset is hosted on Zenodo. Please download it and place it in the data folder. EmoWOZ adopts the same format as MultiWOZ logs, with an additional emotion field in each log item. The emotion field contains annotations by three annotators, each identified by an anonymous 8-character global annotator id. The final field contains the final label, obtained either by majority voting or by manual resolution.

    All DialMAGE dialogues have a dialogue id of the form "DMAGExxx.json", where xxx is a number. We provide the dialog_act and span_info used to generate system responses in DialMAGE.
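The majority-voting step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' resolution procedure: the `majority_label` helper, the example annotator ids, and the layout of the `emotion` field as a mapping from annotator id to label are all assumptions made for this example. Check the downloaded logs for the actual structure.

```python
from collections import Counter
from typing import Dict, Optional


def majority_label(annotations: Dict[str, int]) -> Optional[int]:
    """Return the label chosen by more than half of the annotators.

    Returns None when no label has a strict majority, which is the case
    that would need manual resolution.
    """
    if not annotations:
        return None
    label, votes = Counter(annotations.values()).most_common(1)[0]
    return label if votes > len(annotations) / 2 else None


# Hypothetical log item: the annotator ids and field layout are illustrative only.
log_item = {
    "text": "I need a cheap hotel, please.",
    "emotion": {"a1b2c3d4": 0, "e5f6a7b8": 0, "c9d0e1f2": 6},
}

final = majority_label(log_item["emotion"])
print(final)
```

With three annotators, a three-way disagreement yields `None` here, which corresponds to the situation where a label has to be resolved manually.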

    Each label is defined as follows:

    | Label | Emotion Tokens               | Valence  | Elicitor   | Conduct  |
    |-------|------------------------------|----------|------------|----------|
    | 0     | Neutral                      | Neutral  | Any        | Polite   |
    | 1     | Fearful, sad, disappointed   | Negative | Event/fact | Polite   |
    | 2     | Dissatisfied, disliking      | Negative | Operator   | Polite   |
    | 3     | Apologetic                   | Negative | User       | Polite   |
    | 4     | Abusive                      | Negative | Operator   | Impolite |
    | 5     | Excited, happy, anticipating | Positive | Event/fact | Polite   |
    | 6     | Satisfied, liking            | Positive | Operator   | Polite   |
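When working with the integer labels in code, it can help to keep the label scheme above in a lookup table. The sketch below copies the values from the table; the names `EmotionLabel`, `EMOTION_LABELS`, and `describe` are hypothetical and not part of this codebase.

```python
from typing import Dict, NamedTuple


class EmotionLabel(NamedTuple):
    tokens: str    # emotion words covered by the label
    valence: str   # Neutral, Negative, or Positive
    elicitor: str  # what triggered the emotion
    conduct: str   # Polite or Impolite


# Values copied from the label definition table.
EMOTION_LABELS: Dict[int, EmotionLabel] = {
    0: EmotionLabel("Neutral", "Neutral", "Any", "Polite"),
    1: EmotionLabel("Fearful, sad, disappointed", "Negative", "Event/fact", "Polite"),
    2: EmotionLabel("Dissatisfied, disliking", "Negative", "Operator", "Polite"),
    3: EmotionLabel("Apologetic", "Negative", "User", "Polite"),
    4: EmotionLabel("Abusive", "Negative", "Operator", "Impolite"),
    5: EmotionLabel("Excited, happy, anticipating", "Positive", "Event/fact", "Polite"),
    6: EmotionLabel("Satisfied, liking", "Positive", "Operator", "Polite"),
}


def describe(label: int) -> str:
    """Return a one-line, human-readable description of an integer label."""
    info = EMOTION_LABELS[label]
    return f"{label}: {info.tokens} ({info.valence}, elicited by {info.elicitor}, {info.conduct})"


print(describe(2))
```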

    The EmoWOZ dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International Public License and later. This dataset should therefore only be used for research purposes.

    Baseline Models

    To test the dataset with the baseline models used in the paper, please follow the instructions in each model folder of baselines/. The implementations of two models, baselines/COSMIC/ and baselines/DialogueRNN/, are taken and modified from https://github.com/declare-lab/conv-emotion. NOTE: when re-running the experiments, some variance in the numbers is to be expected due to factors such as random seed and hardware specifications. Some models are more sensitive to this than others.
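To reduce the seed-related part of the variance mentioned above, one common approach is to fix the random seeds before training. The sketch below is an assumption-laden illustration: the `set_seed` helper is hypothetical, and the baseline scripts may already handle seeding in their own way. It does not remove hardware-dependent differences.

```python
import random


def set_seed(seed: int) -> None:
    """Seed the random number generators that are available in this environment."""
    random.seed(seed)
    try:
        import numpy as np  # optional: only seeded if installed
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional: only seeded if installed
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # ignored when CUDA is unavailable
    except ImportError:
        pass


set_seed(42)
first = random.random()
set_seed(42)
second = random.random()
print(first == second)
```

Calling `set_seed` once at the start of a run, with the seed recorded alongside the results, makes it easier to compare repeated runs on the same machine.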

    Requirements

    See requirements.txt. These are packages required for running all baseline models. Tested versions are listed below:

    • Python (tested: 3.7.8)
    • transformers (tested: 4.12.5)
    • torch (tested: 1.8.1)
    • pandas
    • sklearn
    • tqdm
    • nltk
    • ftfy
    • spacy
    • ipython
    • keras (tested: 2.7.0)
    • tensorflow (tested: 2.7.0)

    Citation

    If you use EmoWOZ in your own work, please cite our work as follows:

    @inproceedings{feng-etal-2022-emowoz,
        title = "{E}mo{WOZ}: A Large-Scale Corpus and Labelling Scheme for Emotion Recognition in Task-Oriented Dialogue Systems",
        author = "Feng, Shutong  and
          Lubis, Nurul  and
          Geishauser, Christian  and
          Lin, Hsien-chin  and
          Heck, Michael  and
          van Niekerk, Carel  and
          Ga{\v{s}}i{\'c}, Milica",
        booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
        month = jun,
        year = "2022",
        address = "Marseille, France",
        publisher = "European Language Resources Association",
        url = "https://aclanthology.org/2022.lrec-1.436",
        pages = "4096--4113",
    }

    Contact

    Any questions or bug reports can be sent to shutong.feng@hhu.de