Dialogue Term Extraction using Transfer Learning and Topological Data Analysis
This is the code for the paper Dialogue Term Extraction using Transfer Learning and Topological Data Analysis.
1. Data
We use the Multi-WOZ 2.1 dataset and the Schema-Guided Dialogue (SGD) dataset. The preprocessed versions we use can be found in the data folder.
2. Requirements
Install requirements using:
pip install -r requirements.txt
3. BIO-tagging data
All the needed files for training and testing are given in the data folder.
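For readers unfamiliar with the BIO scheme, the following is a minimal sketch of how tokens can be labelled B (beginning of a term), I (inside a term) or O (outside any term). It only illustrates the scheme: the function name bio_tags, the example sentence and the term list are made up here, and the files in the data folder may have been produced differently.

```python
from typing import List, Sequence


def bio_tags(tokens: Sequence[str], terms: Sequence[Sequence[str]]) -> List[str]:
    """Label each token B (term start), I (inside a term) or O (outside)."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    # Match longer terms first so that "train station" wins over "train".
    for term in sorted(terms, key=len, reverse=True):
        term = [t.lower() for t in term]
        n = len(term)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            # Only tag a window that no earlier (longer) term has claimed.
            window_free = all(tag == "O" for tag in tags[start:start + n])
            if window_free and lowered[start:start + n] == term:
                tags[start] = "B"
                tags[start + 1:start + n] = ["I"] * (n - 1)
    return tags


# Hypothetical example sentence and term list, for illustration only.
tokens = "i need a cheap hotel near the train station".split()
terms = [["cheap"], ["hotel"], ["train", "station"], ["train"]]
print(list(zip(tokens, bio_tags(tokens, terms))))
```

Matching longer terms first is a design choice of this sketch; it keeps multi-word terms intact when one of their words is also a term on its own.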
4. MLM scores
Compute the MLM scores by running:
python data/prep/get_MLM_scores.py --dataset ["multiwoz"|"SGD"]
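As a rough illustration of what a masked language model (MLM) score can look like, the sketch below masks each token in turn and records the negative log-probability that a model assigns to the original token. This is one common definition and is an assumption here; get_MLM_scores.py may compute the score differently. The predictor interface, the function name mlm_scores and the toy predictor are all hypothetical stand-ins for a real model.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: given the tokens and the index of the masked
# position, return a probability for each candidate token at that position.
MaskedPredictor = Callable[[Sequence[str], int], Dict[str, float]]


def mlm_scores(tokens: Sequence[str], predict: MaskedPredictor,
               mask_token: str = "[MASK]") -> List[float]:
    """Return -log p(original token | rest of sentence) for every position."""
    scores = []
    for i, original in enumerate(tokens):
        masked = list(tokens)
        masked[i] = mask_token
        prob = predict(masked, i).get(original, 0.0)
        # Floor the probability so that unseen tokens do not produce log(0).
        scores.append(-math.log(max(prob, 1e-12)))
    return scores


def toy_predictor(masked: Sequence[str], position: int) -> Dict[str, float]:
    # Stand-in for a real masked language model (assumption, illustration only).
    return {"the": 0.5, "hotel": 0.25, "cheap": 0.125}


print(mlm_scores(["the", "cheap", "hotel"], toy_predictor))
```

Under this definition, tokens that the model finds hard to predict from context receive a higher score.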
5. Topological features
See the tda directory for instructions on how to generate the word vector embeddings, neighborhoods and persistence features.
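To give an idea of what such features measure, here is a minimal sketch of two of them under stated assumptions. Codensity is taken as the distance from a point to its k-th nearest neighbour, and the Wasserstein norm as the p-Wasserstein distance from a persistence diagram to the empty diagram, using the L-infinity ground metric. These are standard definitions, but the code in the tda directory may use other conventions; the function names and example values are illustrative only.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]


def codensity(point: Point, neighbours: Sequence[Point], k: int) -> float:
    """Distance from `point` to its k-th nearest neighbour (k is 1-based).

    `neighbours` is assumed not to contain `point` itself.
    """
    distances = sorted(math.dist(point, other) for other in neighbours)
    return distances[k - 1]


def wasserstein_norm(diagram: Sequence[Tuple[float, float]], p: float = 1.0) -> float:
    """p-Wasserstein distance from a persistence diagram to the empty diagram."""
    # With the L-infinity ground metric, a point (birth, death) lies at
    # distance (death - birth) / 2 from the diagonal. Points with infinite
    # death time are skipped, which is an assumption of this sketch.
    total = sum(((death - birth) / 2.0) ** p
                for birth, death in diagram if math.isfinite(death))
    return total ** (1.0 / p)


# Hypothetical toy inputs, for illustration only.
print(codensity((0.0, 0.0), [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)], k=2))
print(wasserstein_norm([(0.0, 2.0), (1.0, 4.0), (0.0, math.inf)], p=1.0))
```

A large codensity indicates a sparsely populated region of the embedding space, while a large Wasserstein norm indicates a diagram with many or long-lived topological features.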
6. Models
The model scripts for the MLM scores and the three TDA features (persistence image vectors, codensity and Wasserstein norm) are in the models directory.
7. Training
Training scripts are provided for Multi-WOZ and SGD. Run them with:
python -m training.training_script --train_on ["multiwoz"|"SGD"]
8. Evaluation
Compute the predictions for each model using the get_tags scripts in the evaluation directory, then evaluate them using the evaluation script or the evaluation notebook.
python -m evaluation.prediction_script --model_trained_on ["multiwoz"|"SGD"] --predictions_on ["multiwoz"|"SGD"]
See the README for evaluation/evalscript for information on how to run it.
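Since a model can be trained on one dataset and evaluated on the other, there are four combinations of the two options of the prediction command. The sketch below only builds and prints those four commands; it does not run them. It assumes that the option values are passed as the bare strings multiwoz and SGD and that the commands are run from the repository root. The helper name prediction_commands is hypothetical.

```python
import itertools
import shlex

DATASETS = ["multiwoz", "SGD"]


def prediction_commands(datasets=DATASETS):
    """Build one prediction command per (training set, test set) pair."""
    commands = []
    for trained_on, predict_on in itertools.product(datasets, repeat=2):
        commands.append([
            "python", "-m", "evaluation.prediction_script",
            "--model_trained_on", trained_on,
            "--predictions_on", predict_on,
        ])
    return commands


# Print the commands rather than executing them.
for command in prediction_commands():
    print(shlex.join(command))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to subprocess.run to execute the combinations in sequence.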
License
This project is licensed under the Apache License, Version 2.0 (the "License"); you may not use the files except in compliance with the License. You may obtain a copy of the License at