This public repository contains the code to our paper [“Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction”](https://doi.org/10.18653/v1/2024.sigdial-1.31), published at the [25th Meeting of the Special Interest Group on Discourse and Dialogue, Kyoto, Japan (SIGDIAL 2024)](https://2024.sigdial.org/).
In this project, we demonstrate that topological features derived from local neighborhoods in a contextual language model's embedding space improve performance on a dialogue term extraction task, compared to a baseline that uses only the language model embedding vectors as input.
We also compare with our earlier work, ["Dialogue Term Extraction using Transfer Learning and Topological Data Analysis"](https://doi.org/10.18653/v1/2022.sigdial-1.53), which is based on topological features derived from static word embeddings.
## Installation
...
...
The following is an example call to run the full term extraction training-prediction-evaluation pipeline for a single setup with default parameters.
Note that for this to work, you need to have the correctly prepared data in the `data/` directory.
For testing the pipeline in the `--toy_dataset_mode`, we provide preprocessed [debug data](https://doi.org/10.5281/zenodo.14035394), which needs to be placed into the repository's `data/` directory.
Run one of the following commands to start the pipeline on the toy data: