Commit 79f4f010 authored by Benjamin Ruppik

Updated release with link to debug data.

parent 9e614da2
1 merge request: !2 Updated release with link to debug data.
......@@ -84,10 +84,12 @@ data/experiments/
data/figures/
data/term_extraction/experiments/
data/plots/
data/term_extraction/BioTag_labels_via_tokenizer_offset/datasets_*/
data/term_extraction/evaluation_files/
data/term_extraction/experiments/
data/term_extraction/predictions_files/
data/term_extraction/BioTag_labels_via_tokenizer_offset/datasets_*/
data/term_extraction/model_files/
data/term_extraction/topological_features_vectorized/
......
......@@ -2,11 +2,11 @@
## Overview
This public repository contains the code to our paper “Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction”, published at the [25th Meeting of the Special Interest Group on Discourse and Dialogue, Kyoto, Japan (SIGDIAL 2024)](https://2024.sigdial.org/).
This public repository contains the code to our paper [“Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction”](https://doi.org/10.18653/v1/2024.sigdial-1.31), published at the [25th Meeting of the Special Interest Group on Discourse and Dialogue, Kyoto, Japan (SIGDIAL 2024)](https://2024.sigdial.org/).
In this project, we demonstrate that contextual topological features derived from neighborhoods in a language model embedding space can be used to increase performance in a term extraction task on dialogue data over a baseline which only uses the language model embedding vectors as input.
We also compare with our earlier work, ["Dialogue Term Extraction using Transfer Learning and Topological Data Analysis"](https://aclanthology.org/2022.sigdial-1.53/), which is based on topological features derived from static word embeddings.
We also compare with our earlier work, ["Dialogue Term Extraction using Transfer Learning and Topological Data Analysis"](https://doi.org/10.18653/v1/2022.sigdial-1.53), which is based on topological features derived from static word embeddings.
## Installation
......@@ -102,6 +102,8 @@ To create the data for the term extraction task, follow the instructions given i
The following is an example call to run the full term extraction training-prediction-evaluation pipeline for a single setup with default parameters.
Note that for this to work, you need to have the correctly prepared data in the `data/` directory.
For testing the pipeline in the `--toy_dataset_mode`, we provide preprocessed [debug data](https://doi.org/10.5281/zenodo.14035394), which needs to be placed into the repository's `data/` directory.
Run one of the following commands to start the pipeline on the toy data,
with different feature types:
......
# Instructions for publishing to public repository
Add the public repository to the remote list:
```bash
git remote add public https://oauth2:$PROJECT_ACCESS_TOKEN@gitlab.cs.uni-duesseldorf.de/general/dsml/tda4contextualembeddings-public.git
git fetch public
```
Create an orphan branch from the current repository state, so that we can push the content without the history to the public repository:
```bash
git checkout --orphan temp_branch_for_public
git add .
git commit -m "Updated public release of tda_for_contextual_spaces project"
```
Push the content of the new history-less branch to the public repository into a separate branch:
```bash
git push public temp_branch_for_public:updated_public_code_release
```
Delete the temporary branch:
```bash
git checkout main
git branch -D temp_branch_for_public
```
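Before pushing to the real public remote, the steps above can be rehearsed against a throwaway local repository. The following is a minimal sketch, assuming only that `git` is installed. The temporary directory, the two stand-in repositories, and the dry-run commit identity are illustrative and not part of this project. It checks that the branch arriving at the "public" side carries a single commit and none of the private history.

```bash
#!/usr/bin/env bash
# Dry run of the orphan-branch release against a local stand-in remote.
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME="Dry Run" GIT_AUTHOR_EMAIL="dry-run@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for the public repository: an empty bare repository on disk.
git init --quiet --bare "$workdir/public.git"

# Stand-in for the private repository, with two commits of history.
git init --quiet "$workdir/private"
cd "$workdir/private"
echo "first" > file.txt
git add .
git commit --quiet -m "First private commit"
echo "second" >> file.txt
git commit --quiet -am "Second private commit"

# Same steps as in the instructions, pointed at the local stand-in remote.
git remote add public "$workdir/public.git"
git fetch --quiet public
git checkout --quiet --orphan temp_branch_for_public
git add .
git commit --quiet -m "Updated public release of tda_for_contextual_spaces project"
git push --quiet public temp_branch_for_public:updated_public_code_release

# Count the commits that arrived on the public side.
pushed_commits="$(git --git-dir="$workdir/public.git" rev-list --count updated_public_code_release)"
echo "commits on public branch: $pushed_commits"
```

The script reports one commit on the public branch even though the private repository has two, which is the effect `--orphan` is used for here: the new branch starts without a parent commit, so only the current file contents are published.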