Commit 6bd3becc authored by Benjamin Ruppik

Merge branch 'updated_public_code_release' into 'main'

Updated release with link to debug data.

See merge request !2
parents 9e614da2 79f4f010
......@@ -84,10 +84,12 @@ data/experiments/
data/figures/
data/term_extraction/experiments/
data/plots/
data/term_extraction/BioTag_labels_via_tokenizer_offset/datasets_*/
data/term_extraction/evaluation_files/
data/term_extraction/experiments/
data/term_extraction/predictions_files/
data/term_extraction/BioTag_labels_via_tokenizer_offset/datasets_*/
data/term_extraction/model_files/
data/term_extraction/topological_features_vectorized/
......
......@@ -2,11 +2,11 @@
## Overview
This public repository contains the code to our paper “Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction”, published at the [25th Meeting of the Special Interest Group on Discourse and Dialogue, Kyoto, Japan (SIGDIAL 2024)](https://2024.sigdial.org/).
This public repository contains the code to our paper [“Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction”](https://doi.org/10.18653/v1/2024.sigdial-1.31), published at the [25th Meeting of the Special Interest Group on Discourse and Dialogue, Kyoto, Japan (SIGDIAL 2024)](https://2024.sigdial.org/).
In this project, we demonstrate that contextual topological features derived from neighborhoods in a language model embedding space can be used to increase performance in a term extraction task on dialogue data over a baseline which only uses the language model embedding vectors as input.
We also compare with our earlier work, ["Dialogue Term Extraction using Transfer Learning and Topological Data Analysis"](https://aclanthology.org/2022.sigdial-1.53/), which is based on topological features derived from static word embeddings.
We also compare with our earlier work, ["Dialogue Term Extraction using Transfer Learning and Topological Data Analysis"](https://doi.org/10.18653/v1/2022.sigdial-1.53), which is based on topological features derived from static word embeddings.
## Installation
......@@ -102,6 +102,8 @@ To create the data for the term extraction task, follow the instructions given i
The following is an example call to run the full term extraction training-prediction-evaluation pipeline for a single setup with default parameters.
Note that for this to work, you need to have the correctly prepared data in the `data/` directory.
To test the pipeline in `--toy_dataset_mode`, we provide preprocessed [debug data](https://doi.org/10.5281/zenodo.14035394), which needs to be placed into the repository's `data/` directory.
Run one of the following commands to start the pipeline on the toy data,
with different feature types:
......
# Instructions for publishing to public repository
Add the public repository to the remote list:
```bash
git remote add public https://oauth2:$PROJECT_ACCESS_TOKEN@gitlab.cs.uni-duesseldorf.de/general/dsml/tda4contextualembeddings-public.git
git fetch public
```
Create an orphan branch from the current repository state, so that we can push the content without the history to the public repository:
```bash
git checkout --orphan temp_branch_for_public
git add .
git commit -m "Updated public release of tda_for_contextual_spaces project"
```
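The effect of the orphan branch can be checked in a throwaway repository before touching the real one. The following is a minimal sketch, not part of the release procedure itself: the temporary directory, the file name `file.txt`, the demo user identity, and the commit messages are all hypothetical and chosen only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: an orphan branch keeps the current files but none of the history.
# All names below (demo_dir, file.txt, "Demo User") are illustrative assumptions.
set -euo pipefail

demo_dir="$(mktemp -d)"                    # throwaway repository location
cd "$demo_dir"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Build a small history with two commits on main.
echo "first" > file.txt
git add file.txt
git commit -q -m "First commit"
echo "second" >> file.txt
git commit -q -am "Second commit"
main_count="$(git rev-list --count HEAD)"

# Same steps as above: orphan branch, stage everything, single commit.
git checkout -q --orphan temp_branch_for_public
git add .
git commit -q -m "Updated public release"
orphan_count="$(git rev-list --count HEAD)"

echo "main: $main_count commits, orphan: $orphan_count commit"
```

If the orphan branch was created as intended, `git rev-list --count HEAD` reports a single commit on it, while the working tree still contains the files from `main`.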
Push the content of the new history-less branch to the public repository into a separate branch:
```bash
git push public temp_branch_for_public:updated_public_code_release
```
Delete the temporary branch:
```bash
git checkout main
git branch -D temp_branch_for_public
```