This public repository contains the code to our paper [“Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction”](https://doi.org/10.18653/v1/2024.sigdial-1.31), published at the [25th Meeting of the Special Interest Group on Discourse and Dialogue, Kyoto, Japan (SIGDIAL 2024)](https://2024.sigdial.org/).
In this project, we demonstrate that topological features derived from local neighborhoods in a contextual language model's embedding space improve performance on a dialogue term extraction task, compared to a baseline that uses only the language model embedding vectors as input.
We also compare with our earlier work, ["Dialogue Term Extraction using Transfer Learning and Topological Data Analysis"](https://doi.org/10.18653/v1/2022.sigdial-1.53), which is based on topological features derived from static word embeddings.
## Installation
...
...
The following is an example call to run the full term extraction training-prediction-evaluation pipeline for a single setup with default parameters.
Note that for this to work, you need to have the correctly prepared data in the `data/` directory.
For testing the pipeline in the `--toy_dataset_mode`, we provide preprocessed [debug data](https://doi.org/10.5281/zenodo.14035394), which needs to be placed into the repository's `data/` directory.
Run one of the following commands to start the pipeline on the toy data: