Authored by Benjamin Ruppik
TDA code for Dialogue Term Extraction using Transfer Learning and Topological Data Analysis
This is the Topological Data Analysis portion of the code for the paper 'Dialogue Term Extraction using Transfer Learning and Topological Data Analysis'.
Run the scripts in this folder from the tda working directory.
Create embeddings
The /data folder contains precomputed SBERT embeddings for the ambient fastText vocabulary and for the joint MultiWOZ and SGD vocabulary.
These embeddings are the basis for computing neighborhoods.
It is not necessary to recompute these embeddings;
for the neighborhood extraction and TDA features, skip ahead to the next section.
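To make concrete what "basis for computing neighborhoods" can mean, here is a minimal sketch that is not taken from the repository. It assumes that a neighborhood is the set of k nearest vocabulary words under Euclidean distance and that the query word itself is included. The function name `nearest_neighbors` and the toy 2-dimensional vectors are hypothetical; real SBERT vectors have hundreds of dimensions and would normally be handled with NumPy rather than the standard library.

```python
import math


def nearest_neighbors(query, vocab_vectors, k):
    """Return the k words whose vectors are closest to `query` (Euclidean)."""
    ranked = sorted(
        vocab_vectors,
        key=lambda word: math.dist(query, vocab_vectors[word]),
    )
    return ranked[:k]


# Hypothetical toy "ambient vocabulary" with 2-dimensional embeddings.
ambient = {
    "hotel": (1.0, 0.0),
    "hostel": (0.9, 0.1),
    "train": (0.0, 1.0),
    "taxi": (0.1, 0.9),
    "restaurant": (0.7, 0.7),
}

# Neighborhood of the term "hotel" inside the ambient vocabulary.
neighborhood = nearest_neighbors(ambient["hotel"], ambient, k=3)
print(neighborhood)
```

The neighborhood size and the distance function are free choices in this sketch; the values actually used by the scripts are determined by their configuration files.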
The following command loads the precomputed embeddings of the fastText vocabulary into an interactive Python session:
python -i sbert_create_static_embeddings.py \
--embeddings_config_path ./sbert_static_embeddings_config_50_0.yaml \
--vocab_desc pretrained_cc_en \
--load_embeddings
To compute and save embeddings of the joint MultiWOZ and SGD vocabulary:
python sbert_create_static_embeddings.py \
--embeddings_config_path ./sbert_static_embeddings_config_50_0.yaml \
--vocab_desc multiwoz_and_sgd \
--save_embeddings
Build neighborhoods and extract persistence features
The Jupyter notebook sbert_ambient_static_neighborhoods_create_persistence_images.ipynb
guides you through the installation of the TDA dependencies,
the creation of neighborhoods, the computation of persistence features via ripser,
and the creation of persistence images.
Along the way, the embedding space and the neighborhoods can be visualized
via 2-dimensional t-SNE projections.
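As an illustration of what a persistence feature of a neighborhood is, the sketch below computes the 0-dimensional persistence of a Vietoris-Rips filtration on a small point cloud using only the standard library. It is not the notebook's code: the notebook relies on ripser, which also handles higher homology dimensions. The function name `h0_persistence` and the toy points are hypothetical. The sketch uses the fact that every connected component is born at scale 0 and that a component dies at the length of the edge that first merges it into another one, so a union-find pass over the sorted edges recovers the finite death times.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Finite death times of 0-dimensional Vietoris-Rips classes (births are 0)."""
    parent = list(range(len(points)))

    def find(i):
        # Find the representative of i's component, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (the scale at which they appear).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # The edge joins two components, so one class dies at this scale.
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Toy neighborhood: three nearby points and one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
deaths = h0_persistence(points)
print(deaths)
```

A cloud of n points yields n - 1 finite bars plus one component that never dies. A long bar, such as the one caused by the outlier here, indicates a point or cluster that is well separated from the rest of the neighborhood.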
License
This project is licensed under the Apache License, Version 2.0 (the "License"); you may not use the files except in compliance with the License. You may obtain a copy of the License at