    TDA code for Dialogue Term Extraction using Transfer Learning and Topological Data Analysis

    This is the Topological Data Analysis portion of the code for the paper 'Dialogue Term Extraction using Transfer Learning and Topological Data Analysis'.

    The scripts in this folder should be executed in the tda working directory.

    Create embeddings

    Precomputed SBERT embeddings for the ambient fastText vocabulary and for the joint MultiWOZ and SGD vocabulary are contained in the /data folder. These embeddings are the basis for computing neighborhoods. It is not necessary to recompute them; to go straight to neighborhood extraction and TDA features, skip ahead to the next section.

    The following command loads the precomputed embeddings of the fastText vocabulary into an interactive Python session:

    python -i sbert_create_static_embeddings.py \
      --embeddings_config_path ./sbert_static_embeddings_config_50_0.yaml \
      --vocab_desc pretrained_cc_en \
      --load_embeddings

    To compute and save embeddings of the joint MultiWOZ and SGD vocabulary:

    python sbert_create_static_embeddings.py \
      --embeddings_config_path ./sbert_static_embeddings_config_50_0.yaml \
      --vocab_desc multiwoz_and_sgd \
      --save_embeddings

    Build neighborhoods and extract persistence features

    The Jupyter notebook sbert_ambient_static_neighborhoods_create_persistence_images.ipynb walks through installing the TDA dependencies, creating neighborhoods, computing persistence features with ripser, and creating persistence images. Along the way, the embedding space and the neighborhoods can be visualized with 2-dimensional t-SNE projections.
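
To give a rough idea of what "neighborhood" and "persistence features" mean here, the following is a minimal, standard-library-only sketch and not the notebook's code. It assumes that a neighborhood is the set of k nearest words under cosine distance, and it computes only 0-dimensional persistence (connected components) with a union-find pass over sorted edges. It does not call ripser and does not produce persistence images; the notebook handles both. The toy 2-dimensional vectors, the value of k, and all function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def nearest_neighbors(embeddings, query, k):
    """Return the k words closest to `query` (the query itself included)."""
    target = embeddings[query]
    ranked = sorted(embeddings, key=lambda w: cosine_distance(target, embeddings[w]))
    return ranked[:k]


def h0_persistence(points, dist):
    """Finite death values of 0-dimensional classes in a Vietoris-Rips filtration.

    Every component is born at 0. Processing edges in order of length and
    merging components with union-find records one death per merge.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths  # n - 1 values; the last surviving component never dies


# Hypothetical toy embeddings; real SBERT vectors have far more dimensions.
toy_embeddings = {
    "hotel": [1.0, 0.0],
    "motel": [0.9, 0.1],
    "hostel": [0.8, 0.3],
    "train": [0.0, 1.0],
    "bus": [0.1, 0.9],
}

neighborhood = nearest_neighbors(toy_embeddings, "hotel", k=3)
deaths = h0_persistence([toy_embeddings[w] for w in neighborhood], cosine_distance)
print(neighborhood)
print(deaths)
```

Higher-dimensional features such as loops require a full persistent homology computation, which is where a dedicated library like ripser comes in.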

    License

    This project is licensed under the Apache License, Version 2.0 (the "License"); you may not use the files except in compliance with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0