Commit 612fc8bb authored by Benjamin Ruppik

Added saved tSNE projection; edits to TDA jupyter notebook; added example logfile

parent 8d845873
@@ -12,7 +12,7 @@ for which the preprocessed datasets we use can be found in the data folder.
 ## 2. Requirements
 Install requirements using:
 ```bash
-pip install -r requirements.txt
+python3 -m pip install -r requirements.txt
 ```
 ## 3. BIO-tagging data
Source diff could not be displayed: it is stored in LFS.
@@ -8,7 +8,7 @@ The scripts in this folder should be executed in the `tda` working directory.
 ## Create embeddings
 Precomputed sbert embeddings are contained in the `/data` folder
-for the ambient fastText vocabulary, and the joint multiwoz and sgd vocabulary.
+for the ambient fastText vocabulary, and the joint MultiWOZ and SGD vocabulary.
 These embeddings are the basis for computing neighborhoods.
 It is not necessary to recompute these embeddings,
 for the neighborhood extraction and TDA features skip ahead to the next section.
@@ -29,9 +29,14 @@ python sbert_create_static_embeddings.py \
 --save_embeddings
 ```
-## Build neighbourhoods and extract persistence features
-TODO
+## Build neighborhoods and extract persistence features
+The jupyter notebook `sbert_ambient_static_neighborhoods_create_persistence_images.ipynb`
+guides through the installation of the TDA dependencies,
+creation of neighborhoods, computation of persistence features via ripser,
+and creation of persistence images.
+Along the way, the embedding space and the neighborhoods can be visualized
+via 2-dimensional t-SNE projections.
 ## License
 This project is licensed under the Apache License, Version 2.0 (the "License");
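
The workflow described in the new README text (build a neighborhood around a word embedding, then compute persistence features on it) can be illustrated with a minimal standard-library sketch. This is not the notebook's code and it does not call ripser: `nearest_neighborhood` and `h0_persistence` are hypothetical helpers, the neighborhood is assumed to be the nearest points by Euclidean distance, and only 0-dimensional Vietoris-Rips persistence is computed, using a union-find pass over the sorted pairwise distances.

```python
import math
from itertools import combinations


def nearest_neighborhood(points, center_index, size):
    """Return the `size` points closest to points[center_index] (center included)."""
    center = points[center_index]
    ranked = sorted(points, key=lambda p: math.dist(p, center))
    return ranked[:size]


def h0_persistence(points):
    """Finite (birth, death) pairs of 0-dimensional Vietoris-Rips persistence.

    Every point is born at scale 0. Two components merge (one of them dies)
    at the length of the shortest edge that connects them.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            pairs.append((0.0, length))
    return pairs


# Toy 2-dimensional "embeddings"; real sbert vectors have many more dimensions.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (10.0, 10.0)]
nbhd = nearest_neighborhood(points, center_index=0, size=3)
print(nbhd)
print(h0_persistence(nbhd))
```

For a neighborhood of n points this yields n - 1 finite pairs; the single component that never dies is omitted. Higher-dimensional features such as loops need a full persistence library such as ripser.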
2022-06-15 19:48:55,892 - root - INFO - Loading config file: ./sbert_static_embeddings_config_50_0.yaml
2022-06-15 19:48:55,892 - root - INFO - {'data': {'data_folder_path': '../data', 'multiwoz_and_sgd_vocabulary_path': '../data/multiwoz_and_sgd_joint_vocabulary.json', 'pretrained_cc_en_vocabulary_path': '../data/pretrained_cc_en_vocabulary.json'}, 'embeddings': {'embeddings_dict_path': '../data', 'embeddings_dataframes_path': '../data', 'context': 'word', 'pooling_method': 'mean', 'special_tokens': 'ignore'}, 'neighborhoods': {'nbhd_size': 50, 'nbhd_remove': 0, 'neighborhoods_path': '../data/neighborhoods', 'persistence_features_path': '../data', 'normalize': False}}
2022-06-15 19:49:31,656 - root - INFO - Loading embeddings ...
2022-06-15 19:49:31,659 - root - INFO - Loading from ../data/pretrained_cc_en_vocab_embeddings_sbert.pkl
2022-06-15 19:49:40,435 - root - INFO - Loading from ../data/multiwoz_and_sgd_vocab_embeddings_sbert.pkl
2022-06-15 19:50:03,547 - root - INFO - Loading embeddings ...
2022-06-15 19:50:03,549 - root - INFO - Loading from ../data/pretrained_cc_en_vocab_embeddings_sbert.pkl
2022-06-15 19:50:13,914 - root - INFO - Loading from ../data/multiwoz_and_sgd_vocab_embeddings_sbert.pkl
2022-06-15 19:50:14,028 - root - INFO - Loading embeddings DONE
2022-06-15 19:56:26,877 - root - INFO - Loading config file: ./sbert_static_embeddings_config_50_0.yaml
2022-06-15 19:56:26,878 - root - INFO - {'data': {'data_folder_path': '../data', 'multiwoz_and_sgd_vocabulary_path': '../data/multiwoz_and_sgd_joint_vocabulary.json', 'pretrained_cc_en_vocabulary_path': '../data/pretrained_cc_en_vocabulary.json'}, 'embeddings': {'embeddings_dict_path': '../data', 'embeddings_dataframes_path': '../data', 'context': 'word', 'pooling_method': 'mean', 'special_tokens': 'ignore'}, 'neighborhoods': {'nbhd_size': 50, 'nbhd_remove': 0, 'neighborhoods_path': '../data/neighborhoods', 'persistence_features_path': '../data', 'normalize': False}}
2022-06-15 19:56:33,739 - root - INFO - Loading embeddings ...
2022-06-15 19:56:33,741 - root - INFO - Loading from ../data/pretrained_cc_en_vocab_embeddings_sbert.pkl
2022-06-15 19:56:41,616 - root - INFO - Loading from ../data/multiwoz_and_sgd_vocab_embeddings_sbert.pkl
2022-06-15 19:56:41,759 - root - INFO - Loading embeddings DONE
2022-06-15 20:03:06,914 - root - INFO - Saving t-SNE to ../data/./paraphrase-MiniLM-L6-v2_ambient_embeddings/_vocab_50000_sbert_static_tsne_df.pkl
2022-06-15 20:03:24,349 - root - INFO - Saving t-SNE to ../data/paraphrase-MiniLM-L6-v2_ambient_embeddings/_vocab_50000_sbert_static_tsne_df.pkl
2022-06-15 20:03:37,151 - root - INFO - Saving t-SNE to ../data/paraphrase-MiniLM-L6-v2_ambient_embeddings_vocab_50000_sbert_static_tsne_df.pkl
2022-06-15 20:04:23,718 - root - INFO - Saving t-SNE to ../data/paraphrase-MiniLM-L6-v2_ambient_embeddings_vocab_50000_sbert_static_tsne_df.pkl
2022-06-15 20:04:28,493 - root - INFO - Saving t-SNE to ../data/paraphrase-MiniLM-L6-v2_ambient_embeddings_vocab_50000_sbert_static_tsne_df.pkl
2022-06-15 20:04:37,216 - root - INFO - Saving t-SNE to ../data/paraphrase-MiniLM-L6-v2_ambient_embeddings_vocab_50000_sbert_static_tsne_df.pkl
2022-06-15 21:02:31,134 - root - INFO - Setting number of cpus to 8
2022-06-15 21:04:29,298 - root - INFO - Setting number of cpus to 8
2022-06-15 21:04:42,197 - root - INFO - Setting number of cpus to 8
2022-06-15 21:07:42,858 - root - INFO - Loading neighborhoods from ../data/paraphrase-MiniLM-L6-v2_ambient_embeddings_vocab_50000_sbert_static_tsne_df.pkl
2022-06-15 21:08:22,834 - root - INFO - Loading neighborhoods from <_io.BufferedReader name='../data/neighborhoods/neighborhoods_ambient_static_50000_sbert_size_50_remove_0_normalize_False.pkl'>
2022-06-15 21:08:43,095 - root - INFO - Loading neighborhoods ...
2022-06-15 21:09:00,052 - root - INFO - Loading neighborhoods ...
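
The entry at 21:08:22 prints a file object (`<_io.BufferedReader name=...>`) where the neighboring entries print a plain path, which suggests that an open file handle was passed to the logger. The following is a minimal sketch of building the path from the logged config and logging the path itself. The function `neighborhoods_file`, its default arguments, and the file-naming scheme are assumptions inferred from the single path visible in the log, not the repository's actual code.

```python
import logging
from pathlib import Path

# Format chosen to resemble the entries in the logfile above.
logging.basicConfig(
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    level=logging.INFO,
)


def neighborhoods_file(config, space="ambient_static", vocab_size=50000, model="sbert"):
    """Build the neighborhoods pickle path from the 'neighborhoods' config section.

    The naming scheme is hypothetical; it only mirrors the one path in the log.
    """
    nbhd = config["neighborhoods"]
    name = (
        f"neighborhoods_{space}_{vocab_size}_{model}"
        f"_size_{nbhd['nbhd_size']}_remove_{nbhd['nbhd_remove']}"
        f"_normalize_{nbhd['normalize']}.pkl"
    )
    return Path(nbhd["neighborhoods_path"]) / name


# Values copied from the 'neighborhoods' section of the logged config.
config = {
    "neighborhoods": {
        "nbhd_size": 50,
        "nbhd_remove": 0,
        "neighborhoods_path": "../data/neighborhoods",
        "normalize": False,
    }
}

path = neighborhoods_file(config)
# Log the path, not the handle returned by open(path, "rb").
logging.info("Loading neighborhoods from %s", path)
```

Keeping the path in its own variable and opening the file only afterwards keeps log messages readable regardless of how the file is later read.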