From 64c9b6dbea788cd47deff2252d5b20d7f035f0f1 Mon Sep 17 00:00:00 2001
From: zqwerty <zhuq96@hotmail.com>
Date: Mon, 29 Nov 2021 14:55:40 +0000
Subject: [PATCH] unified format allow multiple values by '|' sep token

---
 data/unified_datasets/README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/data/unified_datasets/README.md b/data/unified_datasets/README.md
index 82b43645..cad25dd8 100644
--- a/data/unified_datasets/README.md
+++ b/data/unified_datasets/README.md
@@ -97,6 +97,8 @@ We first introduce the unified format of `ontology` and `dialogues`. To transfor
 - `db_results`: (*dict*, system side, could be empty)
   - `$domain_name`: (*list* of *dict*) topk entities (each entity contains slot-value pairs)
 
+Note that multiple descriptions/values are separated by `"|"`.
+
 Other attributes are optional.
 
 Run `python check.py $dataset` in the `data/unified_datasets` directory to check the validation of processed dataset and get data statistics.
-- 
GitLab
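
The note added by this patch says that multiple descriptions/values share one string, separated by `"|"`. A minimal sketch of how a consumer of the unified format might split and re-join such a field is given below. The helper names `split_values` and `join_values` are hypothetical and not part of the repository, the sample strings are purely illustrative, and trimming whitespace around each piece is an assumption, since the README does not say whether spaces may surround the separator.

```python
from typing import List

SEP = "|"  # separator token named in the README note


def split_values(value: str) -> List[str]:
    """Split a '|'-joined string into its alternatives.

    Assumption: surrounding whitespace is not significant and
    empty pieces (e.g. from a trailing separator) are dropped.
    """
    return [part.strip() for part in value.split(SEP) if part.strip()]


def join_values(values: List[str]) -> str:
    """Join alternatives back into a single '|'-separated string."""
    return SEP.join(values)


if __name__ == "__main__":
    # Illustrative strings only, not taken from any dataset.
    print(split_values("centre|center"))
    print(join_values(["cheap", "inexpensive"]))
```

A value without the separator comes back as a one-element list, so callers can treat single and multiple values uniformly.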