From ba27f92d43fc81f514baa303dfb915d795d1f4eb Mon Sep 17 00:00:00 2001
From: zqwerty <zhuq96@hotmail.com>
Date: Wed, 22 Dec 2021 08:35:31 +0000
Subject: [PATCH] update unified dataset readme

---
 data/unified_datasets/README.md | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/data/unified_datasets/README.md b/data/unified_datasets/README.md
index 082eb2a6..76a320c3 100644
--- a/data/unified_datasets/README.md
+++ b/data/unified_datasets/README.md
@@ -1,6 +1,6 @@
 # Unified data format
 
-## Overview
+## Usage
 We transform different datasets into a unified format under the `data/unified_datasets` directory. To import a unified dataset:
 
 ```python
@@ -13,6 +13,18 @@ database = load_database('multiwoz21')
 
 `dataset` is a dict where the keys are data splits and the values are lists of dialogues. `database` is an instance of the `Database` class, which has a `query` function. The formats of the dialogues, ontology, and database are defined below.
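
As a hypothetical illustration of the structure described above (the split name, dialogue id, and turn fields here are placeholders, not taken from an actual dataset), the nested dict can be traversed like this:

```python
# Hypothetical stand-in for the dict returned by load_dataset;
# real dialogues carry many more fields than shown here.
dataset = {
    "train": [
        {
            "dialogue_id": "example-0",
            "turns": [
                {"speaker": "user", "utterance": "I need a hotel."},
                {"speaker": "system", "utterance": "Any area preference?"},
            ],
        }
    ],
    "test": [],
}

# Iterate splits -> dialogues -> turns, following the unified format.
for split, dialogues in dataset.items():
    for dialogue in dialogues:
        for turn in dialogue["turns"]:
            print(split, dialogue["dialogue_id"], turn["speaker"], turn["utterance"])
```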
 
+We provide a `load_unified_data` function that transforms dialogues into turn-level samples. Built on top of it, convenience functions load the data needed by each component:
+
+```python
+from convlab2.util import load_unified_data, load_nlu_data, load_dst_data, load_policy_data, load_nlg_data, load_e2e_data
+
+nlu_data = load_nlu_data(dataset, data_split='test', speaker='user')
+dst_data = load_dst_data(dataset, data_split='test', speaker='user', context_window_size=5)
+```
+
+To customize the data loading process, see the definition of `load_unified_data`.
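
The flattening behaviour described above can be sketched as follows. This is only an illustrative approximation, assuming turn dicts with `speaker` and `utterance` fields and a `context_window_size` of preceding turns; the real `load_unified_data` in `convlab2.util` supports more options.

```python
def flatten_to_turns(dialogues, speaker="user", context_window_size=0):
    """Sketch of flattening dialogues into per-turn samples.

    Hypothetical approximation of what load_unified_data is described
    to do; field names and the context handling are assumptions.
    """
    samples = []
    for dialogue in dialogues:
        turns = dialogue["turns"]
        for i, turn in enumerate(turns):
            if turn["speaker"] != speaker:
                continue
            sample = dict(turn)
            if context_window_size > 0:
                # The preceding turns serve as dialogue context.
                sample["context"] = turns[max(0, i - context_window_size):i]
            samples.append(sample)
    return samples


# Toy usage with a single three-turn dialogue.
dialogues = [{"turns": [
    {"speaker": "user", "utterance": "I need a hotel."},
    {"speaker": "system", "utterance": "Any area preference?"},
    {"speaker": "user", "utterance": "The north, please."},
]}]
samples = flatten_to_turns(dialogues, speaker="user", context_window_size=2)
```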
+
+## Unified datasets
 Each dataset contains at least these files:
 
 - `README.md`: dataset description and the **main changes** from the original data to the processed data. It should include instructions on how to obtain the original data and transform it into the unified format.
-- 
GitLab