diff --git a/data/unified_datasets/README.md b/data/unified_datasets/README.md
index a22a057e191e4bc2b75c0290e2d9cc8a09c23ffc..2b4d3b3fdd88fed88273677868df6f47a1c7bc04 100644
--- a/data/unified_datasets/README.md
+++ b/data/unified_datasets/README.md
@@ -1,13 +1,16 @@
-# Unified data format with example
+# Unified data format
 
-Under `data/unified_datasets` directory.
-
-single turn->dialogue with one turn
+## Overview
+We transform different datasets into a unified format under `data/unified_datasets` directory. One could also access processed datasets from Hugging Face's `Dataset`:
+```python
+from datasets import load_dataset
+dataset = load_dataset('ConvLab/$dataset')
+```
 
-Each dataset have at least 4 files:
+Each dataset contains at least these files:
 
-- `README.md`: dataset description and the main changes from original data to processed data.
-- `preprocess.py`: python script that preprocess the data. By running `python preprocess.py` we can get the following two files. The structure `preprocess.py`  should be:
+- `README.md`: dataset description and the **main changes** from original data to processed data. Should include the instruction on how to get the original data and transform them into the unified format.
+- `preprocess.py`: python script that transform the original data into the unified format. By running `python preprocess.py` we can get `data.zip` that contains all data. The structure `preprocess.py` should be like:
 
 ```python
 def preprocess():
@@ -16,16 +19,28 @@ if __name__ == '__main__':
     preprocess()
 ```
 
-- `ontology.json`: dataset ontology, contains descriptions, state definition, etc.
-- `data.json.zip`: contains `data.json`.
+- `data.zip`: (also available in https://huggingface.co/ConvLab) the zipped directory contains:
+  - `ontology.json`: dataset ontology, contains descriptions, state definition, etc.
+  - `dialogues.json`, a list of all dialogues in the dataset.
+  - other necessary files such as databases.
 
-### README
+Datasets that require database interaction should also include the following file:
+- `database.py`: load the database and define the query function:
+```python
+class Database:
+    def __init__(self):
+        """extract data.zip and load the database."""
 
-- Data source: publication, original data download link, etc.
-- Data description:
-  - Annotations: whether have dialogue act, belief state annotation.
-  - Statistics: \# domains, # dialogues, \# utterances, Avg. turns, Avg. tokens (split by space), etc.
-- Main changes from original data to processed data.
+    def query(self, domain:str, state:dict, topk:int, **kwargs)->list:
+        """return a list of topk entities (dict containing slot-value pairs) for a given domain based on the dialogue state."""
+```
+
+## Unified format
+We first introduce the unified format of `ontology` and `dialogues`. To transform a new dataset into the unified format:
+1. Create `data/unified_datasets/$dataset` folder, where `$dataset` is the name of the dataset.
+2. Write `preprocess.py` to transform the original dataset into the unified format, producing `data.zip`.
+3. Run `python test.py $dataset` in the `data/unified_datasets` directory to check the validation of processed dataset and get data statistics.
+4. Write `README.md` to describe the data.
 
 ### Ontology
 
@@ -43,519 +58,50 @@ if __name__ == '__main__':
 - `intents`: (*dict*) descriptions for intents.
   - `$intent_name`: (*dict*)
     - `description`: (*str*) description for this intent.
-- `binary_dialogue_act`: (*list* of *dict*) special dialogue acts that the value may not present in the utterance, e.g. request the address of a hotel.
-  - `{"intent": (str), "domain": (str), "slot": (str), "value": (str)}`. domain, slot, value may be empty.
-- `state`: (*dict*) belief state of all domains.
+- `binary_dialogue_acts`: (*list* of *dict*) binary dialogue act is a more detailed intent where the value is not extracted from dialogues, e.g. request the address of a hotel.
+  - `{"intent": (str), "domain": (str), "slot": (str), "value": (str)}`. domain, slot, and value may be empty.
+- `state`: (*dict*) dialogue state of all domains.
   - `$domain_name`: (*dict*)
     - `$slot_name: ""`: slot with empty value. Note that the slot set are the subset of the slot set in Part 1 definition.
 
 ### Dialogues
 
-`data.json`: a *list* of dialogues containing:
+`data.json`: a *list* of dialogues (*dict*) containing:
 
-- `dataset`: (*str*) dataset name, must be one of  ['schema', 'multiwoz', 'camrest', 'woz', ...], and be the same as the current dataset.
-- `data_split`: (*str*) in [train, val, test].
-- `dialogue_id`: (*str*) use dataset name as prefix, add count.
-- `domains`: (*list*) domains in this dialogue.
+- `dataset`: (*str*) dataset name, must be the same as the data directory.
+- `data_split`: (*str*) in `["train", "validation", "test"]`.
+- `dialogue_id`: (*str*) `"$dataset-$split-$id"`, `id` increases from 0.
+- `domains`: (*list*) involved domains in this dialogue.
+- `goal`: (*dict*, optional)
+  - `description`: (*str*) a string describes the user goal.
+  - `constraints`: (*dict*, optional) same format as dialogue state of involved domains but with only filled slots as constraints.
+  - `requirements`: (*dict*, optional) same format as dialogue state of involved domains but with only empty required slots.
 - `turns`: (*list* of *dict*)
-  - `speaker`: (*str*) "user" or "system". **User side first, user side final**, "user" and "system" appear alternately?
-  - `utterance`: (*str*) sentence.
+  - `speaker`: (*str*) "user" or "system".
+  - `utterance`: (*str*)
   - `utt_idx`: (*int*) `turns['utt_idx']` gives current turn.
-  - `dialogue_act`: (*dict*)
+  - `dialogue_acts`: (*dict*, optional)
     - `categorical`: (*list* of *dict*) for categorical slots.
       - `{"intent": (str), "domain": (str), "slot": (str), "value": (str)}`. Value sets are defined in the ontology.
     - `non-categorical` (*list* of *dict*) for non-categorical slots.
-      - `{"intent": (str), "domain": (str), "slot": (str), "value": (str), "start": (int), "end": (int)}`. `start` and `end` are character indexes for the value span.
+      - `{"intent": (str), "domain": (str), "slot": (str), "value": (str), "start": (int), "end": (int)}`. `start` and `end` are character indexes for the value span in the utterance and can be absent.
     - `binary` (*list* of *dict*) for binary dialogue acts in ontology.
-      - `{"intent": (str), "domain": (str), "slot": (str), "value": (str)}`. Possible dialogue acts are listed in the `ontology['binary_dialogue_act']`.
-  - `state`: (*dict*, optional, user side) full state are shown in `ontology['state']`.
+      - `{"intent": (str), "domain": (str), "slot": (str), "value": (str)}`. Possible dialogue acts are listed in the `ontology['binary_dialogue_acts']`.
+  - `state`: (*dict*, user side, optional) dialogue state of involved domains. full state is shown in `ontology['state']`.
     - `$domain_name`: (*dict*) contains all slots in this domain.
       - `$slot_name`: (*str*) value for this slot.
-  - `state_update`: (*dict*, optional, user side) records the difference of states between the current turn and the last turn.
-    - `categorical`: (*list* of *dict*) for categorical slots.
-      - `{"domain": (str), "slot": (str), "value": (str)}`. Value sets are defined in the ontology (**dontcare** may not be included).
-    - `non-categorical` (*list* of *dict*) for non-categorical slots.
-      - `{"domain": (str), "slot": (str), "value": (str), "utt_idx": (int), "start": (int), "end": (int)}`. `utt_idx` is the utterance index of the value. `start` and `end` are character indexes for the value span in the current turn. `turn[utt_idx]['utterance'][start:end]` gives the value.
+  - `db_results`: (*dict*, optional)
+    - `$domain_name`: (*list* of *dict*) topk entities (each entity contains slot-value pairs)
 
 Other attributes are optional.
 
-Run `python evaluate.py $dataset` to check the validation of processed dataset.
+Run `python test.py $dataset` in the `data/unified_datasets` directory to check the validation of processed dataset and get data statistics.
 
-## Example of Schema Dataset
+### README
+Each dataset has a README.md to describe the original and transformed data. Follow the Hugging Face's [dataset card creation](https://huggingface.co/docs/datasets/dataset_card.html) to export `README.md`. Make sure that the following additional information is included in the **Dataset Summary** section:
+- Main changes from original data to processed data.
+- Annotations: whether have user goal, dialogue acts, state, db results, etc.
 
-```json
-	{
-    "dataset": "schema",
-    "data_split": "train",
-    "dialogue_id": "schema_535",
-    "original_id": "5_00022",
-    "domains": [
-      "event_2"
-    ],
-    "turns": [
-      {
-        "speaker": "user",
-        "utterance": "I feel like going out to do something in Oakland. I've heard the Raiders Vs Bengals game should be good.",
-        "utt_idx": 0,
-        "dialogue_act": {
-          "binary": [
-            {
-              "intent": "inform_intent",
-              "domain": "event_2",
-              "slot": "intent",
-              "value": "geteventdates"
-            }
-          ],
-          "categorical": [],
-          "non-categorical": [
-            {
-              "intent": "inform",
-              "domain": "event_2",
-              "slot": "event_name",
-              "value": "raiders vs bengals",
-              "start": 65,
-              "end": 83
-            },
-            {
-              "intent": "inform",
-              "domain": "event_2",
-              "slot": "city",
-              "value": "oakland",
-              "start": 41,
-              "end": 48
-            }
-          ]
-        },
-        "state": {
-          "event_2": {
-            "event_type": "",
-            "category": "",
-            "event_name": "raiders vs bengals",
-            "date": "",
-            "time": "",
-            "number_of_tickets": "",
-            "city": "oakland",
-            "venue": "",
-            "venue_address": ""
-          }
-        },
-        "state_update": {
-          "categorical": [],
-          "non-categorical": [
-            {
-              "domain": "event_2",
-              "slot": "city",
-              "value": "oakland",
-              "utt_idx": 0,
-              "start": 41,
-              "end": 48
-            },
-            {
-              "domain": "event_2",
-              "slot": "event_name",
-              "value": "raiders vs bengals",
-              "utt_idx": 0,
-              "start": 65,
-              "end": 83
-            }
-          ]
-        }
-      },
-      {
-        "speaker": "system",
-        "utterance": "The Raiders Vs Bengals game is at Oakland-Alameda County Coliseum today.",
-        "utt_idx": 1,
-        "dialogue_act": {
-          "binary": [],
-          "categorical": [],
-          "non-categorical": [
-            {
-              "intent": "offer",
-              "domain": "event_2",
-              "slot": "date",
-              "value": "today",
-              "start": 66,
-              "end": 71
-            },
-            {
-              "intent": "offer",
-              "domain": "event_2",
-              "slot": "event_name",
-              "value": "raiders vs bengals",
-              "start": 4,
-              "end": 22
-            },
-            {
-              "intent": "offer",
-              "domain": "event_2",
-              "slot": "venue",
-              "value": "oakland-alameda county coliseum",
-              "start": 34,
-              "end": 65
-            }
-          ]
-        }
-      },
-      {
-        "speaker": "user",
-        "utterance": "What time does it start?",
-        "utt_idx": 2,
-        "dialogue_act": {
-          "binary": [
-            {
-              "intent": "request",
-              "domain": "event_2",
-              "slot": "time",
-              "value": ""
-            }
-          ],
-          "categorical": [],
-          "non-categorical": []
-        },
-        "state": {
-          "event_2": {
-            "event_type": "",
-            "category": "",
-            "event_name": "raiders vs bengals",
-            "date": "",
-            "time": "",
-            "number_of_tickets": "",
-            "city": "oakland",
-            "venue": "",
-            "venue_address": ""
-          }
-        },
-        "state_update": {
-          "categorical": [],
-          "non-categorical": []
-        }
-      },
-      {
-        "speaker": "system",
-        "utterance": "It starts at 7 pm.",
-        "utt_idx": 3,
-        "dialogue_act": {
-          "binary": [],
-          "categorical": [],
-          "non-categorical": [
-            {
-              "intent": "inform",
-              "domain": "event_2",
-              "slot": "time",
-              "value": "7 pm",
-              "start": 13,
-              "end": 17
-            }
-          ]
-        }
-      },
-      {
-        "speaker": "user",
-        "utterance": "That sounds fine.",
-        "utt_idx": 4,
-        "dialogue_act": {
-          "binary": [
-            {
-              "intent": "select",
-              "domain": "event_2",
-              "slot": "",
-              "value": ""
-            }
-          ],
-          "categorical": [],
-          "non-categorical": []
-        },
-        "state": {
-          "event_2": {
-            "event_type": "",
-            "category": "",
-            "event_name": "raiders vs bengals",
-            "date": "today",
-            "time": "",
-            "number_of_tickets": "",
-            "city": "oakland",
-            "venue": "",
-            "venue_address": ""
-          }
-        },
-        "state_update": {
-          "categorical": [],
-          "non-categorical": [
-            {
-              "domain": "event_2",
-              "slot": "date",
-              "value": "today",
-              "utt_idx": 1,
-              "start": 66,
-              "end": 71
-            }
-          ]
-        }
-      },
-      {
-        "speaker": "system",
-        "utterance": "Do you want to get tickets for it?",
-        "utt_idx": 5,
-        "dialogue_act": {
-          "binary": [
-            {
-              "intent": "offer_intent",
-              "domain": "event_2",
-              "slot": "intent",
-              "value": "buyeventtickets"
-            }
-          ],
-          "categorical": [],
-          "non-categorical": []
-        }
-      },
-      {
-        "speaker": "user",
-        "utterance": "Yes, can you buy 3 tickets for me?",
-        "utt_idx": 6,
-        "dialogue_act": {
-          "binary": [
-            {
-              "intent": "affirm_intent",
-              "domain": "event_2",
-              "slot": "",
-              "value": ""
-            }
-          ],
-          "categorical": [
-            {
-              "intent": "inform",
-              "domain": "event_2",
-              "slot": "number_of_tickets",
-              "value": "3"
-            }
-          ],
-          "non-categorical": []
-        },
-        "state": {
-          "event_2": {
-            "event_type": "",
-            "category": "",
-            "event_name": "raiders vs bengals",
-            "date": "today",
-            "time": "",
-            "number_of_tickets": "3",
-            "city": "oakland",
-            "venue": "",
-            "venue_address": ""
-          }
-        },
-        "state_update": {
-          "categorical": [
-            {
-              "domain": "event_2",
-              "slot": "number_of_tickets",
-              "value": "3"
-            }
-          ],
-          "non-categorical": []
-        }
-      },
-      {
-        "speaker": "system",
-        "utterance": "Sure. I will go ahead and buy 3 tickets for the Raiders Vs Bengals game in Oakland today. Is that right?",
-        "utt_idx": 7,
-        "dialogue_act": {
-          "binary": [],
-          "categorical": [
-            {
-              "intent": "confirm",
-              "domain": "event_2",
-              "slot": "number_of_tickets",
-              "value": "3"
-            }
-          ],
-          "non-categorical": [
-            {
-              "intent": "confirm",
-              "domain": "event_2",
-              "slot": "event_name",
-              "value": "raiders vs bengals",
-              "start": 48,
-              "end": 66
-            },
-            {
-              "intent": "confirm",
-              "domain": "event_2",
-              "slot": "date",
-              "value": "today",
-              "start": 83,
-              "end": 88
-            },
-            {
-              "intent": "confirm",
-              "domain": "event_2",
-              "slot": "city",
-              "value": "oakland",
-              "start": 75,
-              "end": 82
-            }
-          ]
-        }
-      },
-      {
-        "speaker": "user",
-        "utterance": "Yes, that's good. What's the address?",
-        "utt_idx": 8,
-        "dialogue_act": {
-          "binary": [
-            {
-              "intent": "request",
-              "domain": "event_2",
-              "slot": "venue_address",
-              "value": ""
-            },
-            {
-              "intent": "affirm",
-              "domain": "",
-              "slot": "",
-              "value": ""
-            }
-          ],
-          "categorical": [],
-          "non-categorical": []
-        },
-        "state": {
-          "event_2": {
-            "event_type": "",
-            "category": "",
-            "event_name": "raiders vs bengals",
-            "date": "today",
-            "time": "",
-            "number_of_tickets": "3",
-            "city": "oakland",
-            "venue": "",
-            "venue_address": ""
-          }
-        },
-        "state_update": {
-          "categorical": [],
-          "non-categorical": []
-        }
-      },
-      {
-        "speaker": "system",
-        "utterance": "The game is at 7000 Coliseum Way. I've bought the tickets.",
-        "utt_idx": 9,
-        "dialogue_act": {
-          "binary": [
-            {
-              "intent": "notify_success",
-              "domain": "event_2",
-              "slot": "",
-              "value": ""
-            }
-          ],
-          "categorical": [],
-          "non-categorical": [
-            {
-              "intent": "inform",
-              "domain": "event_2",
-              "slot": "venue_address",
-              "value": "7000 coliseum way",
-              "start": 15,
-              "end": 32
-            }
-          ]
-        }
-      },
-      {
-        "speaker": "user",
-        "utterance": "Thanks! That's all.",
-        "utt_idx": 10,
-        "dialogue_act": {
-          "binary": [
-            {
-              "intent": "thank_you",
-              "domain": "",
-              "slot": "",
-              "value": ""
-            }
-          ],
-          "categorical": [],
-          "non-categorical": []
-        },
-        "state": {
-          "event_2": {
-            "event_type": "",
-            "category": "",
-            "event_name": "raiders vs bengals",
-            "date": "today",
-            "time": "",
-            "number_of_tickets": "3",
-            "city": "oakland",
-            "venue": "",
-            "venue_address": ""
-          }
-        },
-        "state_update": {
-          "categorical": [],
-          "non-categorical": []
-        }
-      },
-      {
-        "speaker": "system",
-        "utterance": "Need help with anything else?",
-        "utt_idx": 11,
-        "dialogue_act": {
-          "binary": [
-            {
-              "intent": "req_more",
-              "domain": "",
-              "slot": "",
-              "value": ""
-            }
-          ],
-          "categorical": [],
-          "non-categorical": []
-        }
-      },
-      {
-        "speaker": "user",
-        "utterance": "No, thank you.",
-        "utt_idx": 12,
-        "dialogue_act": {
-          "binary": [
-            {
-              "intent": "negate",
-              "domain": "",
-              "slot": "",
-              "value": ""
-            },
-            {
-              "intent": "thank_you",
-              "domain": "",
-              "slot": "",
-              "value": ""
-            }
-          ],
-          "categorical": [],
-          "non-categorical": []
-        },
-        "state": {
-          "event_2": {
-            "event_type": "",
-            "category": "",
-            "event_name": "raiders vs bengals",
-            "date": "today",
-            "time": "",
-            "number_of_tickets": "3",
-            "city": "oakland",
-            "venue": "",
-            "venue_address": ""
-          }
-        },
-        "state_update": {
-          "categorical": [],
-          "non-categorical": []
-        }
-      }
-    ]
-  }
-```
+And the data statistics given by `test.py` should be included in the **Data Splits** section.
 
+## Example dialogue of Schema-Guided Dataset