Commit b8aecfe6 authored by zhuqi, committed by GitHub

Merge pull request #21 from ConvLab/unified_dataset

update unified dataset readme
parents 7b65e3b3 ba27f92d
# Unified data format
## Overview
## Usage
We transform different datasets into a unified format under the `data/unified_datasets` directory. To import a unified dataset:
```python
# ......
database = load_database('multiwoz21')
```
`dataset` is a dict whose keys are data splits and whose values are lists of dialogues. `database` is an instance of the `Database` class, which has a `query` function. The formats of the dialogue, ontology, and database are defined below.
The function `load_unified_data` transforms the dialogues into turn-level samples. Several loader functions pass different arguments to `load_unified_data` to load data for different components:
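To make the shape of `dataset` concrete, here is a minimal sketch that walks a dict of splits and counts the dialogues in each. The toy data, the split names, the `dialogue_id` field, and the helper `split_sizes` are all assumptions made for illustration; they are not taken from the library.

```python
# Toy stand-in for the `dataset` dict described above.
# Split names and the `dialogue_id` field are assumptions for illustration.
toy_dataset = {
    "train": [{"dialogue_id": "toy-train-0"}, {"dialogue_id": "toy-train-1"}],
    "validation": [{"dialogue_id": "toy-val-0"}],
    "test": [{"dialogue_id": "toy-test-0"}],
}


def split_sizes(dataset):
    """Return the number of dialogues in each data split (hypothetical helper)."""
    return {split: len(dialogues) for split, dialogues in dataset.items()}


sizes = split_sizes(toy_dataset)
for split, count in sizes.items():
    print(f"{split}: {count} dialogues")
```

The same loop should apply to a real `dataset` object as long as it is a plain dict of lists, as described above.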
```python
from convlab2.util import load_unified_data, load_nlu_data, load_dst_data, load_policy_data, load_nlg_data, load_e2e_data
# `dataset` is the dict of data splits loaded earlier
nlu_data = load_nlu_data(dataset, data_split='test', speaker='user')
# for DST, also set the context window size
dst_data = load_dst_data(dataset, data_split='test', speaker='user', context_window_size=5)
```
To customize the data loading process, see the definition of `load_unified_data`.
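As a rough illustration of what "dialogues into turns as samples" can mean, the sketch below flattens toy dialogues into one sample per turn, filtered by speaker and carrying a limited window of preceding utterances. It is not the implementation of `load_unified_data`. The function `toy_turn_samples`, the field names `turns`, `speaker`, and `utterance`, and the exact windowing behaviour are assumptions chosen for the example.

```python
def toy_turn_samples(dialogues, speaker="user", context_window_size=0):
    """Flatten dialogues into one sample per turn spoken by `speaker`.

    Hypothetical helper: the field names `turns`, `speaker`, and `utterance`
    are assumptions, not the library's confirmed schema.
    """
    samples = []
    for dialogue in dialogues:
        turns = dialogue["turns"]
        for i, turn in enumerate(turns):
            # Keep only turns from the requested speaker ("all" keeps everything).
            if speaker != "all" and turn["speaker"] != speaker:
                continue
            # Context is at most `context_window_size` preceding utterances.
            start = max(0, i - context_window_size)
            samples.append({
                "utterance": turn["utterance"],
                "context": [t["utterance"] for t in turns[start:i]],
            })
    return samples


toy_dialogues = [{
    "turns": [
        {"speaker": "user", "utterance": "I need a hotel."},
        {"speaker": "system", "utterance": "Which area?"},
        {"speaker": "user", "utterance": "The centre, please."},
    ]
}]

samples = toy_turn_samples(toy_dialogues, speaker="user", context_window_size=1)
for sample in samples:
    print(sample)
```

Each returned sample is independent of the others, which is convenient for training turn-level components such as NLU or DST models.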
## Unified datasets
Each dataset contains at least these files:
- `README.md`: dataset description and the **main changes** from the original data to the processed data. It should include instructions for obtaining the original data and transforming it into the unified format.
......