# LAVA + PLAS Public
Code for "Dialogue Evaluation with Offline Reinforcement Learning" paper
Code for "Dialogue Evaluation with Offline Reinforcement Learning" paper.
## The code will be released in September 2022
<p align="center">
<img width="700" src="all2.pdf">
</p>
In this paper, we propose the use of offline reinforcement learning for dialogue evaluation based on static data. Such an evaluator is typically called a critic and is utilized for policy optimization. We go one step further and show that offline RL critics can be trained as external evaluators for any dialogue system, allowing dialogue performance comparisons across various types of systems. This approach has the benefit of being corpus- and model-independent while attaining strong correlation with human judgements, which we confirm via an interactive user trial.
## Data
data.zip includes the following data:
- Pre-processed MultiWOZ 2.0 and 2.1
- Generated responses from AuGPT, HDSA with gold action labels, and HDSA with predicted action labels
## Structure
The model implementations, as well as the training and evaluation scripts, are under **latent_dialog**.
The scripts for running the experiments are under **experiment_woz**. The trained models and evaluation results are under **experiment_woz/sys_config_log_model**.
## Policy Optimization
To use the critic for policy optimization, training proceeds in two steps:
### Step 1: SL pre-training with shared response generation and VAE objectives
```
python mt_gauss.py
```
### Step 2: Offline RL in the latent action space
```
python plas_gauss.py
```
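Both scripts live under **experiment_woz** (see the Structure section), so the two steps can also be chained from there. The sketch below is only a convenience wrapper and relies on each script's own defaults:

```python
import subprocess

# Convenience sketch: run both training steps in order from experiment_woz.
# Both calls use the scripts' built-in defaults; add arguments as needed.
subprocess.run(["python", "mt_gauss.py"], check=True)    # Step 1: SL pre-training
subprocess.run(["python", "plas_gauss.py"], check=True)  # Step 2: offline RL in latent space
```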
## Policy Evaluation
To train a critic after the fact as an evaluator for a fixed policy, first extract responses from the policy for the MultiWOZ training, validation, and test sets. The responses should be in the same JSON format as data/augpt/test-predictions.json.
In this codebase, responses from AuGPT, HDSA with gold action labels, and HDSA with predicted action labels are provided in data.zip.
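As a starting point, the sketch below shows one way such a predictions file could be assembled. The `respond` function and the dialogue IDs are placeholders for your own system and data; the authoritative schema is whatever data/augpt/test-predictions.json contains, so mirror that file rather than this sketch.

```python
import json

# Hypothetical extraction loop for building a predictions file.
# respond() is a stand-in for your own dialogue system, and the dialogue
# IDs / turn lists below are dummies.
def respond(user_turn: str) -> str:
    # Placeholder: call into your policy here.
    return "i can help you with that ."

dialogues = {
    "MUL0001.json": ["i need a cheap hotel in the centre ."],
}

predictions = {
    dial_id: [respond(turn) for turn in turns]
    for dial_id, turns in dialogues.items()
}

with open("my-policy-test-predictions.json", "w") as f:
    json.dump(predictions, f, indent=2)
```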
To run the critic training:
```
python critic_json.py --infile ../data/augpt/test-predictions.json
```
Adjust the infile argument accordingly. Also adjust the encoder based on the data the model was trained on; we provide pre-trained encoders for MultiWOZ 2.0 and 2.1.
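To train critics for several systems in one go, the documented infile argument can be looped over. The paths below are placeholders for the files unpacked from data.zip:

```python
import subprocess

# Hypothetical batch run: one critic per predictions file. Only the
# --infile flag documented above is used; replace the placeholder paths
# with the actual locations of the files from data.zip.
prediction_files = [
    "../data/augpt/test-predictions.json",
    # add the HDSA prediction files extracted from data.zip here
]
for infile in prediction_files:
    subprocess.run(["python", "critic_json.py", "--infile", infile], check=True)
```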
To run critic training for the MultiWOZ human policy:
```
python critic_mwoz.py
```
At the end of training, evaluation over the test set is performed automatically. To run this step separately:
```
python run_critic.py
```
Adjust the list of critics to run inside the script.
This can also be done for other datasets; in that case, adjust the critic's encoder accordingly.
## Contact
Any questions or bug reports can be sent to lubis@hhu.de