# LAVA + PLAS Public

Code for the paper "Dialogue Evaluation with Offline Reinforcement Learning".

<p align="center">
  <img width="700" src="all2.pdf">
</p>

In this paper, we propose the use of offline reinforcement learning for dialogue evaluation based on static data. Such an evaluator is typically called a critic and is utilized for policy optimization. We go one step further and show that offline RL critics can be trained for any dialogue system as external evaluators, allowing dialogue performance comparisons across various types of systems. This approach has the benefit of being corpus- and model-independent, while attaining strong correlation with human judgements, which we confirm via an interactive user trial.

## Data

data.zip includes the following data:
- Pre-processed MultiWOZ 2.0 and 2.1
- Generated responses from AuGPT, HDSA with gold action labels, and HDSA with predicted action labels

## Structure

The implementations of the models, as well as the training and evaluation scripts, are under **latent_dialog**.
The scripts for running the experiments are under **experiment_woz**. The trained models and evaluation results are under **experiment_woz/sys_config_log_model**.

## Policy Optimization

To use the critic for policy optimization, training is done in two steps:

### Step 1: SL pre-training with shared response generation and VAE objectives

    python mt_gauss.py

### Step 2: Offline RL in the latent action space

    python plas_gauss.py

## Policy Evaluation

To train a critic after the fact as an evaluator for a fixed policy, first extract responses from the policy for the MultiWOZ training, validation, and test sets. The responses should be in the same JSON format as data/augpt/test-predictions.json.
In this codebase, responses from AuGPT, HDSA with gold action labels, and HDSA with predicted action labels are provided in data.zip.

To run the critic training:

    python critic_json.py --infile ../data/augpt/test-predictions.json

Adjust the --infile argument accordingly, and adjust the encoder based on the data used to train the model. We provide pre-trained encoders for MultiWOZ 2.0 and 2.1.

To run critic training for the MultiWOZ human policy:

    python critic_mwoz.py

At the end of training, evaluation over the test set is performed automatically. To run it separately:

    python run_critic.py

Adjust the list of critics to run inside the script.

This can also be done for other datasets; in that case, adjust the critic's encoder accordingly.

### Contact

Questions and bug reports can be sent to lubis@hhu.de.
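The workflow above yields one critic score per dialogue, and the paper reports strong rank correlation between these scores and human judgements. As an illustration of that comparison, here is a minimal, self-contained Spearman's rho in plain Python; the function name `spearman_rho` and the two score lists are illustrative placeholders for this sketch, not code or results from this repository:

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    def ranks(vs):
        # Sort indices by value and assign average ranks to ties (1-based).
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and vs[order[j + 1]] == vs[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Placeholder scores (NOT results from the paper): critic scores vs.
# human ratings for five dialogues; rho is close to 1 when the two
# rankings agree.
critic_scores = [0.81, 0.42, 0.65, 0.90, 0.33]
human_scores = [4.5, 2.0, 3.5, 5.0, 1.5]
print(spearman_rho(critic_scores, human_scores))
```

In practice one would feed the per-dialogue critic outputs from run_critic.py and the collected human ratings into such a correlation, which is model-independent in the same sense as the critic itself.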