Code for the paper "Dialogue Evaluation with Offline Reinforcement Learning".
<p align="center">
<imgwidth="700"src="all2.pdf">
<imgwidth="700"src="all2.png">
</p>
In this paper, we propose the use of offline reinforcement learning for dialogue evaluation based on static data. Such an evaluator is typically called a critic and is used for policy optimization. We go one step further and show that offline RL critics can be trained as external evaluators for any dialogue system, enabling dialogue performance comparisons across systems of different types. This approach has the benefit of being corpus- and model-independent while attaining strong correlation with human judgements, which we confirm via an interactive user trial.
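As a rough illustration of the idea (not the repository's actual implementation), the sketch below trains a small state-value critic on logged, static dialogue transitions with a TD(0) objective and then uses its value estimate as an automatic quality score. The fixed-size feature encoding of dialogue states, the network architecture, and the random toy data are all illustrative assumptions.

```python
# Minimal sketch of an offline RL critic for dialogue evaluation.
# Assumptions (illustrative, not the paper's code): dialogue turns are
# encoded as fixed-size feature vectors, and the critic is a small value
# network trained with a TD(0) target on logged transitions only --
# no interaction with a live environment.
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)

def td_update(critic, optimizer, batch, gamma: float = 0.99):
    """One TD(0) regression step on a batch of logged transitions."""
    state, reward, next_state, done = batch
    with torch.no_grad():
        # Bootstrapped target from the static data; terminal states get
        # no bootstrap term.
        target = reward + gamma * (1.0 - done) * critic(next_state)
    loss = nn.functional.mse_loss(critic(state), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy usage on random tensors standing in for encoded dialogue logs.
    torch.manual_seed(0)
    critic = Critic(state_dim=32)
    opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
    for _ in range(100):
        s, s2 = torch.randn(64, 32), torch.randn(64, 32)
        r = torch.rand(64)
        d = torch.bernoulli(torch.full((64,), 0.1))
        td_update(critic, opt, (s, r, s2, d))
    # The trained critic's value of a dialogue state can then serve as a
    # system-agnostic score for whichever dialogue system produced it.
    print(critic(torch.randn(32)).item())
```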