Dynamic Dialogue Policy for Continual Reinforcement Learning
This is the code base for the paper Dynamic Dialogue Policy for Continual Reinforcement Learning (https://arxiv.org/abs/2204.05928).
The code is adapted and extended from ConvLab-2 (https://github.com/thu-coai/ConvLab-2).
Installation
Requires Python 3.6.
Run the start_up.sh script to create a virtual environment and install requirements:
cd convlab-2
bash start_up.sh
source venv/bin/activate
Run Continual Reinforcement Learning experiments
We provide three different models for running experiments: DDPT, MLP (Bin) and semantic (Sem). They can be found in convlab2/policy/ in the folders vtrace_DPT, vtrace_MLP and vtrace_semantic, respectively.
Each model folder contains scripts for running training in model_folder/run_scripts.
With these scripts, you can train models on three different domain orders or with the transformer-based user simulator (TUS).
You can of course adapt them to run other trainings. Each model folder also contains a config.json file, where you can specify continual learning parameters such as the online-offline-ratio.
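If you prefer to change a continual learning parameter from a script rather than by hand, a minimal sketch follows. It assumes config.json is a flat JSON object; the helper set_cl_param and the key name online_offline_ratio used in the usage note are hypothetical, so check the config.json in your model folder for the actual key names and nesting.

```python
import json
from pathlib import Path


def set_cl_param(config_path, key, value):
    """Set a top-level key in a JSON config file and write the file back.

    Hypothetical helper: assumes the config is a flat JSON object.
    Returns the updated config dictionary.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    if key not in config:
        # Refuse to add keys silently; the training code may never read them.
        raise KeyError("{!r} not found in {}".format(key, path))
    config[key] = value
    path.write_text(json.dumps(config, indent=4))
    return config
```

For example, set_cl_param("convlab2/policy/vtrace_DPT/config.json", "online_offline_ratio", 0.5), where the key name is an assumption. Raising on unknown keys is a deliberate choice: a typo in the key would otherwise leave the real parameter unchanged without any warning.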
You can also start a continual learning training directly, for instance by running
python convlab2/policy/vtrace_DPT/train_continually.py --use_masking
which starts a DDPT training on the mixed domain order.
Once training starts, it creates an experiment folder experiment_TIMESTAMP in the model folder containing all necessary information. After training finishes, this folder is moved to model_folder/finished_experiments.
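To pick up the most recent finished run (for example, before copying it into a folder for evaluation), a small helper like the one below may be useful. It is a sketch: the function name is hypothetical, and it orders runs by folder modification time rather than parsing TIMESTAMP, because the exact timestamp format is not assumed here.

```python
from pathlib import Path


def latest_finished_experiment(model_folder):
    """Return the most recently modified experiment_* folder inside
    model_folder/finished_experiments, or None if there is none.

    Hypothetical helper: ordering uses filesystem modification time,
    not the TIMESTAMP part of the folder name.
    """
    finished = Path(model_folder) / "finished_experiments"
    if not finished.is_dir():
        return None
    # Only consider directories that follow the experiment_* naming.
    runs = [p for p in finished.glob("experiment_*") if p.is_dir()]
    if not runs:
        return None
    return max(runs, key=lambda p: p.stat().st_mtime)
```

For example, latest_finished_experiment("convlab2/policy/vtrace_DPT") returns a pathlib.Path to the newest run, or None if no training has finished yet.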
Evaluation
To evaluate, run the script
python plot_continual_learning/plot_cl.py model1 model2 model3 --dir=experiment_folder
For an example of the expected folder structure, see plot_continual_learning/easy2hard_order_experiments, which already contains experiment folders. To compare Bin, DDPT and Sem in this example, run
python plot_continual_learning/plot_cl.py Bin Sem DDPT --dir=plot_continual_learning/easy2hard_order_experiments
This creates a folder cl_plots inside easy2hard_order_experiments containing all plotted results. In addition, the model folders Bin, DDPT and Sem will contain Excel sheets with information on forward transfer and forgetting.
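If you want to trigger the evaluation from a Python driver script instead of the shell, the command above can be assembled as an argument list and passed to subprocess. This is a sketch only: plot_cl_command and run_plot_cl are hypothetical helpers, and it assumes the script is launched from the repository root so the relative path to plot_cl.py resolves.

```python
import subprocess
import sys


def plot_cl_command(models, experiment_dir):
    """Build the argument list for plot_continual_learning/plot_cl.py.

    models are the names of the model subfolders inside experiment_dir,
    e.g. ["Bin", "Sem", "DDPT"].
    """
    return [sys.executable, "plot_continual_learning/plot_cl.py",
            *models, "--dir={}".format(experiment_dir)]


def run_plot_cl(models, experiment_dir):
    # Assumes the current working directory is the repository root.
    # check=True raises CalledProcessError if plotting fails.
    subprocess.run(plot_cl_command(models, experiment_dir), check=True)
```

For example, run_plot_cl(["Bin", "Sem", "DDPT"], "plot_continual_learning/easy2hard_order_experiments") is equivalent to the command shown above. Using sys.executable ensures the plot script runs in the same (virtual) environment as the driver.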