    Dynamic Dialogue Policy for Continual Reinforcement Learning

This is the code base for the paper Dynamic Dialogue Policy for Continual Reinforcement Learning (https://arxiv.org/abs/2204.05928).

The code is adapted and extended from ConvLab-2 (https://github.com/thu-coai/ConvLab-2).

    Installation

Requires Python 3.6.

Run the start_up.sh script to create a virtual environment and install the requirements:

    cd convlab-2
    bash start_up.sh
    source venv/bin/activate

    Run Continual Reinforcement Learning experiments

We provide three different models for running experiments: DDPT, MLP (Bin), and semantic (Sem). They can be found in the folder /convlab2/policy/ under vtrace_DPT, vtrace_MLP, and vtrace_semantic.

In each model folder you will find scripts for running trainings in model_folder/run_scripts. Using them, you can train models on three different domain orders or with the transformer-based user simulator (TUS); you can of course adapt them to run different trainings. Each model folder also contains a config.json file, where you can specify continual learning parameters such as the online-offline ratio. You can also execute a continual learning training directly, for instance by running

    python convlab2/policy/vtrace_DPT/train_continually.py --use_masking

    to start a DDPT training directly on the mixed order.
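The concrete keys of config.json are defined by this repository, so the snippet below is only a hypothetical illustration of the kind of continual learning parameters (such as an online-offline ratio) that file might hold; the key name and value are assumptions, not taken from the shipped file:

```json
{
  "online_offline_ratio": 0.2
}
```

Check the config.json in the respective model folder for the actual keys and defaults.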

Once a training is started, it will create an experiment folder experiment_TIMESTAMP in the model folder containing all necessary information. After training finishes, this folder is moved to model_folder/finished_experiments.

    Evaluation

Evaluation is done by executing the script

    python plot_continual_learning/plot_cl.py model1 model2 model3 --dir=experiment_folder

As an example of the folder structure, have a look at plot_continual_learning/easy2hard_order_experiments, which already contains experiment folders. For example, to compare Bin, DDPT, and Sem there, you would execute

    python plot_continual_learning/plot_cl.py Bin Sem DDPT --dir=plot_continual_learning/easy2hard_order_experiments

This will create a folder cl_plots inside easy2hard_order_experiments, where you can find all plotted results. In addition, the model folders Bin, DDPT, and Sem will contain Excel sheets with information on forward transfer and forgetting.
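Forward transfer and forgetting are standard continual-learning metrics. As background, here is a minimal, self-contained sketch of how such metrics are commonly computed from a matrix of per-task performance; the variable names, matrix layout, and baseline are illustrative assumptions, not this repository's actual code or output format:

```python
# Illustrative continual-learning metrics (not this repository's implementation).
# R[i][j] = performance on task j after finishing training on task i.
# b[j]    = performance on task j of an untrained (baseline) model.

def forgetting(R):
    """Average drop from a task's best performance to its final performance."""
    T = len(R)
    drops = [max(R[i][j] for i in range(j, T)) - R[T - 1][j] for j in range(T - 1)]
    return sum(drops) / len(drops)

def forward_transfer(R, b):
    """Average performance on a task just before training on it, relative to baseline."""
    T = len(R)
    gains = [R[j - 1][j] - b[j] for j in range(1, T)]
    return sum(gains) / len(gains)

# Example: three tasks, performance in [0, 1].
R = [
    [0.8, 0.2, 0.1],   # after training on task 0
    [0.7, 0.9, 0.3],   # after training on task 1
    [0.6, 0.8, 0.9],   # after training on task 2
]
b = [0.1, 0.1, 0.1]

print(forgetting(R))          # 0.15: tasks 0 and 1 each lost some performance
print(forward_transfer(R, b)) # 0.15: tasks 1 and 2 started above baseline
```

Higher forward transfer and lower forgetting indicate better continual learning behaviour.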