modeling_utils.py

aaa123git authored 3 years ago

* use `GPT2TokenizerFast` by default
* fix cache
* fix padding: pad `eos_id` instead of `0`
* `preprocess_batch`: remove redundant padding tokens and make the sequence length a multiple of 8
* use native `amp` instead of `apex`
* fix bug during evaluation when `args.n_gpu > 1`
* support `torch.optim._multi_tensor.AdamW`
* support `gradient_checkpointing`
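
The two padding changes listed above (padding with `eos_id` instead of `0`, and keeping the sequence length a multiple of 8) can be illustrated with a minimal sketch. This is not the code from the commit: the helper name `pad_batch`, right-side padding, and the plain-list representation are assumptions made for illustration. The value `50256` is the GPT-2 end-of-text token id.

```python
from typing import List


def pad_batch(sequences: List[List[int]], eos_id: int, multiple: int = 8) -> List[List[int]]:
    """Hypothetical helper: right-pad token-id lists with eos_id.

    The batch is padded only up to its own longest sequence (so no
    redundant padding is carried along), rounded up to a multiple of
    `multiple`.
    """
    longest = max(len(seq) for seq in sequences)
    # Round up to the next multiple, e.g. 5 -> 8, 8 -> 8, 9 -> 16.
    target = ((longest + multiple - 1) // multiple) * multiple
    # Pad with eos_id rather than 0, since id 0 is an ordinary token in GPT-2's vocabulary.
    return [seq + [eos_id] * (target - len(seq)) for seq in sequences]


batch = pad_batch([[10, 11, 12], [20, 21, 22, 23, 24]], eos_id=50256)
```

Lengths that are multiples of 8 are commonly chosen because they tend to map well onto GPU tensor cores under mixed precision, which fits the switch to native `amp` in the same commit.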

758860c5

update train.py
