The problem of KL divergence being inf in the late stage of training #4

Open
Hellod035 opened this issue Sep 14, 2024 · 2 comments

@Hellod035

Steps to reproduce:
Increase max_epochs in skillmimic/data/cfg/train/rlg/hrl_humanoid_discrete_layupscore.yaml and run:

python skillmimic/run.py --task HRLScoringLayup --cfg_env skillmimic/data/cfg/skillmimic_hlc.yaml \
--cfg_train skillmimic/data/cfg/train/rlg/hrl_humanoid_discrete_layupscore.yaml \
--motion_file skillmimic/data/motions/BallPlay-M/run \
--llc_checkpoint skillmimic/data/models/mixedskills/nn/skillmimic_llc.pth \
--resume_from skillmimic/data/models/hlc_scoring/nn/SkillMimic.pth \
--headless

Then you will see "NaN or Inf found in input tensor" in the terminal. This is actually because some of the KL divergence values become inf.
I would like to ask whether this phenomenon has been noticed, and whether it is acceptable as-is or the hyperparameters need further adjustment.
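
For reference, a minimal sketch of how the KL can hit inf (assuming the high-level policy is categorical, as the _discrete config name suggests; categorical_kl, the probabilities, and the eps clamp below are all illustrative, not code from this repo):

import torch

# KL(p || q) for categorical action distributions; eps optionally clamps
# the logs so that a zero class probability cannot produce -inf.
def categorical_kl(p, q, eps=0.0):
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=-1)

old_probs = torch.tensor([[0.5, 0.5, 0.0]])          # old policy dropped action 2
new_probs = torch.tensor([[0.5, 0.5 - 1e-4, 1e-4]])  # new policy still uses it

# log(0) = -inf, so the term for action 2 makes the whole KL inf,
# which is then what the logger complains about:
print(categorical_kl(new_probs, old_probs))            # tensor([inf])
print(categorical_kl(new_probs, old_probs, eps=1e-8))  # finite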

@wyhuai
Owner

wyhuai commented Sep 15, 2024

Hi, this does occur during the training of the high-level policy, but it currently doesn't seem to affect the results. We plan to address this issue later, so for now, you can consider it acceptable.
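
In the meantime, a possible stop-gap (just a sketch; sanitize_scalar is a hypothetical helper, not necessarily the planned fix) is to replace non-finite values before they reach the summary writer that prints the warning:

import math

def sanitize_scalar(value, fallback=0.0):
    # Swap NaN/Inf for a finite fallback so the summary writer does not
    # emit "NaN or Inf found in input tensor".
    return value if math.isfinite(value) else fallback

# e.g. writer.add_scalar('info/kl', sanitize_scalar(kl.item()), frame)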

@Hellod035
Author

Thank you very much for your reply :)
