The problem of KL divergence being inf in the late stage of training #4

Open
Hellod035 opened this issue Sep 14, 2024 · 2 comments

@Hellod035

Steps to reproduce:
Increase max_epochs in skillmimic/data/cfg/train/rlg/hrl_humanoid_discrete_layupscore.yaml and run:

python skillmimic/run.py --task HRLScoringLayup --cfg_env skillmimic/data/cfg/skillmimic_hlc.yaml \
--cfg_train skillmimic/data/cfg/train/rlg/hrl_humanoid_discrete_layupscore.yaml \
--motion_file skillmimic/data/motions/BallPlay-M/run \
--llc_checkpoint skillmimic/data/models/mixedskills/nn/skillmimic_llc.pth \
--resume_from skillmimic/data/models/hlc_scoring/nn/SkillMimic.pth \
--headless

Then you will see "NaN or Inf found in input tensor" in the terminal. This is actually because some of the KL divergence values become inf.
I would like to ask whether this phenomenon has been noticed, and whether it is acceptable as-is or the hyperparameters need further adjustment.
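
For reference, a minimal sketch of how the KL can hit inf (assuming the high-level policy is categorical, as the _discrete config name suggests; categorical_kl, the probabilities, and the eps clamp below are all illustrative, not code from this repo):

import torch

# KL(p || q) for categorical action distributions; eps optionally clamps
# the logs so that a zero class probability cannot produce -inf.
def categorical_kl(p, q, eps=0.0):
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=-1)

old_probs = torch.tensor([[0.5, 0.5, 0.0]])          # old policy dropped action 2
new_probs = torch.tensor([[0.5, 0.5 - 1e-4, 1e-4]])  # new policy still uses it

# log(0) = -inf, so the term for action 2 makes the whole KL inf,
# which is then what the logger complains about:
print(categorical_kl(new_probs, old_probs))            # tensor([inf])
print(categorical_kl(new_probs, old_probs, eps=1e-8))  # finite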

@wyhuai
Owner

wyhuai commented Sep 15, 2024

Hi, this does occur during the training of the high-level policy, but it currently doesn't seem to affect the results. We plan to address this issue later, so for now, you can consider it acceptable.
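
In the meantime, a possible stop-gap (just a sketch; sanitize_scalar is a hypothetical helper, not necessarily the planned fix) is to replace non-finite values before they reach the summary writer that prints the warning:

import math

def sanitize_scalar(value, fallback=0.0):
    # Swap NaN/Inf for a finite fallback so the summary writer does not
    # emit "NaN or Inf found in input tensor".
    return value if math.isfinite(value) else fallback

# e.g. writer.add_scalar('info/kl', sanitize_scalar(kl.item()), frame)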

@Hellod035
Author

Thank you very much for your reply :)
