This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

automatically find local open port in distributed training #4696

Merged: 10 commits merged on Oct 7, 2020

@epwalsh epwalsh commented Oct 2, 2020

Closes #4666

@epwalsh epwalsh requested a review from dirkgr October 2, 2020 20:47
master_port = distributed_params.pop("master_port", 29500)
if master_addr in ("127.0.0.1", "0.0.0.0", "localhost"):
    # If running locally, we can automatically find an open port if one is not specified.
    master_port = distributed_params.pop("master_port", common_util.find_open_port())
Member

This will run find_open_port() before it tries to read the port from the config. I think we should not run port discovery at all if we don't have to.
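
To illustrate the point: Python evaluates the default argument of `dict.pop` before the call, so the port search runs even when `master_port` is already in the config. A minimal sketch of a lazy alternative follows. It assumes a hypothetical helper `resolve_master_port` and a stand-in `find_open_port` (the real `common_util.find_open_port` is not shown in this thread, so its behavior here is an assumption); the `29500` fallback is taken from the original line in the diff.

```python
import socket
from typing import Any, Dict

LOCAL_ADDRESSES = ("127.0.0.1", "0.0.0.0", "localhost")
DEFAULT_MASTER_PORT = 29500  # fallback used by the original line in the diff


def find_open_port() -> int:
    # Stand-in for common_util.find_open_port (assumed behavior):
    # bind to port 0 so the OS picks a free port, then report it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def resolve_master_port(distributed_params: Dict[str, Any], master_addr: str) -> int:
    # Hypothetical helper: read the config first, and only search for
    # an open port when none was specified and we are running locally.
    master_port = distributed_params.pop("master_port", None)
    if master_port is None:
        if master_addr in LOCAL_ADDRESSES:
            master_port = find_open_port()
        else:
            master_port = DEFAULT_MASTER_PORT
    return master_port
```

With this shape, `resolve_master_port({"master_port": 1234}, "127.0.0.1")` returns the configured port without opening a socket at all.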

Member Author

Good point

@epwalsh epwalsh requested a review from dirkgr October 7, 2020 17:06
@ghost ghost merged commit ae7cf85 into master Oct 7, 2020
@ghost ghost deleted the auto-find-port branch October 7, 2020 18:14
This pull request was closed.

Successfully merging this pull request may close these issues.

When running multiple distributed runs on the same box, AllenNLP says "RuntimeError address already in use"
2 participants