Add TransformerLayer, TransformerBlock, C3TR modules #2333

Merged · 18 commits · Apr 1, 2021
Changes from 1 commit
Solve the problem of MA with DDP
dingyiwei committed Mar 31, 2021
commit e6e5f0ea1704f66c883563fd0e5c7939851cd708
train.py (5 changes: 4 additions & 1 deletion)
@@ -218,7 +218,10 @@ def train(hyp, opt, device, tb_writer=None):

     # DDP mode
     if cuda and rank != -1:
-        model = DDP(model, device_ids=[opt.local_rank], output_device=opt.local_rank)
+        # `find_unused_parameters=True` must be passed to DDP because nn.MultiheadAttention is otherwise
+        # incompatible with it; see https://github.com/pytorch/pytorch/issues/26698
+        find_unused_params = any(isinstance(layer, nn.MultiheadAttention) for layer in model.modules())
+        model = DDP(model, device_ids=[opt.local_rank], output_device=opt.local_rank, find_unused_parameters=find_unused_params)

     # Model parameters
     hyp['box'] *= 3. / nl  # scale to layers
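For context outside train.py, below is a minimal standalone sketch of the same pattern. The helper names (`needs_unused_param_handling`, `wrap_ddp`) and the process-group assumptions are illustrative, not part of this PR; only the `any(isinstance(..., nn.MultiheadAttention))` check and the `find_unused_parameters` argument mirror the change above.

```python
# Sketch only (not from this PR): enable find_unused_parameters just when the model
# actually contains nn.MultiheadAttention, mirroring the train.py change above.
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def needs_unused_param_handling(model: nn.Module) -> bool:
    # True if any submodule is nn.MultiheadAttention; DDP then needs
    # find_unused_parameters=True (https://github.com/pytorch/pytorch/issues/26698).
    return any(isinstance(m, nn.MultiheadAttention) for m in model.modules())


def wrap_ddp(model: nn.Module, local_rank: int) -> DDP:
    # Assumes torch.distributed.init_process_group() was already called by the launcher
    # and that local_rank is a valid CUDA device ordinal for this process.
    model = model.to(local_rank)
    return DDP(model,
               device_ids=[local_rank],
               output_device=local_rank,
               find_unused_parameters=needs_unused_param_handling(model))
```

Keeping `find_unused_parameters` conditional avoids the extra graph traversal DDP performs when the flag is set, so models without attention layers pay no overhead.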