
Add option for BatchNorm to handle batches of size one #5530

Merged
12 commits merged into master from padarn/batch_norm_singleton on Sep 30, 2022

Conversation

Padarn
Contributor

@Padarn Padarn commented Sep 25, 2022

The purpose of this PR is to provide a way for BatchNorm to work even when the batch size is one. This comes up when training heterogeneous graphs with rare node types.

This PR addresses #5529.

@Padarn Padarn self-assigned this Sep 25, 2022
@Padarn
Contributor Author

Padarn commented Sep 25, 2022

This is still WIP because I realised my simple test case would fail due to the base behaviour of BatchNorm1d:

        r"""
        Decide whether the mini-batch stats should be used for normalization rather than the buffers.
        Mini-batch stats are used in training mode, and in eval mode when buffers are None.
        """
        if self.training:
            bn_training = True
        else:
            bn_training = (self.running_mean is None) and (self.running_var is None)

I currently think the best way to support this is to update the running mean and variance even when the batch size is one. But the batch variance is still zero in this case.
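For reference, a minimal reproduction of the failure mode this PR targets (a small sketch assuming stock torch.nn.BatchNorm1d behaviour):

import torch

bn = torch.nn.BatchNorm1d(16)

# In training mode, PyTorch rejects a batch of size one because the
# per-batch variance of a single sample is undefined.
bn.train()
x = torch.randn(1, 16)
try:
    bn(x)
except ValueError as e:
    print(e)  # "Expected more than 1 value per channel when training, ..."

# In eval mode the running statistics are used instead, so it works.
bn.eval()
print(bn(x).shape)  # torch.Size([1, 16])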

@codecov

codecov bot commented Sep 25, 2022

Codecov Report

Merging #5530 (2709908) into master (00c3a5d) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head 2709908 differs from pull request most recent head 611edb0. Consider uploading reports for the commit 611edb0 to get more accurate results

@@            Coverage Diff             @@
##           master    #5530      +/-   ##
==========================================
+ Coverage   83.67%   83.69%   +0.01%     
==========================================
  Files         346      346              
  Lines       19017    19013       -4     
==========================================
- Hits        15913    15912       -1     
+ Misses       3104     3101       -3     
Impacted Files Coverage Δ
torch_geometric/nn/norm/batch_norm.py 100.00% <100.00%> (ø)
torch_geometric/utils/scatter.py 66.66% <0.00%> (-33.34%) ⬇️
torch_geometric/sampler/hgt_sampler.py 100.00% <0.00%> (ø)
torch_geometric/loader/link_neighbor_loader.py 100.00% <0.00%> (ø)
torch_geometric/sampler/utils.py 80.59% <0.00%> (+0.07%) ⬆️
torch_geometric/sampler/neighbor_sampler.py 92.25% <0.00%> (+0.38%) ⬆️
torch_geometric/nn/dense/linear.py 83.96% <0.00%> (+0.51%) ⬆️
torch_geometric/sampler/base.py 96.77% <0.00%> (+5.86%) ⬆️


@EdisonLeeeee
Contributor

Would it make sense if we repeat x twice before feeding it into the batch norm?

import torch

x = torch.randn(1, 16)
x = torch.cat([x, x], dim=0)  # duplicate the single row
bn = torch.nn.BatchNorm1d(16)

x = bn(x)[0].unsqueeze(0)  # shape [1, 16]

@@ -30,18 +30,26 @@ class BatchNorm(torch.nn.Module):
:obj:`False`, this module does not track such statistics and always
uses batch statistics in both training and eval modes.
(default: :obj:`True`)
skip_no_batch (bool, optional): If set to :obj:`True`, batches with
Member

I don't think this is a good choice since now we apply no normalization at all. Instead, IMO it is better to use training statistics for normalization.

Contributor Author

Thanks for the feedback. My only concern is that the statistics may be based on a small subset of the data unless we also update the running mean/variance when the batch size is one. WDYT?

Member

Can you clarify? What's the problem with using the running mean/variance for normalization here?

Contributor Author

I've updated based on your suggestion.

But to clarify: in the BatchNorm layer, mean and variance are tracked via running_mean and running_var, which are exponentially smoothed across batches. If we switch to eval mode for the cases where the batch size is one, we exclude those examples from the mean/var calculation.

I also suspect that with small batch sizes the mean/variance calculated this way will not approximate the population statistics well.
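For context, the exponential smoothing under discussion is roughly the following (a sketch only; PyTorch additionally applies an unbiased-variance correction and tracks num_batches_tracked):

import torch

def update_running_stats(running_mean: torch.Tensor,
                         running_var: torch.Tensor,
                         batch: torch.Tensor,
                         momentum: float = 0.1):
    # Per-batch statistics over the batch dimension.
    batch_mean = batch.mean(dim=0)
    batch_var = batch.var(dim=0, unbiased=False)
    # Exponential moving average: recent batches dominate over time,
    # and batches skipped by an eval-mode fallback never contribute.
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var
    return running_mean, running_var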

Member

Yes, these examples will be excluded from the mean/var calculation, but I guess there is no way around this (it would be the case in your previous implementation as well). I think the longer we train, the more stable the running mean/var will become. Ideally, batches with only a single node for a node type should not be too frequent, so excluding them should not cause too much harm.

Contributor Author

but I guess there is no way around this (it would be the case in your previous implementation as well).

Yes, agreed. The previous implementation was no better in this respect; it was just simpler, and I thought I'd raise the question.

@rusty1s rusty1s changed the title [WIP] Add skip for batch norm [WIP] Add skip option for BatchNorm Sep 26, 2022
@Padarn Padarn changed the title [WIP] Add skip option for BatchNorm Add option for BatchNorm to handle batches of size one. Sep 26, 2022
running_mean = self.module.running_mean
running_var = self.module.running_var
if running_mean is None or running_var is None:
    self.module.running_var = torch.ones(self.in_channels)
Member

Can this really be None? I assume they will be initialized by PyTorch.

Contributor Author

Yes: https://github.com/pytorch/pytorch/blob/9c036aa112b0a8fd9afb824d1fda058e2b66ba1d/torch/nn/modules/batchnorm.py#L68

It's a bit of an odd case, but this combined with https://github.com/pytorch/pytorch/blob/9c036aa112b0a8fd9afb824d1fda058e2b66ba1d/torch/nn/modules/batchnorm.py#L175 will cause errors.

As a side note, I suspect that line should be:

if not self.training or not self.track_running_stats
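For reference, the None case is easy to reproduce with stock PyTorch:

import torch

bn = torch.nn.BatchNorm1d(16, track_running_stats=False)
print(bn.running_mean, bn.running_var)  # None None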

Contributor

self.module.running_var = torch.ones(self.in_channels)

It seems this would cause a device mismatch if the module is already on CUDA.

Contributor Author
@Padarn Padarn Sep 26, 2022

Good point. I guess the important thing is probably to match the device of x. But to simplify things, I just added them in __init__.
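A minimal sketch of the approach described above (hypothetical class and buffer names, not the PR's actual code): registering the default statistics as buffers in __init__ lets .to() / .cuda() move them together with the rest of the module, avoiding a device mismatch inside forward().

import torch

class NormWithDefaults(torch.nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        # Buffers follow the module across devices, unlike tensors
        # allocated lazily inside forward().
        self.register_buffer('default_mean', torch.zeros(in_channels))
        self.register_buffer('default_var', torch.ones(in_channels))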

def __init__(self, in_channels: int, eps: float = 1e-5,
             momentum: float = 0.1, affine: bool = True,
             track_running_stats: bool = True,
             allow_no_batch: bool = False):
Member

Suggested change
allow_no_batch: bool = False):
allow_no_batch: bool = True):

Let's make this True by default?

Contributor Author

Personally, I feel making it False by default might be better, as it's a bit of a corner case and doesn't match the PyTorch behaviour.

training = self.module.training
running_mean = self.module.running_mean
running_var = self.module.running_var
if running_mean is None or running_var is None:
Member

I am still not sure about this while looking at https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/batchnorm.py#L53. I think running_mean is only None in case of track_running_stats=False, in which case we should also error out.

Contributor Author

Oh hmm, yes, you're right, the problem only happens if track_running_stats is False.

In our current implementation, if track_running_stats=False we do not raise an exception in training mode. What would be the logic for us to do so in this specific case?

Contributor Author
@Padarn Padarn Sep 29, 2022

The tests in test/nn/norm/test_batch_norm.py all pass with this implementation; removing this would throw an error only for the single combination:

  • track_running_stats=False
  • batch_size=1

Member

Yes, it is impossible to support track_running_stats=False with batch_size=1. The current implementation simply does not apply any normalization, which is probably not desired. I would simply error out in this case, TBH.

Contributor Author

Hmm, yeah, I guess that makes the most sense to me. It's a bit ambiguous, but I don't have a concrete use case for supporting this either, so I have updated based on your suggestion.
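Putting the conclusions above together, the single-element handling could look roughly like this (a sketch with illustrative names, not necessarily the exact merged code): fall back to the running statistics for a batch of size one, and error out if they are unavailable.

import torch
import torch.nn.functional as F

def forward_single_element(module: torch.nn.BatchNorm1d,
                           x: torch.Tensor) -> torch.Tensor:
    if x.size(0) <= 1:
        if module.running_mean is None or module.running_var is None:
            # track_running_stats=False: no statistics to fall back to.
            raise ValueError("Single-element batches require "
                             "'track_running_stats=True'")
        # Normalize with the running statistics (training=False), as in
        # evaluation mode; the running stats themselves are not updated.
        return F.batch_norm(x, module.running_mean, module.running_var,
                            module.weight, module.bias, False,
                            module.momentum, module.eps)
    return module(x)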

@@ -30,18 +30,41 @@ class BatchNorm(torch.nn.Module):
:obj:`False`, this module does not track such statistics and always
uses batch statistics in both training and eval modes.
(default: :obj:`True`)
allow_no_batch (bool, optional): If set to :obj:`True`, batches with
only a single element will work as though in training mode. That is
Member

Suggested change
only a single element will work as though in training mode. That is
only a single element will work as in evaluation. That is

allow_no_batch (bool, optional): If set to :obj:`True`, batches with
only a single element will work as though in training mode. That is
the running mean and variance will be used.
(default: :obj:`False`)
Member

Suggested change
(default: :obj:`False`)
Requires :obj:`track_running_stats=True`. (default: :obj:`False`)

Contributor Author

Hmm, just saw this. Sorry, I'm not clear on why this is the desired behaviour?

@Padarn
Contributor Author

Padarn commented Sep 29, 2022

Would it make sense if we repeat x twice before feeding it into the batch norm?

x = torch.randn(1, 16)
x = torch.cat([x, x], dim=0)
bn = torch.nn.BatchNorm1d(16)

x = bn(x)[0].unsqueeze(0) # shape [1, 16]

I missed this @EdisonLeeeee. What would be the reason to do this?

@EdisonLeeeee
Contributor

@Padarn I thought this would avoid the error when the batch size is 1 while keeping running_mean updated in such cases. Just a proposal.

@Padarn
Contributor Author

Padarn commented Sep 29, 2022

Oh I see. I'm not sure how well this would work because the variance would still be zero for the batch and so you would have a division by zero.

@EdisonLeeeee
Contributor

Interestingly, it worked without any errors. It seems that PyTorch handles such cases properly; it is only a batch size of 1 that raises an error.
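For illustration, a quick check of the duplicated-row trick with default BatchNorm1d settings: the batch variance is zero, but eps is added before the division, so no NaNs appear (the normalized output collapses to the bias, i.e. zeros by default).

import torch

bn = torch.nn.BatchNorm1d(16)
bn.train()

x = torch.randn(1, 16)
out = bn(torch.cat([x, x], dim=0))[0].unsqueeze(0)
print(out.shape)               # torch.Size([1, 16])
print(torch.isnan(out).any())  # tensor(False)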

@Padarn
Contributor Author

Padarn commented Sep 29, 2022 via email

@rusty1s rusty1s changed the title Add option for BatchNorm to handle batches of size one. Add option for BatchNorm to handle batches of size one Sep 29, 2022
@rusty1s rusty1s enabled auto-merge (squash) September 30, 2022 06:02
@rusty1s rusty1s merged commit a49fd34 into master Sep 30, 2022
@rusty1s rusty1s deleted the padarn/batch_norm_singleton branch September 30, 2022 06:05
JakubPietrakIntel pushed a commit to JakubPietrakIntel/pytorch_geometric that referenced this pull request Nov 25, 2022
* add skip for batch norm

* add changelog

* device of default mean/var

* device of default mean/var

* device of default mean/var

* device of default mean/var

* update naming for new argument

* require track running

* Update torch_geometric/nn/norm/batch_norm.py

* Update torch_geometric/nn/norm/batch_norm.py

* update

Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>