
A new activation function ACON that is very simple and effective !! #2891

Closed
nmaac opened this issue Apr 22, 2021 · 26 comments · Fixed by #2893 or #2901
Labels
enhancement New feature or request Stale

Comments

@nmaac

nmaac commented Apr 22, 2021

🚀 Feature

There is a new activation function ACON (CVPR 2021) that unifies ReLU and Swish.
ACON is simple but very effective, code is here: https://github.com/nmaac/acon/blob/main/acon.py#L19

[image]
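(For reference, the ACON-C form from the paper, which also appears in the code later in this thread, is (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x; with p1 = 1 and p2 = 0 it reduces to Swish x*sigmoid(beta*x), and as beta grows large it approaches ReLU.)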

The improvements are very significant:
[image]

Motivation

Pitch

Since SiLU (Swish) is used in your project, I would like to suggest replacing it with ACON directly; as a more general and effective form of Swish, ACON may also show improvements here.

Alternatives

It also has an enhanced version, meta-ACON, that uses a small network to learn beta explicitly, which may affect speed slightly.

Additional context

Code and paper.

@nmaac nmaac added the enhancement New feature or request label Apr 22, 2021
@github-actions
Contributor

github-actions bot commented Apr 22, 2021

👋 Hello @nmaac, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@nmaac thanks for the idea, looks promising! Any object detection results so far?

@nmaac
Author

nmaac commented Apr 22, 2021

There are some detection results:

[image]

I did not test it on YOLOv5, but it seems to have the potential to make nearly cost-free improvements by simply replacing SiLU.

@glenn-jocher
Member

glenn-jocher commented Apr 22, 2021

@nmaac ah great, thank you! Yes, this is quite a significant improvement in your Table 9. Which ACON version would you recommend we try, and what values for p1, p2, beta?

  • meta-ACON
  • ACON-A
  • ACON-B
  • ACON-C

The right place to include a new activation would be utils/activations, and then the place to swap out nn.SiLU() for a new activation is here on L39 of models/common.py

yolov5/models/common.py

Lines 33 to 43 in d48a34d

class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
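For example, assuming the new activation takes the output channel count c2 as its only constructor argument (as the reference AconC implementation does), the swapped line might look like:

self.act = AconC(c2) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())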

@nmaac
Author

nmaac commented Apr 22, 2021

I would like to suggest ACON-C, which improves accuracy with negligible overhead.

You can use the code directly:

https://github.com/nmaac/acon/blob/8782b65f5d7b3523f656beceb586b54d04019705/acon.py#L4-19

@glenn-jocher glenn-jocher linked a pull request Apr 22, 2021 that will close this issue
@glenn-jocher glenn-jocher reopened this Apr 22, 2021
@glenn-jocher
Member

glenn-jocher commented Apr 22, 2021

@nmaac @ilem777 I've added AconC to our activations study here:
https://wandb.ai/glenn-jocher/activations

I just started runs with AconC(), MetaAconC() and FReLU(); you can track their progress live at the link above. Training time will be about 3 days. I tried MetaAconC but ran into issues: the nn.BatchNorm2d(16) layers produced errors on inputs of size (1, 16, 1, 1), so perhaps I implemented the function incorrectly.

@glenn-jocher
Member

@AyushExel I spotted something concerning that I was hoping you could look at. When runs are public, like the activation study above, the 'stop run' button appears to work even when the visitor is incognito / not signed in.

@AyushExel
Contributor

@glenn-jocher thanks for reporting this. I'll check whether the button actually stops the runs for non-authorized users. If it does, it's a very bad bug; otherwise it's just a minor frontend bug. I'll file a ticket to get this fixed.

@glenn-jocher glenn-jocher linked a pull request Apr 22, 2021 that will close this issue
@WongKinYiu

It's because nn.BatchNorm2d needs batch size > 1 when training.
The simplest way to solve the problem is to change this line to:

            m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(2, ch, s, s))])  # forward
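For context, a quick standalone check (purely illustrative, using a 16-channel layer as in the MetaAconC bottleneck) of why a batch-size-1, 1x1-spatial forward pass fails in training mode but works with batch size 2 or in eval mode:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(16)

bn.train()
try:
    bn(torch.zeros(1, 16, 1, 1))  # only one value per channel -> fails in training mode
except ValueError as e:
    print(e)  # "Expected more than 1 value per channel when training ..."

bn(torch.zeros(2, 16, 1, 1))  # batch size 2 works in training mode

bn.eval()
bn(torch.zeros(1, 16, 1, 1))  # batch size 1 is fine in eval mode (uses running stats)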

@nmaac
Author

nmaac commented Apr 23, 2021

@glenn-jocher you can simply remove the two BN layers in MetaAconC, which does not affect the accuracy much.

@glenn-jocher
Member

@nmaac oh, I think I misunderstood before. I think you mean to remove self.bn1 and self.bn2 completely from the MetaAconC() module for all batch-sizes?

@glenn-jocher
Member

@WongKinYiu yes, this is a good solution too, though it will make model creation a bit slower for all other models. Are the nn.BatchNorm2d() layers OK for batch-size 1 inference?

@glenn-jocher
Member

@WongKinYiu @nmaac I'm curious: looking at the ACON implementation, have you guys tried simply training SiLU with a learnable beta? I've never done this before. nn.SiLU() does not allow this, but I think I might try testing it with a custom SiLU to see how this affects the results.
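A minimal sketch of what such a custom SiLU with a learnable beta might look like (the SiLUBeta name and the per-channel beta shape are just illustrative, mirroring the beta shape used in AconC):

import torch
import torch.nn as nn

class SiLUBeta(nn.Module):
    # Swish with a learnable per-channel beta: x * sigmoid(beta * x)
    # beta is initialized to 1 so the module starts out identical to nn.SiLU()
    def __init__(self, c1):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(1, c1, 1, 1))

    def forward(self, x):
        return x * torch.sigmoid(self.beta * x)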

@WongKinYiu

nn.BatchNorm2d() layers can do batch-size 1 inference.
Or InstanceNorm is another choice.

@nmaac
Author

nmaac commented Apr 25, 2021

@glenn-jocher SiLU with beta does not show benefits; in the paper, Swish and Swish-1 (beta fixed at 1) show comparable results. Specifically:
Swish: x*sigmoid(beta*x)
Swish-1 (SiLU): x*sigmoid(x)

Therefore meta-ACON learns beta explicitly via a small network, which is what shows the improvements.

@glenn-jocher
Member

@nmaac understood, thanks for the explanation. I had to completely remove the BN layers from MetaAconC, otherwise instabilities appeared in training (two 'STOPPED' runs below). Results should be done in about a day, but based on the current trends it doesn't initially seem like I was able to produce better results with either AconC or MetaAconC. The best-performing activation in the study by far was FReLU, though this should be taken with a grain of salt, as FReLU really blurs the line between an activation and a convolution layer. Due to the added parameters and FLOPs, I would also assume FReLU disproportionately improves smaller models like YOLOv5s, with unclear benefit for larger models like YOLOv5x6, which may necessitate a second study in the future.
https://wandb.ai/glenn-jocher/activations

@nmaac
Author

nmaac commented Apr 26, 2021

@glenn-jocher Yes the curves show comparable results. Which activations did you change? Did you pre-train the backbone? In my experiments I usually change the activations in the backbone and pre-train the backbone on ImageNet first. I can help with the pre-training if needed :)

@WongKinYiu

I think the main reason is that @glenn-jocher forgot to add p1, p2, and beta to the no-decay optimizer group:
https://github.com/nmaac/acon/blob/8782b65f5d7b3523f656beceb586b54d04019705/ACON/ResNet_ACON/utils.py#L82

@glenn-jocher
Member

glenn-jocher commented Apr 26, 2021

@nmaac well that's a good question: should the activation function parameters be exempt from weight decay? We use the following parameter groups to exempt .bias parameters and BatchNorm layers from weight decay, so at the moment only the fc1 and fc2 biases are exempt from decay.

yolov5/train.py

Lines 115 to 123 in 1849916

pg0, pg1, pg2 = [], [], []  # optimizer parameter groups
for k, v in model.named_modules():
    if hasattr(v, 'bias') and isinstance(v.bias, nn.Parameter):
        pg2.append(v.bias)  # biases
    if isinstance(v, nn.BatchNorm2d):
        pg0.append(v.weight)  # no decay
    elif hasattr(v, 'weight') and isinstance(v.weight, nn.Parameter):
        pg1.append(v.weight)  # apply decay
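A hedged sketch of one way the ACON parameters could be routed into the no-decay group pg0, if we decide to exempt them (an extra loop added after the one above; purely illustrative, not what the current branch does):

from utils.activations import AconC, MetaAconC

for k, v in model.named_modules():
    if isinstance(v, (AconC, MetaAconC)):
        pg0.append(v.p1)  # exempt p1 from weight decay
        pg0.append(v.p2)  # exempt p2 from weight decay
        if isinstance(v, AconC):
            pg0.append(v.beta)  # AconC has an explicit beta; MetaAconC generates beta via fc1/fc2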

The activation function implementations are all in utils/activations:

# ACON https://arxiv.org/pdf/2009.04759.pdf ----------------------------------------------------------------------------
class AconC(nn.Module):
    r""" ACON activation (activate or not).
    AconC: (p1*x-p2*x) * sigmoid(beta*(p1*x-p2*x)) + p2*x, beta is a learnable parameter
    according to "Activate or Not: Learning Customized Activation" <https://arxiv.org/pdf/2009.04759.pdf>.
    """

    def __init__(self, c1):
        super().__init__()
        self.p1 = nn.Parameter(torch.randn(1, c1, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, c1, 1, 1))
        self.beta = nn.Parameter(torch.ones(1, c1, 1, 1))

    def forward(self, x):
        dpx = (self.p1 - self.p2) * x
        return dpx * torch.sigmoid(self.beta * dpx) + self.p2 * x


class MetaAconC(nn.Module):
    r""" ACON activation (activate or not).
    MetaAconC: (p1*x-p2*x) * sigmoid(beta*(p1*x-p2*x)) + p2*x, beta is generated by a small network
    according to "Activate or Not: Learning Customized Activation" <https://arxiv.org/pdf/2009.04759.pdf>.
    """

    def __init__(self, c1, k=1, s=1, r=16):  # ch_in, kernel, stride, r
        super().__init__()
        c2 = max(r, c1 // r)
        self.p1 = nn.Parameter(torch.randn(1, c1, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, c1, 1, 1))
        self.fc1 = nn.Conv2d(c1, c2, k, s, bias=True)
        self.fc2 = nn.Conv2d(c2, c1, k, s, bias=True)
        # self.bn1 = nn.BatchNorm2d(c2)
        # self.bn2 = nn.BatchNorm2d(c1)

    def forward(self, x):
        y = x.mean(dim=2, keepdims=True).mean(dim=3, keepdims=True)
        # batch-size 1 bug/instabilities https://github.com/ultralytics/yolov5/issues/2891
        # beta = torch.sigmoid(self.bn2(self.fc2(self.bn1(self.fc1(y)))))  # bug/unstable
        beta = torch.sigmoid(self.fc2(self.fc1(y)))  # bug patch: BN layers removed
        dpx = (self.p1 - self.p2) * x
        return dpx * torch.sigmoid(beta * dpx) + self.p2 * x

The activations_study branch used in this study replaces all activations in the YOLOv5 model by redefining self.act here:

yolov5/models/common.py

Lines 34 to 55 in c9c95fb

class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        # self.act = nn.Identity() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.Tanh() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.Sigmoid() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.ReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.LeakyReLU(0.1) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.Hardswish() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = Mish() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = AconC() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = MetaAconC() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = SiLU_beta() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        self.act = MetaAconC(c2) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

All YOLOv5 models are trained from scratch to 300 epochs using all default settings. Training commands are shown in the W&B link to reproduce (COCO dataset autodownloads):

train.py --batch 64 --data coco.yaml --cfg yolov5s.yaml --weights '' --epochs 300 --img 640 --project activations --name yolov5s-MetaAconC_noBN --device 0

@developer0hye
Contributor

@glenn-jocher
Is there any progress on this issue?

@glenn-jocher
Member

@developer0hye well I'm not sure. The ACON authors @nmaac didn't answer my question of whether we should exempt some of the ACON parameters from weight decay. The current results are here for all the activations on YOLOv5s: https://wandb.ai/glenn-jocher/activations

@nmaac
Author

nmaac commented May 17, 2021

@glenn-jocher @developer0hye In my experiments, the weight decay setting does not affect the results very much.

But I suggest trying another initialization approach:

self.p1 = nn.Parameter(torch.normal(1, 0.01, size=(1, width, 1, 1)))
self.p2 = nn.Parameter(torch.normal(0, 0.01, size=(1, width, 1, 1)))
self.beta = nn.Parameter(torch.normal(1, 0.01, size=(1, width, 1, 1)))
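Side note: with p1 near 1, p2 near 0 and beta near 1, ACON-C starts out very close to plain SiLU. A quick sanity check with the exact values (illustrative only):

import torch
from utils.activations import AconC

x = torch.randn(8, 64, 32, 32)
act = AconC(64)
with torch.no_grad():
    act.p1.fill_(1.0)    # new init draws p1 from N(1, 0.01)
    act.p2.fill_(0.0)    # p2 from N(0, 0.01)
    act.beta.fill_(1.0)  # beta from N(1, 0.01)
print(torch.allclose(act(x), torch.nn.functional.silu(x)))  # True: (1-0)*x*sigmoid(1*x) + 0*x == SiLU(x)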

@WongKinYiu

@nmaac

In my experiments:

  1. original SiLU, ~300 epochs: 51.9% AP

  2. old init, ~300 epochs: 50.6% AP

  3. new init, ~300 epochs: 51.5% AP

self.p1 = nn.Parameter(torch.normal(1, 0.01, size=(1, width, 1, 1)))
self.p2 = nn.Parameter(torch.normal(0, 0.01, size=(1, width, 1, 1)))
self.beta = nn.Parameter(torch.normal(1, 0.01, size=(1, width, 1, 1)))

And old init with weight decay drops 0.2% AP.

@glenn-jocher
Member

@nmaac @WongKinYiu got it, thanks guys!

@github-actions
Contributor

github-actions bot commented Jun 17, 2021

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@iumyx2612
Contributor

Any updates on this? How's ACON performing?
