sync BN
zhanghang1989 committed Apr 13, 2018
1 parent d40adbc commit 25985c3
Showing 35 changed files with 1,160 additions and 2,152 deletions.
9 changes: 9 additions & 0 deletions Makefile
@@ -0,0 +1,9 @@
ROOTDIR = $(CURDIR)

lint: cpplint pylint

cpplint:
	tests/lint.py encoding cpp src kernel

pylint:
	pylint --rcfile=$(ROOTDIR)/tests/pylintrc --ignore-patterns=".*\.so$$,.*\.dll$$,.*\.dylib$$" encoding --ignore=_ext
2 changes: 1 addition & 1 deletion README.md
@@ -9,7 +9,7 @@ created by [Hang Zhang](http://hangzh.com/)
## Citations

**Context Encoding for Semantic Segmentation**
[Hang Zhang](http://hangzh.com/), [Kristin Dana](http://eceweb1.rutgers.edu/vision/dana.html), [Jianping Shi](http://shijianping.me/), [Zhongyue Zhang](http://zhongyuezhang.com/), [Xiaogang Wang](http://www.ee.cuhk.edu.hk/~xgwang/), [Ambrish Tyagi](https://scholar.google.com/citations?user=GaSWCoUAAAAJ&hl=en), [Amit Agrawal](http://www.amitkagrawal.com/)
[Hang Zhang](http://hangzh.com/), [Kristin Dana](http://eceweb1.rutgers.edu/vision/dana.html), [Jianping Shi](http://shijianping.me/), [Zhongyue Zhang](http://zhongyuezhang.com/), [Xiaogang Wang](http://www.ee.cuhk.edu.hk/~xgwang/), [Ambrish Tyagi](https://scholar.google.com/citations?user=GaSWCoUAAAAJ&hl=en), [Amit Agrawal](http://www.amitkagrawal.com/) [[arXiv]](https://arxiv.org/pdf/1803.08904.pdf)
```
@InProceedings{Zhang_2018_CVPR,
author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -65,7 +65,7 @@

# General information about the project.
project = 'Encoding'
copyright = '2017, Hang Zhang'
copyright = '2018, Hang Zhang'
author = 'Hang Zhang'

# The version info for the project you're documenting, acts as replacement for
11 changes: 1 addition & 10 deletions docs/source/dilated.rst
@@ -11,15 +11,8 @@ All provided models have been verified.
.. note::
This code is provided together with the paper

* Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. "Context Encoding for Semantic Segmentation" *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018*::
* Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. "Context Encoding for Semantic Segmentation" *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018*

@InProceedings{Zhang_2018_CVPR,
author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
title = {Context Encoding for Semantic Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}

.. automodule:: encoding.dilated
.. currentmodule:: encoding.dilated
@@ -91,5 +84,3 @@ DenseNet
~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: densenet201
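
As a usage sketch for the dilated networks documented in this file (hedged: the `pretrained` keyword is assumed to follow torchvision-style constructors; check the docstrings for the exact signature):

```python
# Sketch only: `pretrained=False` is an assumed, torchvision-style keyword.
import torch
import encoding

model = encoding.dilated.densenet201(pretrained=False)
model.eval()

x = torch.randn(1, 3, 224, 224)   # dummy RGB image batch
with torch.no_grad():
    out = model(x)                # dilated convs keep a denser feature map
```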


25 changes: 7 additions & 18 deletions docs/source/encoding.rst
@@ -1,12 +1,10 @@
.. role:: hidden
:class: hidden-section

My NN Layers
============
NN Layers
=========


Modules
-------
Customized NN modules in the Encoding package. For Synchronized Cross-GPU Batch Normalization, please visit :class:`encoding.nn.BatchNorm2d`.

.. currentmodule:: encoding.nn

@@ -34,17 +32,8 @@ Modules
.. autoclass:: DilatedAvgPool2d
:members:

Functions
---------

.. currentmodule:: encoding.functions

:hidden:`aggregate`
~~~~~~~~~~~~~~~~~~~

.. autofunction:: aggregate
:hidden:`GramMatrix`
~~~~~~~~~~~~~~~~~~~~

:hidden:`dilatedavgpool2d`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: dilatedavgpool2d
.. autoclass:: GramMatrix
:members:
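
For reference, the Gram-matrix computation such a module typically wraps, as a minimal pure-PyTorch sketch; the normalization factor here is an assumption, not necessarily what `encoding.nn.GramMatrix` uses:

```python
import torch
import torch.nn as nn

class GramSketch(nn.Module):
    """Batched Gram matrix of conv features, as used in style transfer."""
    def forward(self, x):
        b, c, h, w = x.size()
        f = x.view(b, c, h * w)                  # flatten spatial dims
        gram = torch.bmm(f, f.transpose(1, 2))   # (B, C, C) channel co-occurrences
        return gram / (c * h * w)                # assumed normalization

print(GramSketch()(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 64])
```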
53 changes: 13 additions & 40 deletions docs/source/functions.rst
@@ -1,58 +1,31 @@
.. role:: hidden
:class: hidden-section

Other Functions
===============
encoding.functions
==================

.. automodule:: encoding.functions

.. currentmodule:: encoding.functions


:hidden:`scaledL2`
~~~~~~~~~~~~~~~~~~~

.. autofunction:: scaledL2


:hidden:`upsample`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: upsample


:hidden:`dropout`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: dropout


:hidden:`relu`
:hidden:`dilatedavgpool2d`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: relu


:hidden:`view_each`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: view_each


:hidden:`multi_each`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: dilatedavgpool2d

.. autofunction:: multi_each
:hidden:`aggregate`
~~~~~~~~~~~~~~~~~~~

.. autofunction:: aggregate

:hidden:`sum_each`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: sum_each
:hidden:`scaledL2`
~~~~~~~~~~~~~~~~~~~

.. autofunction:: scaledL2

:hidden:`cat_each`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: cat_each
:hidden:`sum_square`
~~~~~~~~~~~~~~~~~~~~

.. autofunction:: sum_square
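
For reference, a pure-PyTorch sketch of the math these CUDA-backed functions implement, following the encoding-layer formulation :math:`e_k=\sum_i a_{ik}(x_i-c_k)` and :math:`s_k\lVert x_i-c_k\rVert^2`; tensor shapes and exact signatures here are assumptions, not the package's API:

```python
import torch

def scaled_l2(X, C, S):
    """s_k * ||x_i - c_k||^2 for every (sample, codeword) pair.
    X: (B, N, D) features, C: (K, D) codewords, S: (K,) scales -> (B, N, K)."""
    resid = X.unsqueeze(2) - C.view(1, 1, *C.shape)   # (B, N, K, D) residuals
    return S.view(1, 1, -1) * resid.pow(2).sum(-1)

def aggregate(A, X, C):
    """e_k = sum_i a_ik (x_i - c_k).
    A: (B, N, K) assignment weights -> (B, K, D) aggregated residuals."""
    resid = X.unsqueeze(2) - C.view(1, 1, *C.shape)   # (B, N, K, D)
    return (A.unsqueeze(-1) * resid).sum(1)

def sum_square(x):
    """Per-channel sum(x) and sum(x^2) over batch and spatial dims,
    the statistics the synchronized BN forward pass reduces across GPUs."""
    xs = x.transpose(0, 1).contiguous().view(x.size(1), -1)
    return xs.sum(1), xs.pow(2).sum(1)

B, N, K, D = 2, 100, 32, 64
X, C, S = torch.randn(B, N, D), torch.randn(K, D), torch.rand(K)
A = torch.softmax(-scaled_l2(X, C, S), dim=2)   # soft-assignment weights
print(aggregate(A, X, C).shape)                 # torch.Size([2, 32, 64])
```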
5 changes: 2 additions & 3 deletions docs/source/index.rst
@@ -9,8 +9,8 @@ Created by `Hang Zhang <http://hangzh.com/>`_

An optimized PyTorch package with CUDA backend.

.. todo::
A PyTorch DataParallel compatible Synchronized Cross-GPU Batch Normalization will be provided soon.
.. note::
PyTorch compatible Synchronized Cross-GPU :class:`encoding.nn.BatchNorm2d` has been released.

.. toctree::
:glob:
@@ -34,7 +34,6 @@ An optimized PyTorch package with CUDA backend.
syncbn
parallel
dilated
nn
functions
utils

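As a hedged sketch of the drop-in use the note above suggests, assuming `encoding.nn.BatchNorm2d` mirrors the `nn.BatchNorm2d` constructor and the announced DataParallel compatibility (see the syncbn notes for the recommended setup):

```python
import torch
import torch.nn as nn
import encoding

# Requires multiple GPUs; statistics are synchronized across devices.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    encoding.nn.BatchNorm2d(64),   # in place of nn.BatchNorm2d(64)
    nn.ReLU(inplace=True),
)
model = nn.DataParallel(model.cuda())
out = model(torch.randn(8, 3, 32, 32).cuda())
```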
95 changes: 0 additions & 95 deletions docs/source/nn.rst

This file was deleted.

4 changes: 3 additions & 1 deletion docs/source/notes/compile.rst
@@ -15,11 +15,13 @@ Install from Source

- On Linux::

pip install -r requirements.txt
python setup.py install

- On Mac OSX::

MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
pip install -r requirements.txt
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install

Citations
---------
2 changes: 1 addition & 1 deletion docs/source/notes/extending.rst
@@ -8,7 +8,7 @@ which is extending :mod:`torch.nn` and
Torch C and CUDA Backend
------------------------

Given an example of the residual operation (in a mini-batch):
Given a simple example of the residual operation (in a mini-batch):

.. math::
r_{ik} = x_i - c_k
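
A pure-PyTorch broadcast of this residual operation may help make the backend's job concrete (shapes are illustrative):

```python
import torch

def residual(X, C):
    """r_ik = x_i - c_k via broadcasting.
    X: (B, N, D) inputs, C: (K, D) codewords -> (B, N, K, D)."""
    return X.unsqueeze(2) - C.view(1, 1, *C.shape)

X, C = torch.randn(4, 10, 8), torch.randn(5, 8)
print(residual(X, C).shape)   # torch.Size([4, 10, 5, 8])
```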
48 changes: 17 additions & 31 deletions docs/source/notes/syncbn.rst
@@ -1,7 +1,7 @@
Implementing Synchronized Multi-GPU Batch Normalization
=======================================================

In this tutorial, we discuss the implementation detail of Multi-GPU Batch Normalization (BN) (classic implementation: :class:`encoding.nn.BatchNorm2d` and compatible :class:`encoding.parallel.SelfDataParallel`). We will provide the training example in a later version.
In this tutorial, we discuss the implementation detail of Multi-GPU Batch Normalization (BN) (classic implementation: :class:`encoding.nn.BatchNorm2d`). We will provide the training example in a later version.

How BN works?
-------------
@@ -17,13 +17,13 @@ BN layer was introduced in the paper `Batch Normalization: Accelerating Deep Net
where :math:`\mu=\frac{\sum_i^N x_i}{N} , \sigma = \sqrt{\frac{\sum_i^N (x_i-\mu)^2}{N}+\epsilon}` and :math:`\gamma, \beta` are the learnable parameters.

- Backward Pass:
For calculating the gradient :math:`\frac{d_\ell}{d_{x_i}}`, we need to consider the gradient from :math:`\frac{d_\ell}{d_y}` and the gradients from :math:`\frac{d_\ell}{d_\mu}` and :math:`\frac{d_\ell}{d_\sigma}`, since :math:`\mu \text{ and } \sigma` are functions of the input :math:`x_i`. We use partial derivatives in the notation:
For calculating the gradient :math:`\frac{d_\ell}{d_{x_i}}`, we need to consider the partial gradient from :math:`\frac{d_\ell}{d_y}` and the gradients from :math:`\frac{d_\ell}{d_\mu}` and :math:`\frac{d_\ell}{d_\sigma}`, since :math:`\mu \text{ and } \sigma` are functions of the input :math:`x_i`. We use partial derivatives in the notation:

.. math::
\frac{d_\ell}{d_{x_i}} = \frac{d_\ell}{d_{y_i}}\cdot\frac{d_{y_i}}{d_{x_i}} + \frac{d_\ell}{d_\mu}\cdot\frac{d_\mu}{d_{x_i}} + \frac{d_\ell}{d_\sigma}\cdot\frac{d_\sigma}{d_{x_i}}
\frac{d_\ell}{d_{x_i}} = \frac{d_\ell}{d_{y_i}}\cdot\frac{\partial_{y_i}}{\partial_{x_i}} + \frac{d_\ell}{d_\mu}\cdot\frac{d_\mu}{d_{x_i}} + \frac{d_\ell}{d_\sigma}\cdot\frac{d_\sigma}{d_{x_i}}
where :math:`\frac{d_{y_i}}{d_{x_i}}=\frac{\gamma}{\sigma}, \frac{d_\ell}{d_\mu}=-\frac{\gamma}{\sigma}\sum_i^N\frac{d_\ell}{d_{y_i}}
where :math:`\frac{\partial_{y_i}}{\partial_{x_i}}=\frac{\gamma}{\sigma}, \frac{d_\ell}{d_\mu}=-\frac{\gamma}{\sigma}\sum_i^N\frac{d_\ell}{d_{y_i}}
\text{ and } \frac{d_\sigma}{d_{x_i}}=-\frac{1}{\sigma}(\frac{x_i-\mu}{N})`.
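
A small numeric sketch of the two passes above: implementing the forward pass from :math:`\mu` and :math:`\sigma` and calling autograd exercises all three gradient paths (:math:`y_i`, :math:`\mu`, :math:`\sigma`) at once:

```python
import torch

N, eps = 32, 1e-5
x = torch.randn(N, requires_grad=True)
gamma, beta = torch.ones(1), torch.zeros(1)

mu = x.sum() / N                                    # batch mean
sigma = ((x - mu).pow(2).sum() / N + eps).sqrt()    # batch std
y = gamma * (x - mu) / sigma + beta                 # BN forward pass

y.pow(2).sum().backward()    # d_loss/d_x combines the y, mu and sigma paths
print(x.grad.shape)          # torch.Size([32])
```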

Why Synchronize BN?
@@ -41,41 +41,27 @@ How to Synchronize?
Suppose we have :math:`K` GPUs; :math:`sum(x)_k` and :math:`sum(x^2)_k` denote the sum of elements and the sum of element squares on the :math:`k^{th}` GPU.

- Forward Pass:
We can calculate the sum of elements :math:`sum(x)=\sum x_i \text{ and sum of squares } sum(x^2)=\sum x_i^2` in each GPU, then apply :class:`encoding.parallel.AllReduce` operation to sum across GPUs. Then calculate the global mean :math:`\mu=\frac{sum(x)}{N} \text{ and global variance } \sigma=\sqrt{\frac{sum(x^2)}{N}-\mu^2+\epsilon}`.
We can calculate the sum of elements :math:`sum(x)=\sum x_i \text{ and sum of squares } sum(x^2)=\sum x_i^2` in each GPU, then apply :class:`encoding.parallel.allreduce` operation to sum across GPUs. Then calculate the global mean :math:`\mu=\frac{sum(x)}{N} \text{ and global variance } \sigma=\sqrt{\frac{sum(x^2)}{N}-\mu^2+\epsilon}`.

- Backward Pass:
* :math:`\frac{d_\ell}{d_{x_i}}=\frac{\gamma}{\sigma}` can be calculated locally in each GPU.
* :math:`\frac{d_\ell}{d_{x_i}}=\frac{d_\ell}{d_{y_i}}\frac{\gamma}{\sigma}` can be calculated locally in each GPU.
* Calculate the gradient of :math:`sum(x)` and :math:`sum(x^2)` individually in each GPU :math:`\frac{d_\ell}{d_{sum(x)_k}}` and :math:`\frac{d_\ell}{d_{sum(x^2)_k}}`.

* Then Sync the gradient (automatically handled by :class:`encoding.parallel.AllReduce`) and continue the backward.

Classic Implementation
~~~~~~~~~~~~~~~~~~~~~~

- Synchronized DataParallel:
The standard DataParallel pipeline of public frameworks (MXNet, PyTorch, ...) at each training iteration:

* duplicate the network (weights) to all the GPUs,
* split the training batch to each GPU,
* forward and backward to calculate gradient,
* update network parameters (weights) then go to next iter.

Therefore, communication across different GPUs is not supported. To address this problem, we introduce an :class:`encoding.parallel.SelfDataParallel` mode, which enables each layer to accept multi-GPU inputs directly. Those self-parallel layers are provided in :class:`encoding.nn`.

- Cross GPU Autograd:
Because BN layers are used frequently in networks, such a complicated backward graph can confuse the PyTorch autograd engine. To address this problem, we provide an autograd function :class:`encoding.parallel.AllReduce` to handle the cross-GPU gradient calculation.

Comparing Performance
---------------------

- Training Time:

- Segmentation Performance:
* Then Sync the gradient (automatically handled by :class:`encoding.parallel.allreduce`) and continue the backward.
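
A single-process simulation of this scheme, with one tensor chunk standing in for each GPU's shard and an ordinary sum standing in for the allreduce, shows the reduced statistics match whole-batch BN:

```python
import torch

eps, K = 1e-5, 4
chunks = [torch.randn(16, 64) for _ in range(K)]   # one shard per "GPU"
N = sum(c.size(0) for c in chunks)

# Each GPU computes sum(x) and sum(x^2) locally; a sum stands in for allreduce.
xsum = sum(c.sum(0) for c in chunks)
xsqsum = sum(c.pow(2).sum(0) for c in chunks)

mu = xsum / N                                      # global mean
sigma = (xsqsum / N - mu.pow(2) + eps).sqrt()      # global std

# Sanity check against computing over the whole batch at once.
full = torch.cat(chunks, 0)
assert torch.allclose(mu, full.mean(0), atol=1e-4)
assert torch.allclose(sigma, (full.var(0, unbiased=False) + eps).sqrt(), atol=1e-4)
```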


Citation
--------

.. note::
This code is provided together with the paper; please cite our work.

* Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. "Context Encoding for Semantic Segmentation" *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018*::

This code is provided together with the paper (coming soon), please cite our work.
@InProceedings{Zhang_2018_CVPR,
author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
title = {Context Encoding for Semantic Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}