sync BN
zhanghang1989 committed Apr 13, 2018
1 parent d40adbc commit 25985c3
Showing 35 changed files with 1,160 additions and 2,152 deletions.
9 changes: 9 additions & 0 deletions Makefile
@@ -0,0 +1,9 @@
ROOTDIR = $(CURDIR)

lint: cpplint pylint

cpplint:
	tests/lint.py encoding cpp src kernel

pylint:
	pylint --rcfile=$(ROOTDIR)/tests/pylintrc --ignore-patterns=".*\.so$$,.*\.dll$$,.*\.dylib$$" encoding --ignore=_ext
2 changes: 1 addition & 1 deletion README.md
@@ -9,7 +9,7 @@ created by [Hang Zhang](http://hangzh.com/)
## Citations

**Context Encoding for Semantic Segmentation**
[Hang Zhang](http://hangzh.com/), [Kristin Dana](http://eceweb1.rutgers.edu/vision/dana.html), [Jianping Shi](http://shijianping.me/), [Zhongyue Zhang](http://zhongyuezhang.com/), [Xiaogang Wang](http://www.ee.cuhk.edu.hk/~xgwang/), [Ambrish Tyagi](https://scholar.google.com/citations?user=GaSWCoUAAAAJ&hl=en), [Amit Agrawal](http://www.amitkagrawal.com/)
[Hang Zhang](http://hangzh.com/), [Kristin Dana](http://eceweb1.rutgers.edu/vision/dana.html), [Jianping Shi](http://shijianping.me/), [Zhongyue Zhang](http://zhongyuezhang.com/), [Xiaogang Wang](http://www.ee.cuhk.edu.hk/~xgwang/), [Ambrish Tyagi](https://scholar.google.com/citations?user=GaSWCoUAAAAJ&hl=en), [Amit Agrawal](http://www.amitkagrawal.com/) [[arXiv]](https://arxiv.org/pdf/1803.08904.pdf)
```
@InProceedings{Zhang_2018_CVPR,
author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -65,7 +65,7 @@

# General information about the project.
project = 'Encoding'
copyright = '2017, Hang Zhang'
copyright = '2018, Hang Zhang'
author = 'Hang Zhang'

# The version info for the project you're documenting, acts as replacement for
11 changes: 1 addition & 10 deletions docs/source/dilated.rst
@@ -11,15 +11,8 @@ All provided models have been verified.
.. note::
This code is provided together with the paper

* Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. "Context Encoding for Semantic Segmentation" *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018*::
* Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. "Context Encoding for Semantic Segmentation" *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018*

@InProceedings{Zhang_2018_CVPR,
author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
title = {Context Encoding for Semantic Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}

.. automodule:: encoding.dilated
.. currentmodule:: encoding.dilated
@@ -91,5 +84,3 @@ DenseNet
~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: densenet201
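
As a usage sketch for the dilated networks documented in this file (hedged: the `pretrained` keyword is assumed to follow torchvision-style constructors; check the docstrings for the exact signature):

```python
# Sketch only: `pretrained=False` is an assumed, torchvision-style keyword.
import torch
import encoding

model = encoding.dilated.densenet201(pretrained=False)
model.eval()

x = torch.randn(1, 3, 224, 224)   # dummy RGB image batch
with torch.no_grad():
    out = model(x)                # dilated convs keep a denser feature map
```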


25 changes: 7 additions & 18 deletions docs/source/encoding.rst
@@ -1,12 +1,10 @@
.. role:: hidden
:class: hidden-section

My NN Layers
============
NN Layers
=========


Modules
-------
Customized NN modules in the Encoding package. For Synchronized Cross-GPU Batch Normalization, please visit :class:`encoding.nn.BatchNorm2d`.

.. currentmodule:: encoding.nn

@@ -34,17 +32,8 @@ Modules
.. autoclass:: DilatedAvgPool2d
:members:

Functions
---------

.. currentmodule:: encoding.functions

:hidden:`aggregate`
~~~~~~~~~~~~~~~~~~~

.. autofunction:: aggregate
:hidden:`GramMatrix`
~~~~~~~~~~~~~~~~~~~~

:hidden:`dilatedavgpool2d`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: dilatedavgpool2d
.. autoclass:: GramMatrix
:members:
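
For reference, the Gram-matrix computation such a module typically wraps, as a minimal pure-PyTorch sketch; the normalization factor here is an assumption, not necessarily what `encoding.nn.GramMatrix` uses:

```python
import torch
import torch.nn as nn

class GramSketch(nn.Module):
    """Batched Gram matrix of conv features, as used in style transfer."""
    def forward(self, x):
        b, c, h, w = x.size()
        f = x.view(b, c, h * w)                  # flatten spatial dims
        gram = torch.bmm(f, f.transpose(1, 2))   # (B, C, C) channel co-occurrences
        return gram / (c * h * w)                # assumed normalization

print(GramSketch()(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 64])
```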
53 changes: 13 additions & 40 deletions docs/source/functions.rst
@@ -1,58 +1,31 @@
.. role:: hidden
:class: hidden-section

Other Functions
===============
encoding.functions
==================

.. automodule:: encoding.functions

.. currentmodule:: encoding.functions


:hidden:`scaledL2`
~~~~~~~~~~~~~~~~~~~

.. autofunction:: scaledL2


:hidden:`upsample`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: upsample


:hidden:`dropout`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: dropout


:hidden:`relu`
:hidden:`dilatedavgpool2d`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: relu


:hidden:`view_each`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: view_each


:hidden:`multi_each`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: dilatedavgpool2d

.. autofunction:: multi_each
:hidden:`aggregate`
~~~~~~~~~~~~~~~~~~~

.. autofunction:: aggregate

:hidden:`sum_each`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: sum_each
:hidden:`scaledL2`
~~~~~~~~~~~~~~~~~~~

.. autofunction:: scaledL2

:hidden:`cat_each`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: cat_each
:hidden:`sum_square`
~~~~~~~~~~~~~~~~~~~~

.. autofunction:: sum_square
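
For reference, a pure-PyTorch sketch of the math these CUDA-backed functions implement, following the encoding-layer formulation :math:`e_k=\sum_i a_{ik}(x_i-c_k)` and :math:`s_k\lVert x_i-c_k\rVert^2`; tensor shapes and exact signatures here are assumptions, not the package's API:

```python
import torch

def scaled_l2(X, C, S):
    """s_k * ||x_i - c_k||^2 for every (sample, codeword) pair.
    X: (B, N, D) features, C: (K, D) codewords, S: (K,) scales -> (B, N, K)."""
    resid = X.unsqueeze(2) - C.view(1, 1, *C.shape)   # (B, N, K, D) residuals
    return S.view(1, 1, -1) * resid.pow(2).sum(-1)

def aggregate(A, X, C):
    """e_k = sum_i a_ik (x_i - c_k).
    A: (B, N, K) assignment weights -> (B, K, D) aggregated residuals."""
    resid = X.unsqueeze(2) - C.view(1, 1, *C.shape)   # (B, N, K, D)
    return (A.unsqueeze(-1) * resid).sum(1)

def sum_square(x):
    """Per-channel sum(x) and sum(x^2) over batch and spatial dims,
    the statistics the synchronized BN forward pass reduces across GPUs."""
    xs = x.transpose(0, 1).contiguous().view(x.size(1), -1)
    return xs.sum(1), xs.pow(2).sum(1)

B, N, K, D = 2, 100, 32, 64
X, C, S = torch.randn(B, N, D), torch.randn(K, D), torch.rand(K)
A = torch.softmax(-scaled_l2(X, C, S), dim=2)   # soft-assignment weights
print(aggregate(A, X, C).shape)                 # torch.Size([2, 32, 64])
```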
5 changes: 2 additions & 3 deletions docs/source/index.rst
@@ -9,8 +9,8 @@ Created by `Hang Zhang <http://hangzh.com/>`_

An optimized PyTorch package with CUDA backend.

.. todo::
A PyTorch DataParallel compatible Synchronized Cross-GPU Batch Normalization will be provided soon.
.. note::
PyTorch compatible Synchronized Cross-GPU :class:`encoding.nn.BatchNorm2d` has been released.

.. toctree::
:glob:
@@ -34,7 +34,6 @@ An optimized PyTorch package with CUDA backend.
syncbn
parallel
dilated
nn
functions
utils

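As a hedged sketch of the drop-in use the note above suggests, assuming `encoding.nn.BatchNorm2d` mirrors the `nn.BatchNorm2d` constructor and the announced DataParallel compatibility (see the syncbn notes for the recommended setup):

```python
import torch
import torch.nn as nn
import encoding

# Requires multiple GPUs; statistics are synchronized across devices.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    encoding.nn.BatchNorm2d(64),   # in place of nn.BatchNorm2d(64)
    nn.ReLU(inplace=True),
)
model = nn.DataParallel(model.cuda())
out = model(torch.randn(8, 3, 32, 32).cuda())
```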
95 changes: 0 additions & 95 deletions docs/source/nn.rst

This file was deleted.

4 changes: 3 additions & 1 deletion docs/source/notes/compile.rst
@@ -15,11 +15,13 @@ Install from Source

- On Linux::

pip install -r requirements.txt
python setup.py install

- On Mac OSX::

MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
pip install -r requirements.txt
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install

Citations
---------
2 changes: 1 addition & 1 deletion docs/source/notes/extending.rst
@@ -8,7 +8,7 @@ which is extending :mod:`torch.nn` and
Torch C and CUDA Backend
------------------------

Given an example of the residual operation (in a mini-batch):
Given a simple example of the residual operation (in a mini-batch):

.. math::
r_{ik} = x_i - c_k
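
A pure-PyTorch broadcast of this residual operation may help make the backend's job concrete (shapes are illustrative):

```python
import torch

def residual(X, C):
    """r_ik = x_i - c_k via broadcasting.
    X: (B, N, D) inputs, C: (K, D) codewords -> (B, N, K, D)."""
    return X.unsqueeze(2) - C.view(1, 1, *C.shape)

X, C = torch.randn(4, 10, 8), torch.randn(5, 8)
print(residual(X, C).shape)   # torch.Size([4, 10, 5, 8])
```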
48 changes: 17 additions & 31 deletions docs/source/notes/syncbn.rst
@@ -1,7 +1,7 @@
Implementing Synchronized Multi-GPU Batch Normalization
=======================================================

In this tutorial, we discuss the implementation detail of Multi-GPU Batch Normalization (BN) (classic implementation: :class:`encoding.nn.BatchNorm2d` and compatible :class:`encoding.parallel.SelfDataParallel`). We will provide the training example in a later version.
In this tutorial, we discuss the implementation detail of Multi-GPU Batch Normalization (BN) (classic implementation: :class:`encoding.nn.BatchNorm2d`). We will provide the training example in a later version.

How BN works?
-------------
@@ -17,13 +17,13 @@ BN layer was introduced in the paper `Batch Normalization: Accelerating Deep Net
where :math:`\mu=\frac{\sum_i^N x_i}{N} , \sigma = \sqrt{\frac{\sum_i^N (x_i-\mu)^2}{N}+\epsilon}` and :math:`\gamma, \beta` are the learnable parameters.

- Backward Pass:
For calculating the gradient :math:`\frac{d_\ell}{d_{x_i}}`, we need to consider the gradient from :math:`\frac{d_\ell}{d_y}` and the gradients from :math:`\frac{d_\ell}{d_\mu}` and :math:`\frac{d_\ell}{d_\sigma}`, since :math:`\mu \text{ and } \sigma` are functions of the input :math:`x_i`. We use partial derivatives in the notation:
For calculating the gradient :math:`\frac{d_\ell}{d_{x_i}}`, we need to consider the partial gradient from :math:`\frac{d_\ell}{d_y}` and the gradients from :math:`\frac{d_\ell}{d_\mu}` and :math:`\frac{d_\ell}{d_\sigma}`, since :math:`\mu \text{ and } \sigma` are functions of the input :math:`x_i`. We use partial derivatives in the notation:

.. math::
\frac{d_\ell}{d_{x_i}} = \frac{d_\ell}{d_{y_i}}\cdot\frac{d_{y_i}}{d_{x_i}} + \frac{d_\ell}{d_\mu}\cdot\frac{d_\mu}{d_{x_i}} + \frac{d_\ell}{d_\sigma}\cdot\frac{d_\sigma}{d_{x_i}}
\frac{d_\ell}{d_{x_i}} = \frac{d_\ell}{d_{y_i}}\cdot\frac{\partial_{y_i}}{\partial_{x_i}} + \frac{d_\ell}{d_\mu}\cdot\frac{d_\mu}{d_{x_i}} + \frac{d_\ell}{d_\sigma}\cdot\frac{d_\sigma}{d_{x_i}}
where :math:`\frac{d_{y_i}}{d_{x_i}}=\frac{\gamma}{\sigma}, \frac{d_\ell}{d_\mu}=-\frac{\gamma}{\sigma}\sum_i^N\frac{d_\ell}{d_{y_i}}
where :math:`\frac{\partial_{y_i}}{\partial_{x_i}}=\frac{\gamma}{\sigma}, \frac{d_\ell}{d_\mu}=-\frac{\gamma}{\sigma}\sum_i^N\frac{d_\ell}{d_{y_i}}
\text{ and } \frac{d_\sigma}{d_{x_i}}=-\frac{1}{\sigma}(\frac{x_i-\mu}{N})`.
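
A small numeric sketch of the two passes above: implementing the forward pass from :math:`\mu` and :math:`\sigma` and calling autograd exercises all three gradient paths (:math:`y_i`, :math:`\mu`, :math:`\sigma`) at once:

```python
import torch

N, eps = 32, 1e-5
x = torch.randn(N, requires_grad=True)
gamma, beta = torch.ones(1), torch.zeros(1)

mu = x.sum() / N                                    # batch mean
sigma = ((x - mu).pow(2).sum() / N + eps).sqrt()    # batch std
y = gamma * (x - mu) / sigma + beta                 # BN forward pass

y.pow(2).sum().backward()    # d_loss/d_x combines the y, mu and sigma paths
print(x.grad.shape)          # torch.Size([32])
```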

Why Synchronize BN?
@@ -41,41 +41,27 @@ How to Synchronize?
Suppose we have :math:`K` GPUs; :math:`sum(x)_k` and :math:`sum(x^2)_k` denote the sum of elements and the sum of element squares on the :math:`k^{th}` GPU.

- Forward Pass:
We can calculate the sum of elements :math:`sum(x)=\sum x_i \text{ and sum of squares } sum(x^2)=\sum x_i^2` in each GPU, then apply :class:`encoding.parallel.AllReduce` operation to sum across GPUs. Then calculate the global mean :math:`\mu=\frac{sum(x)}{N} \text{ and global variance } \sigma=\sqrt{\frac{sum(x^2)}{N}-\mu^2+\epsilon}`.
We can calculate the sum of elements :math:`sum(x)=\sum x_i \text{ and sum of squares } sum(x^2)=\sum x_i^2` in each GPU, then apply :class:`encoding.parallel.allreduce` operation to sum across GPUs. Then calculate the global mean :math:`\mu=\frac{sum(x)}{N} \text{ and global variance } \sigma=\sqrt{\frac{sum(x^2)}{N}-\mu^2+\epsilon}`.

- Backward Pass:
* :math:`\frac{d_\ell}{d_{x_i}}=\frac{\gamma}{\sigma}` can be calculated locally in each GPU.
* :math:`\frac{d_\ell}{d_{x_i}}=\frac{d_\ell}{d_{y_i}}\frac{\gamma}{\sigma}` can be calculated locally in each GPU.
* Calculate the gradient of :math:`sum(x)` and :math:`sum(x^2)` individually in each GPU :math:`\frac{d_\ell}{d_{sum(x)_k}}` and :math:`\frac{d_\ell}{d_{sum(x^2)_k}}`.

* Then Sync the gradient (automatically handled by :class:`encoding.parallel.AllReduce`) and continue the backward.

Classic Implementation
~~~~~~~~~~~~~~~~~~~~~~

- Synchronized DataParallel:
The standard DataParallel pipeline of public frameworks (MXNet, PyTorch, ...) at each training iteration:

* duplicate the network (weights) to all the GPUs,
* split the training batch to each GPU,
* forward and backward to calculate gradient,
* update network parameters (weights) then go to next iter.

Therefore, communication across different GPUs is not supported. To address this problem, we introduce an :class:`encoding.parallel.SelfDataParallel` mode, which enables each layer to accept multi-GPU inputs directly. Those self-parallel layers are provided in :class:`encoding.nn`.

- Cross GPU Autograd:
Because BN layers are used frequently in networks, such a complicated backward graph can confuse the PyTorch autograd engine. To address this problem, we provide an autograd function :class:`encoding.parallel.AllReduce` to handle the cross-GPU gradient calculation.

Comparing Performance
---------------------

- Training Time:

- Segmentation Performance:
* Then Sync the gradient (automatically handled by :class:`encoding.parallel.allreduce`) and continue the backward.
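
A single-process simulation of this scheme, with one tensor chunk standing in for each GPU's shard and an ordinary sum standing in for the allreduce, shows the reduced statistics match whole-batch BN:

```python
import torch

eps, K = 1e-5, 4
chunks = [torch.randn(16, 64) for _ in range(K)]   # one shard per "GPU"
N = sum(c.size(0) for c in chunks)

# Each GPU computes sum(x) and sum(x^2) locally; a sum stands in for allreduce.
xsum = sum(c.sum(0) for c in chunks)
xsqsum = sum(c.pow(2).sum(0) for c in chunks)

mu = xsum / N                                      # global mean
sigma = (xsqsum / N - mu.pow(2) + eps).sqrt()      # global std

# Sanity check against computing over the whole batch at once.
full = torch.cat(chunks, 0)
assert torch.allclose(mu, full.mean(0), atol=1e-4)
assert torch.allclose(sigma, (full.var(0, unbiased=False) + eps).sqrt(), atol=1e-4)
```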


Citation
--------

.. note::
This code is provided together with the paper; please cite our work.

* Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. "Context Encoding for Semantic Segmentation" *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018*::

This code is provided together with the paper (coming soon), please cite our work.
@InProceedings{Zhang_2018_CVPR,
author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
title = {Context Encoding for Semantic Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}