v0.2.0

suyanzhou626 · Mar 19, 2018 · 8fbc9bb
1 parent 01946d4
commit 8fbc9bb

Showing 16 changed files with 95 additions and 63 deletions.
diff --git a/README.md b/README.md
@@ -1,12 +1,24 @@
 # PyTorch-Encoding
 
+created by [Hang Zhang](http://hangzh.com/)
+
 ## [Documentation](http://hangzh.com/PyTorch-Encoding/)
 
 - Please visit the [**Docs**](http://hangzh.com/PyTorch-Encoding/) for detail instructions of installation and usage. 
 
-- [**Link**](http://hangzh.com/PyTorch-Encoding/experiments/texture.html) to the Deep TEN texture classification experiments and pre-trained models.
+## Citations
 
-## Citation
+**Context Encoding for Semantic Segmentation**  
+  [Hang Zhang](http://hangzh.com/), [Kristin Dana](http://eceweb1.rutgers.edu/vision/dana.html), [Jianping Shi](http://shijianping.me/), [Zhongyue Zhang](http://zhongyuezhang.com/), [Xiaogang Wang](http://www.ee.cuhk.edu.hk/~xgwang/), [Ambrish Tyagi](https://scholar.google.com/citations?user=GaSWCoUAAAAJ&hl=en), [Amit Agrawal](http://www.amitkagrawal.com/)
+```
+@InProceedings{Zhang_2018_CVPR,
+author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
+title = {Context Encoding for Semantic Segmentation},
+booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
+month = {June},
+year = {2018}
+}
+```
 
 **Deep TEN: Texture Encoding Network** [[arXiv]](https://arxiv.org/pdf/1612.02844.pdf)  
   [Hang Zhang](http://hangzh.com/), [Jia Xue](http://jiaxueweb.com/), [Kristin Dana](http://eceweb1.rutgers.edu/vision/dana.html)

diff --git a/build.py b/build.py
@@ -29,6 +29,7 @@
     ENCODING_LIB = os.path.join(cwd, 'encoding/lib/libENCODING.dylib')
 
 else:
+    os.environ['CFLAGS'] = '-std=c99'
     os.environ['TH_LIBRARIES'] = os.path.join(lib_path,'libATen.so.1')
     ENCODING_LIB = os.path.join(cwd, 'encoding/lib/libENCODING.so')
 

diff --git a/docs/source/experiments/texture.rst b/docs/source/experiments/texture.rst
@@ -8,20 +8,18 @@ Deep TEN: Deep Texture Encoding Network Example
 In this section, we show an example of training/testing Encoding-Net for texture recognition on MINC-2500 dataset. Comparing to original Torch implementation, we use *different learning rate* for pre-trained base network and encoding layer (10x), disable color jittering after reducing lr and adopt much *smaller training image size* (224 instead of 352). 
 
 
-.. note::
-    **Make Sure** to `Install PyTorch Encoding <../notes/compile.html>`_ First.
-
 Test Pre-trained Model
 ----------------------
 
-
-- Clone the GitHub repo (I am sure you did during the installation)::
+- Clone the GitHub repo::
 
     git clone git@github.com:zhanghang1989/PyTorch-Encoding.git
 
+- Install PyTorch Encoding (if you have not already). Please follow the installation guide `Installing PyTorch Encoding <../notes/compile.html>`_.
+
 - Download the `MINC-2500 <http://opensurfaces.cs.cornell.edu/publications/minc/>`_ dataset to ``$HOME/data/minc-2500/`` folder. Download pre-trained model (training `curve`_ as bellow, pre-trained on train-1 split using single training size of 224, with an error rate of :math:`19.98\%` using single crop on test-1 set)::
 
-    cd PyTorch-Encoding/experiments
+    cd PyTorch-Encoding/experiments/recognition
     bash model/download_models.sh
 
 .. _curve:
@@ -41,14 +39,14 @@ Train Your Own Model
 
 - Example training command for training above model::
 
-    python main.py --model deepten --nclass 23 --model encodingnet --batch-size 64 --lr 0.01 --epochs 60 
+    python main.py --model deepten --nclass 23 --batch-size 64 --lr 0.01 --epochs 60
 
-- Training options::
+- Detailed training options::
 
   -h, --help            show this help message and exit
   --dataset DATASET     training dataset (default: cifar10)
   --model MODEL         network model type (default: densenet)
-  --widen N             widen factor of the network (default: 4)
+  --backbone BACKBONE   backbone name (default: resnet50)
   --batch-size N        batch size for training (default: 128)
   --test-batch-size N   batch size for testing (default: 1000)
   --epochs N            number of epochs to train (default: 300)
@@ -69,7 +67,7 @@ Train Your Own Model
 Extending the Software
 ----------------------
 
-This code includes an integrated pipeline and some visualization tools (progress bar, real-time training curve plots). It is easy to use and extend for your own model or dataset:
+This code is well written, easy to use and extendable for your own models or datasets:
 
 - Write your own Dataloader ``mydataset.py`` to ``dataset/`` folder
 

diff --git a/docs/source/notes/compile.rst b/docs/source/notes/compile.rst
@@ -25,7 +25,18 @@ Reference
 ---------
 
     .. note::
-        If using the code in your research, please cite our paper.
+        If using the code in your research, please cite our papers.
+
+        * Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. "Context Encoding for Semantic Segmentation"  *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018*::
+
+            @InProceedings{Zhang_2018_CVPR,
+            author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
+            title = {Context Encoding for Semantic Segmentation},
+            booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
+            month = {June},
+            year = {2018}
+            }
+
 
         * Hang Zhang, Jia Xue, and Kristin Dana. "Deep TEN: Texture Encoding Network." *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017*::
 

diff --git a/docs/source/notes/syncbn.rst b/docs/source/notes/syncbn.rst
@@ -1,7 +1,7 @@
 Implementing Synchronized Multi-GPU Batch Normalization
 =======================================================
 
-In this tutorial, we discuss the implementation detail of Multi-GPU Batch Normalization (BN) :class:`encoding.nn.BatchNorm2d` and compatible :class:`encoding.parallel.SelfDataParallel`. We will provide the training example in a later version.
+In this tutorial, we discuss the implementation detail of Multi-GPU Batch Normalization (BN) (classic implementation: :class:`encoding.nn.BatchNorm2d` and compatible :class:`encoding.parallel.SelfDataParallel`). We will provide the training example in a later version.
 
 How BN works?
 -------------
@@ -23,7 +23,7 @@ BN layer was introduced in the paper `Batch Normalization: Accelerating Deep Net
 
         \frac{d_\ell}{d_{x_i}} = \frac{d_\ell}{d_{y_i}}\cdot\frac{d_{y_i}}{d_{x_i}} + \frac{d_\ell}{d_\mu}\cdot\frac{d_\mu}{d_{x_i}} + \frac{d_\ell}{d_\sigma}\cdot\frac{d_\sigma}{d_{x_i}}
 
-    where :math:`\frac{d_\ell}{d_{x_i}}=\frac{\gamma}{\sigma}, \frac{d_\ell}{d_\mu}=-\frac{\gamma}{\sigma}\sum_i^N\frac{d_\ell}{d_{y_i}} 
+    where :math:`\frac{d_{y_i}}{d_{x_i}}=\frac{\gamma}{\sigma}, \frac{d_\ell}{d_\mu}=-\frac{\gamma}{\sigma}\sum_i^N\frac{d_\ell}{d_{y_i}}
     \text{ and } \frac{d_\sigma}{d_{x_i}}=-\frac{1}{\sigma}(\frac{x_i-\mu}{N})`.
 
 Why Synchronize BN?
@@ -49,6 +49,9 @@ Suppose we have :math:`K` number of GPUs, :math:`sum(x)_k` and :math:`sum(x^2)_k
 
     * Then Sync the gradient (automatically handled by :class:`encoding.parallel.AllReduce`) and continue the backward.
 
+Classic Implementation
+~~~~~~~~~~~~~~~~~~~~~~
+
 - Synchronized DataParallel:
     Standard DataParallel pipeline of public frameworks (MXNet, PyTorch...) in each training iters: 
 

diff --git a/encoding/CMakeLists.txt b/encoding/CMakeLists.txt
@@ -62,7 +62,6 @@ IF(MSVC)
 ENDIF()
 
 TARGET_LINK_LIBRARIES(ENCODING 
-	${THC_LIBRARIES} 
 	${TH_LIBRARIES} 
 	${CUDA_cusparse_LIBRARY}
 )

diff --git a/encoding/cmake/FindTorch.cmake b/encoding/cmake/FindTorch.cmake
@@ -36,4 +36,3 @@ SET(Torch_INSTALL_INCLUDE "${TORCH_BUILD_DIR}/include" ${TORCH_TH_INCLUDE_DIR} $
 
 # Find the libs. We need to find libraries one by one.
 SET(TH_LIBRARIES "$ENV{TH_LIBRARIES}")
-SET(THC_LIBRARIES "$ENV{THC_LIBRARIES}")
diff --git a/encoding/dilated/densenet.py b/encoding/dilated/densenet.py
@@ -120,7 +120,9 @@ def __init__(self, num_input_features, num_output_features, stride, dilation=1):
 
 
 class DenseNet(nn.Module):
-    r"""Dilated Densenet-BC model class
+    r"""Dilated DenseNet.
+
+    To dilate the transition layers of DenseNet correctly, we implement :class:`encoding.nn.DilatedAvgPool2d`.
 
     Args:
         growth_rate (int) - how many filters to add each layer (`k` in paper)

diff --git a/encoding/functions/syncbn.py b/encoding/functions/syncbn.py
@@ -62,9 +62,9 @@ def sum_square(input):
 
 
 class _batchnorm(Function):
-    def __init__(self, training=False):
-        super(_batchnorm, self).__init__()
-        self.training = training
+    def __init__(ctx, training=False):
+        super(_batchnorm, ctx).__init__()
+        ctx.training = training
 
     def forward(ctx, input, gamma, beta, mean, std):
         ctx.save_for_backward(input, gamma, beta, mean, std)
@@ -99,13 +99,13 @@ def backward(ctx, gradOutput):
                 encoding_lib.Encoding_Float_batchnorm_Backward(
                     gradOutput, input, gradInput, gradGamma, gradBeta, 
                     mean, invstd, gamma, beta, gradMean, gradStd,
-                    self.training) 
+                    ctx.training) 
         elif isinstance(input, torch.cuda.DoubleTensor):
             with torch.cuda.device_of(input):
                 encoding_lib.Encoding_Double_batchnorm_Backward(
                     gradOutput, input, gradInput, gradGamma, gradBeta, 
                     mean, invstd, gamma, beta, gradMean, gradStd,
-                    self.training) 
+                    ctx.training) 
         else:
             raise RuntimeError('Unimplemented data type!')
         return gradInput, gradGamma, gradBeta, gradMean, gradStd
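
Note on the `self` to `ctx` rename above: because `backward` names its first parameter `ctx`, any reference to `self` inside it is an unbound name. The sketch below reproduces that behaviour in plain Python, with no PyTorch involved. `Broken` and `Fixed` are hypothetical stand-ins for illustration, not classes from this repository.

```python
class Broken:
    def __init__(self, training=False):
        self.training = training

    def backward(ctx, grad):
        # The instance is bound to `ctx` here, so `self` is an undefined name.
        return grad if self.training else 0.0


class Fixed:
    def __init__(ctx, training=False):
        ctx.training = training

    def backward(ctx, grad):
        # Every method consistently uses `ctx` for the instance.
        return grad if ctx.training else 0.0


try:
    Broken(training=True).backward(1.0)
    broken_error = None
except NameError as exc:
    broken_error = exc  # raised because `self` is not defined in backward

fixed_result = Fixed(training=True).backward(1.0)
```

The constructor of `Broken` works, since it still names its parameter `self`; only the call to `backward` fails, which is presumably why the problem surfaced at gradient time.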

diff --git a/encoding/kernel/generic/pooling_kernel.c b/encoding/kernel/generic/pooling_kernel.c
@@ -35,7 +35,7 @@ __global__ void Encoding_(DilatedAvgPool_Forward_kernel) (
     c = bc - b*C;
     /* boundary check for output */
     if (w >= Y.getSize(3) || h >= Y.getSize(2)) return;
-    int hstart = h*dW -padH;
+    int hstart = h*dH -padH;
     int wstart = w*dW -padW;
     int hend = min(hstart + kH*dilationH, X.getSize(2));
     int wend = min(wstart + kW*dilationW, X.getSize(3));
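
Note on the `hstart` fix above: in this kernel `dH`/`dW` appear to act as the strides and `dilationH`/`dilationW` as the dilation, so the row offset must use the height stride. The following pure-Python reference is a minimal sketch of dilated average pooling over a single 2D plane, following the formula in the `DilatedAvgPool2d` docstring. It assumes no padding and only produces windows that fit entirely inside the input; the function name and argument layout are hypothetical and do not mirror the CUDA kernel's signature.

```python
def dilated_avg_pool2d(x, kernel, stride, dilation):
    """Reference dilated average pooling over one 2D plane (list of lists).

    Assumes no padding; only windows fully inside the input are produced.
    """
    kH, kW = kernel
    sH, sW = stride
    dH, dW = dilation
    H, W = len(x), len(x[0])
    # Extent covered by one window once dilation is applied.
    extH = dH * (kH - 1) + 1
    extW = dW * (kW - 1) + 1
    outH = (H - extH) // sH + 1
    outW = (W - extW) // sW + 1
    out = []
    for h in range(outH):
        row = []
        for w in range(outW):
            hstart = h * sH  # row offset uses the height stride
            wstart = w * sW  # column offset uses the width stride
            total = 0.0
            for m in range(kH):
                for n in range(kW):
                    total += x[hstart + dH * m][wstart + dW * n]
            row.append(total / (kH * kW))
        out.append(row)
    return out


# 6x6 plane with x[r][c] = 6*r + c, and deliberately different strides.
x = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
y = dilated_avg_pool2d(x, kernel=(2, 2), stride=(2, 1), dilation=(2, 2))
```

With unequal strides, as in this example, computing the row offset from the width stride would read the wrong rows, which is the kind of error the one-character fix addresses. With equal strides the two variants coincide, so such a bug can go unnoticed.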

diff --git a/encoding/nn/encoding.py b/encoding/nn/encoding.py
@@ -13,13 +13,14 @@
 from torch.nn import Module, Parameter
 import torch.nn.functional as F
 from torch.autograd import Function, Variable
+from torch.nn.modules.utils import _single, _pair, _triple
 
 from .._ext import encoding_lib
 from ..functions import scaledL2, aggregate
 from ..parallel import my_data_parallel
 from ..functions import dilatedavgpool2d
 
-__all__ = ['Encoding', 'EncodingShake', 'Inspiration', 'DilatedAvgPool2d', 'UpsampleConv2d'] 
+__all__ = ['Encoding', 'EncodingDrop', 'Inspiration', 'DilatedAvgPool2d', 'UpsampleConv2d'] 
 
 class Encoding(Module):
     r"""
@@ -104,9 +105,9 @@ def __repr__(self):
             + 'N x ' + str(self.D) + '=>' + str(self.K) + 'x' \
             + str(self.D) + ')'
 
-class EncodingShake(Module):
+class EncodingDrop(Module):
     def __init__(self, D, K):
-        super(EncodingShake, self).__init__()
+        super(EncodingDrop, self).__init__()
         # init codewords and smoothing factor
         self.D, self.K = D, K
         self.codewords = Parameter(torch.Tensor(K, D), 
@@ -119,7 +120,7 @@ def reset_params(self):
         self.codewords.data.uniform_(-std1, std1)
         self.scale.data.uniform_(-1, 0)
 
-    def shake(self):
+    def _drop(self):
         if self.training:
             self.scale.data.uniform_(-1, 0)
         else:
@@ -143,14 +144,12 @@ def forward(self, X):
             X = X.view(B,D,-1).transpose(1,2).contiguous()
         else:
             raise RuntimeError('Encoding Layer unknown input dims!')
-        # shake
-        self.shake()
+        self._drop()
         # assignment weights
         A = F.softmax(scaledL2(X, self.codewords, self.scale), dim=1)
         # aggregate
         E = aggregate(A, X, self.codewords)
-        # shake
-        self.shake()
+        self._drop()
         return E
 
     def __repr__(self):
@@ -202,27 +201,27 @@ class DilatedAvgPool2d(Module):
     r"""We provide Dilated Average Pooling for the dilation of Densenet as
     in :class:`encoding.dilated.DenseNet`.
 
-    Reference::
+    Reference:
         We provide this code for a comming paper.
 
     Applies a 2D average pooling over an input signal composed of several input planes.
 
     In the simplest case, the output value of the layer with input size :math:`(N, C, H, W)`,
-    output :math:`(N, C, H_{out}, W_{out})` and :attr:`kernel_size` :math:`(kH, kW)`
+    output :math:`(B, C, H_{out}, W_{out})`, :attr:`kernel_size` :math:`(k_H,k_W)`, :attr:`stride` :math:`(s_H,s_W)` :attr:`dilation` :math:`(d_H,d_W)`
     can be precisely described as:
 
     .. math::
 
         \begin{array}{ll}
-        out(b, c, h, w)  = 1 / (kH * kW) * 
-        \sum_{{m}=0}^{kH-1} \sum_{{n}=0}^{kW-1}
-        input(b, c, dH * h + m, dW * w + n)
+        out(b, c, h, w)  = 1 / (k_H \cdot k_W) \cdot 
+        \sum_{{m}=0}^{k_H-1} \sum_{{n}=0}^{k_W-1}
+        input(b, c, s_H \cdot h + d_H \cdot m, s_W \cdot w + d_W \cdot n)
         \end{array}
 
     | If :attr:`padding` is non-zero, then the input is implicitly zero-padded on both sides
       for :attr:`padding` number of points
 
-    The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:
+    | The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:
 
         - a single ``int`` -- in which case the same value is used for the height and width dimension
         - a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
@@ -235,10 +234,11 @@ class DilatedAvgPool2d(Module):
         dilation: the dilation parameter similar to Conv2d
 
     Shape:
-        - Input: :math:`(N, C, H_{in}, W_{in})`
-        - Output: :math:`(N, C, H_{out}, W_{out})` where
+        - Input: :math:`(B, C, H_{in}, W_{in})`
+        - Output: :math:`(B, C, H_{out}, W_{out})` where
           :math:`H_{out} = floor((H_{in}  + 2 * padding[0] - kernel\_size[0]) / stride[0] + 1)`
           :math:`W_{out} = floor((W_{in}  + 2 * padding[1] - kernel\_size[1]) / stride[1] + 1)`
+          For :attr:`stride=1`, the output featuremap preserves the same size as input.
 
     Examples::
 
@@ -306,7 +306,7 @@ class UpsampleConv2d(Module):
                          (in_channels, scale * scale * out_channels, kernel_size[0], kernel_size[1])
         bias (Tensor):   the learnable bias of the module of shape (scale * scale * out_channels)
 
-    Examples::
+    Examples:
         >>> # With square kernels and equal stride
         >>> m = nn.UpsampleCov2d(16, 33, 3, stride=2)
         >>> # non-square kernels and unequal stride and with padding

diff --git a/encoding/parallel.py b/encoding/parallel.py
@@ -19,7 +19,7 @@
 from torch.nn.parallel.replicate import replicate
 from torch.nn.parallel.parallel_apply import parallel_apply
 
-__all__ = ['AllReduce', 'Broadcast', 'ModelDataParallel', 
+__all__ = ['Reduce', 'AllReduce', 'Broadcast', 'ModelDataParallel', 
     'CriterionDataParallel', 'SelfDataParallel']
 
 def nccl_all_reduce(inputs):
@@ -45,6 +45,22 @@ def comm_all_reduce(inputs):
         results.append(result.clone().cuda(i))
     return results
 
+class Reduce(Function):
+    def forward(ctx, *inputs):
+        ctx.save_for_backward(*inputs)
+        if len(inputs) == 1:
+            return inputs[0]
+        return comm.reduce_add(inputs)
+
+    def backward(ctx, gradOutput):
+        inputs = tuple(ctx.saved_tensors)
+        if len(inputs) == 1:
+            return gradOutput
+        gradInputs = []
+        for i in range(len(inputs)):
+            with torch.cuda.device_of(inputs[i]):
+                gradInputs.append(gradOutput.cuda())
+        return tuple(gradInputs)
 
 class AllReduce(Function):
     """Cross GPU all reduce autograd operation for calculate mean and

diff --git a/experiments/recognition/README.md b/experiments/recognition/README.md
@@ -1 +1,3 @@
-- [Link to the Deep TEN pre-trained models and experiments](http://hangzh.com/PyTorch-Encoding/experiments/texture.html)
+- [Link to the EncNet CIFAR experiments and pre-trained models](http://hangzh.com/PyTorch-Encoding/experiments/cifar.html)
+
+- [Link to the Deep TEN experiments and pre-trained models](http://hangzh.com/PyTorch-Encoding/experiments/texture.html)
diff --git a/experiments/recognition/option.py b/experiments/recognition/option.py
@@ -24,15 +24,17 @@ def __init__(self):
             help='number of classes (default: 10)')
         parser.add_argument('--widen', type=int, default=4, metavar='N',
             help='widen factor of the network (default: 4)')
+        parser.add_argument('--ncodes', type=int, default=32, metavar='N',
+            help='number of codewords in Encoding Layer (default: 32)')
         parser.add_argument('--backbone', type=str, default='resnet50',
             help='backbone name (default: resnet50)')
         # training hyper params
         parser.add_argument('--batch-size', type=int, default=128,
             metavar='N', help='batch size for training (default: 128)')
         parser.add_argument('--test-batch-size', type=int, default=256, 
             metavar='N', help='batch size for testing (default: 256)')
-        parser.add_argument('--epochs', type=int, default=300, metavar='N',
-            help='number of epochs to train (default: 300)')
+        parser.add_argument('--epochs', type=int, default=600, metavar='N',
+            help='number of epochs to train (default: 600)')
         parser.add_argument('--start_epoch', type=int, default=1, 
             metavar='N', help='the epoch number to start (default: 0)')
         # lr setting
@@ -65,4 +67,7 @@ def __init__(self):
         self.parser = parser
 
     def parse(self):
-        return self.parser.parse_args()
+        args = self.parser.parse_args()
+        if args.dataset == 'minc':
+            args.nclass = 23
+        return args
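
Note on the new `parse()` above: it post-processes the parsed arguments so that selecting the MINC dataset fixes the class count at 23. The sketch below reproduces that pattern with a reduced, hypothetical option set. The `argv` parameter is an assumption added here so the behaviour can be exercised without a real command line; it is not part of the original method.

```python
import argparse


class Options:
    def __init__(self):
        parser = argparse.ArgumentParser(description='option sketch')
        parser.add_argument('--dataset', type=str, default='cifar10',
                            help='training dataset (default: cifar10)')
        parser.add_argument('--nclass', type=int, default=10,
                            help='number of classes (default: 10)')
        self.parser = parser

    def parse(self, argv=None):
        # argv=None makes argparse fall back to sys.argv[1:].
        args = self.parser.parse_args(argv)
        if args.dataset == 'minc':
            # Dataset-specific override applied after parsing.
            args.nclass = 23
        return args


minc_args = Options().parse(['--dataset', 'minc'])
cifar_args = Options().parse([])
forced_args = Options().parse(['--dataset', 'minc', '--nclass', '5'])
```

One consequence of this design is that an explicit `--nclass` on the command line is silently replaced when the dataset is `minc`, as `forced_args` shows.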
diff --git a/setup.py b/setup.py
@@ -32,7 +32,7 @@ def create_version_file():
         with open(version_path, 'w') as f:
             f.write("__version__ = '{}'\n".format(version))
 
-version = '0.1.0'
+version = '0.2.0'
 try:
     sha = subprocess.check_output(['git', 'rev-parse', 'HEAD'], 
         cwd=cwd).decode('ascii').strip()