v0.2.0
zhanghang1989 committed Mar 19, 2018
1 parent 01946d4 commit 8fbc9bb
Showing 16 changed files with 95 additions and 63 deletions.
16 changes: 14 additions & 2 deletions README.md
@@ -1,12 +1,24 @@
# PyTorch-Encoding

created by [Hang Zhang](http://hangzh.com/)

## [Documentation](http://hangzh.com/PyTorch-Encoding/)

- Please visit the [**Docs**](http://hangzh.com/PyTorch-Encoding/) for detailed installation and usage instructions.

- [**Link**](http://hangzh.com/PyTorch-Encoding/experiments/texture.html) to the Deep TEN texture classification experiments and pre-trained models.
## Citations

## Citation
**Context Encoding for Semantic Segmentation**
[Hang Zhang](http://hangzh.com/), [Kristin Dana](http://eceweb1.rutgers.edu/vision/dana.html), [Jianping Shi](http://shijianping.me/), [Zhongyue Zhang](http://zhongyuezhang.com/), [Xiaogang Wang](http://www.ee.cuhk.edu.hk/~xgwang/), [Ambrish Tyagi](https://scholar.google.com/citations?user=GaSWCoUAAAAJ&hl=en), [Amit Agrawal](http://www.amitkagrawal.com/)
```
@InProceedings{Zhang_2018_CVPR,
author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
title = {Context Encoding for Semantic Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}
```

**Deep TEN: Texture Encoding Network** [[arXiv]](https://arxiv.org/pdf/1612.02844.pdf)
[Hang Zhang](http://hangzh.com/), [Jia Xue](http://jiaxueweb.com/), [Kristin Dana](http://eceweb1.rutgers.edu/vision/dana.html)
1 change: 1 addition & 0 deletions build.py
@@ -29,6 +29,7 @@
ENCODING_LIB = os.path.join(cwd, 'encoding/lib/libENCODING.dylib')

else:
os.environ['CFLAGS'] = '-std=c99'
os.environ['TH_LIBRARIES'] = os.path.join(lib_path,'libATen.so.1')
ENCODING_LIB = os.path.join(cwd, 'encoding/lib/libENCODING.so')

18 changes: 8 additions & 10 deletions docs/source/experiments/texture.rst
@@ -8,20 +8,18 @@ Deep TEN: Deep Texture Encoding Network Example
In this section, we show an example of training/testing Encoding-Net for texture recognition on the MINC-2500 dataset. Compared to the original Torch implementation, we use a *different learning rate* for the pre-trained base network and the encoding layer (10x), disable color jittering after reducing the learning rate, and adopt a much *smaller training image size* (224 instead of 352).


.. note::
**Make Sure** to `Install PyTorch Encoding <../notes/compile.html>`_ First.

Test Pre-trained Model
----------------------


- Clone the GitHub repo (I am sure you did during the installation)::
- Clone the GitHub repo::

git clone git@github.com:zhanghang1989/PyTorch-Encoding.git

- Install PyTorch Encoding (if not yet). Please follow the installation guide `Installing PyTorch Encoding <../notes/compile.html>`_.

- Download the `MINC-2500 <http://opensurfaces.cs.cornell.edu/publications/minc/>`_ dataset to the ``$HOME/data/minc-2500/`` folder. Download the pre-trained model (training `curve`_ as below, pre-trained on the train-1 split using a single training size of 224, with an error rate of :math:`19.98\%` using a single crop on the test-1 set)::

cd PyTorch-Encoding/experiments
cd PyTorch-Encoding/experiments/recognition
bash model/download_models.sh

.. _curve:
@@ -41,14 +39,14 @@ Train Your Own Model

- Example training command for training above model::

python main.py --model deepten --nclass 23 --model encodingnet --batch-size 64 --lr 0.01 --epochs 60
python main.py --model deepten --nclass 23 --model deepten --batch-size 64 --lr 0.01 --epochs 60

- Training options::
- Detail training options::

-h, --help show this help message and exit
--dataset DATASET training dataset (default: cifar10)
--model MODEL network model type (default: densenet)
--widen N widen factor of the network (default: 4)
--backbone BACKBONE backbone name (default: resnet50)
--batch-size N batch size for training (default: 128)
--test-batch-size N batch size for testing (default: 1000)
--epochs N number of epochs to train (default: 300)
@@ -69,7 +67,7 @@ Train Your Own Model
Extending the Software
----------------------

This code includes an integrated pipeline and some visualization tools (progress bar, real-time training curve plots). It is easy to use and extend for your own model or dataset:
This code is well written, easy to use and extendable for your own models or datasets:

- Write your own Dataloader ``mydataset.py`` to ``dataset/`` folder

13 changes: 12 additions & 1 deletion docs/source/notes/compile.rst
@@ -25,7 +25,18 @@ Reference
---------

.. note::
If using the code in your research, please cite our paper.
If using the code in your research, please cite our papers.

* Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. "Context Encoding for Semantic Segmentation" *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018*::

@InProceedings{Zhang_2018_CVPR,
author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
title = {Context Encoding for Semantic Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}


* Hang Zhang, Jia Xue, and Kristin Dana. "Deep TEN: Texture Encoding Network." *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017*::

7 changes: 5 additions & 2 deletions docs/source/notes/syncbn.rst
@@ -1,7 +1,7 @@
Implementing Synchronized Multi-GPU Batch Normalization
=======================================================

In this tutorial, we discuss the implementation detail of Multi-GPU Batch Normalization (BN) :class:`encoding.nn.BatchNorm2d` and compatible :class:`encoding.parallel.SelfDataParallel`. We will provide the training example in a later version.
In this tutorial, we discuss the implementation detail of Multi-GPU Batch Normalization (BN) (classic implementation: :class:`encoding.nn.BatchNorm2d` and compatible :class:`encoding.parallel.SelfDataParallel`). We will provide the training example in a later version.

How BN works?
-------------
@@ -23,7 +23,7 @@ BN layer was introduced in the paper `Batch Normalization: Accelerating Deep Net
\frac{d_\ell}{d_{x_i}} = \frac{d_\ell}{d_{y_i}}\cdot\frac{d_{y_i}}{d_{x_i}} + \frac{d_\ell}{d_\mu}\cdot\frac{d_\mu}{d_{x_i}} + \frac{d_\ell}{d_\sigma}\cdot\frac{d_\sigma}{d_{x_i}}
where :math:`\frac{d_\ell}{d_{x_i}}=\frac{\gamma}{\sigma}, \frac{d_\ell}{d_\mu}=-\frac{\gamma}{\sigma}\sum_i^N\frac{d_\ell}{d_{y_i}}
where :math:`\frac{d_{y_i}}{d_{x_i}}=\frac{\gamma}{\sigma}, \frac{d_\ell}{d_\mu}=-\frac{\gamma}{\sigma}\sum_i^N\frac{d_\ell}{d_{y_i}}
\text{ and } \frac{d_\sigma}{d_{x_i}}=-\frac{1}{\sigma}(\frac{x_i-\mu}{N})`.

Why Synchronize BN?
@@ -49,6 +49,9 @@ Suppose we have :math:`K` number of GPUs, :math:`sum(x)_k` and :math:`sum(x^2)_k

* Then Sync the gradient (automatically handled by :class:`encoding.parallel.AllReduce`) and continue the backward.
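
As a rough illustration of the reduction described above, the following sketch (plain Python, no GPUs involved) combines per-device sums of `x` and `x^2` into a global mean and standard deviation. The function name `global_stats`, the list-of-lists input, and the `eps` value are assumptions made for this example; they are not part of the library.

```python
import math


def global_stats(chunks, eps=1e-5):
    """Combine per-device sums into a global mean and standard deviation.

    `chunks` is a list with one list of values per (hypothetical) device.
    """
    n = sum(len(c) for c in chunks)
    # Each device only needs to contribute sum(x)_k and sum(x^2)_k.
    total = sum(sum(c) for c in chunks)
    total_sq = sum(sum(v * v for v in c) for c in chunks)
    mean = total / n
    # Var[x] = E[x^2] - E[x]^2, evaluated on the reduced sums.
    var = total_sq / n - mean * mean
    return mean, math.sqrt(var + eps)


mean, std = global_stats([[1.0, 2.0], [3.0, 4.0]])
```

Only two scalars per channel have to be exchanged between devices, which is why the sums, rather than the raw activations, are the natural quantities to reduce.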

Classic Implementation
~~~~~~~~~~~~~~~~~~~~~~

- Synchronized DataParallel:
Standard DataParallel pipeline of public frameworks (MXNet, PyTorch...) in each training iters:

1 change: 0 additions & 1 deletion encoding/CMakeLists.txt
@@ -62,7 +62,6 @@ IF(MSVC)
ENDIF()

TARGET_LINK_LIBRARIES(ENCODING
${THC_LIBRARIES}
${TH_LIBRARIES}
${CUDA_cusparse_LIBRARY}
)
1 change: 0 additions & 1 deletion encoding/cmake/FindTorch.cmake
@@ -36,4 +36,3 @@ SET(Torch_INSTALL_INCLUDE "${TORCH_BUILD_DIR}/include" ${TORCH_TH_INCLUDE_DIR} $

# Find the libs. We need to find libraries one by one.
SET(TH_LIBRARIES "$ENV{TH_LIBRARIES}")
SET(THC_LIBRARIES "$ENV{THC_LIBRARIES}")
4 changes: 3 additions & 1 deletion encoding/dilated/densenet.py
@@ -120,7 +120,9 @@ def __init__(self, num_input_features, num_output_features, stride, dilation=1):


class DenseNet(nn.Module):
r"""Dilated Densenet-BC model class
r"""Dilated DenseNet.
To dilate the transition layers of DenseNet correctly, we implement :class:`encoding.nn.DilatedAvgPool2d`.
Args:
growth_rate (int) - how many filters to add each layer (`k` in paper)
10 changes: 5 additions & 5 deletions encoding/functions/syncbn.py
@@ -62,9 +62,9 @@ def sum_square(input):


class _batchnorm(Function):
def __init__(self, training=False):
super(_batchnorm, self).__init__()
self.training = training
def __init__(ctx, training=False):
super(_batchnorm, ctx).__init__()
ctx.training = training

def forward(ctx, input, gamma, beta, mean, std):
ctx.save_for_backward(input, gamma, beta, mean, std)
@@ -99,13 +99,13 @@ def backward(ctx, gradOutput):
encoding_lib.Encoding_Float_batchnorm_Backward(
gradOutput, input, gradInput, gradGamma, gradBeta,
mean, invstd, gamma, beta, gradMean, gradStd,
self.training)
ctx.training)
elif isinstance(input, torch.cuda.DoubleTensor):
with torch.cuda.device_of(input):
encoding_lib.Encoding_Double_batchnorm_Backward(
gradOutput, input, gradInput, gradGamma, gradBeta,
mean, invstd, gamma, beta, gradMean, gradStd,
self.training)
ctx.training)
else:
raise RuntimeError('Unimplemented data type!')
return gradInput, gradGamma, gradBeta, gradMean, gradStd
2 changes: 1 addition & 1 deletion encoding/kernel/generic/pooling_kernel.c
@@ -35,7 +35,7 @@ __global__ void Encoding_(DilatedAvgPool_Forward_kernel) (
c = bc - b*C;
/* boundary check for output */
if (w >= Y.getSize(3) || h >= Y.getSize(2)) return;
int hstart = h*dW -padH;
int hstart = h*dH -padH;
int wstart = w*dW -padW;
int hend = min(hstart + kH*dilationH, X.getSize(2));
int wend = min(wstart + kW*dilationW, X.getSize(3));
34 changes: 17 additions & 17 deletions encoding/nn/encoding.py
@@ -13,13 +13,14 @@
from torch.nn import Module, Parameter
import torch.nn.functional as F
from torch.autograd import Function, Variable
from torch.nn.modules.utils import _single, _pair, _triple

from .._ext import encoding_lib
from ..functions import scaledL2, aggregate
from ..parallel import my_data_parallel
from ..functions import dilatedavgpool2d

__all__ = ['Encoding', 'EncodingShake', 'Inspiration', 'DilatedAvgPool2d', 'UpsampleConv2d']
__all__ = ['Encoding', 'EncodingDrop', 'Inspiration', 'DilatedAvgPool2d', 'UpsampleConv2d']

class Encoding(Module):
r"""
@@ -104,9 +105,9 @@ def __repr__(self):
+ 'N x ' + str(self.D) + '=>' + str(self.K) + 'x' \
+ str(self.D) + ')'

class EncodingShake(Module):
class EncodingDrop(Module):
def __init__(self, D, K):
super(EncodingShake, self).__init__()
super(EncodingDrop, self).__init__()
# init codewords and smoothing factor
self.D, self.K = D, K
self.codewords = Parameter(torch.Tensor(K, D),
@@ -119,7 +120,7 @@ def reset_params(self):
self.codewords.data.uniform_(-std1, std1)
self.scale.data.uniform_(-1, 0)

def shake(self):
def _drop(self):
if self.training:
self.scale.data.uniform_(-1, 0)
else:
@@ -143,14 +144,12 @@ def forward(self, X):
X = X.view(B,D,-1).transpose(1,2).contiguous()
else:
raise RuntimeError('Encoding Layer unknown input dims!')
# shake
self.shake()
self._drop()
# assignment weights
A = F.softmax(scaledL2(X, self.codewords, self.scale), dim=1)
# aggregate
E = aggregate(A, X, self.codewords)
# shake
self.shake()
self._drop()
return E

def __repr__(self):
@@ -202,27 +201,27 @@ class DilatedAvgPool2d(Module):
r"""We provide Dilated Average Pooling for the dilation of Densenet as
in :class:`encoding.dilated.DenseNet`.
Reference::
Reference:
We provide this code for a forthcoming paper.
Applies a 2D average pooling over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size :math:`(N, C, H, W)`,
output :math:`(N, C, H_{out}, W_{out})` and :attr:`kernel_size` :math:`(kH, kW)`
output :math:`(B, C, H_{out}, W_{out})`, :attr:`kernel_size` :math:`(k_H,k_W)`, :attr:`stride` :math:`(s_H,s_W)` :attr:`dilation` :math:`(d_H,d_W)`
can be precisely described as:
.. math::
\begin{array}{ll}
out(b, c, h, w) = 1 / (kH * kW) *
\sum_{{m}=0}^{kH-1} \sum_{{n}=0}^{kW-1}
input(b, c, dH * h + m, dW * w + n)
out(b, c, h, w) = 1 / (k_H \cdot k_W) \cdot
\sum_{{m}=0}^{k_H-1} \sum_{{n}=0}^{k_W-1}
input(b, c, s_H \cdot h + d_H \cdot m, s_W \cdot w + d_W \cdot n)
\end{array}
| If :attr:`padding` is non-zero, then the input is implicitly zero-padded on both sides
for :attr:`padding` number of points
The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:
| The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:
- a single ``int`` -- in which case the same value is used for the height and width dimension
- a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
@@ -235,10 +234,11 @@ class DilatedAvgPool2d(Module):
dilation: the dilation parameter similar to Conv2d
Shape:
- Input: :math:`(N, C, H_{in}, W_{in})`
- Output: :math:`(N, C, H_{out}, W_{out})` where
- Input: :math:`(B, C, H_{in}, W_{in})`
- Output: :math:`(B, C, H_{out}, W_{out})` where
:math:`H_{out} = floor((H_{in} + 2 * padding[0] - kernel\_size[0]) / stride[0] + 1)`
:math:`W_{out} = floor((W_{in} + 2 * padding[1] - kernel\_size[1]) / stride[1] + 1)`
For :attr:`stride=1`, the output featuremap preserves the same size as input.
Examples::
@@ -306,7 +306,7 @@ class UpsampleConv2d(Module):
(in_channels, scale * scale * out_channels, kernel_size[0], kernel_size[1])
bias (Tensor): the learnable bias of the module of shape (scale * scale * out_channels)
Examples::
Examples:
>>> # With square kernels and equal stride
>>> m = nn.UpsampleConv2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
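
The dilated average pooling formula in the `DilatedAvgPool2d` docstring above can be sketched in plain Python for a single 2D plane. This is a reference sketch only, not the CUDA kernel: the function name `dilated_avg_pool2d` is hypothetical, padding is assumed to be zero, and the output size is chosen (as an assumption) so that every dilated window fits inside the input.

```python
def dilated_avg_pool2d(plane, kernel, stride=(1, 1), dilation=(1, 1)):
    """Average-pool one 2D plane (list of rows) with stride and dilation."""
    k_h, k_w = kernel
    s_h, s_w = stride
    d_h, d_w = dilation
    height, width = len(plane), len(plane[0])
    # Assumption: no padding; keep only windows that lie fully inside the plane.
    out_h = (height - d_h * (k_h - 1) - 1) // s_h + 1
    out_w = (width - d_w * (k_w - 1) - 1) // s_w + 1
    out = []
    for h in range(out_h):
        row = []
        for w in range(out_w):
            total = 0.0
            for m in range(k_h):
                for n in range(k_w):
                    # input(b, c, s_H * h + d_H * m, s_W * w + d_W * n)
                    total += plane[s_h * h + d_h * m][s_w * w + d_w * n]
            row.append(total / (k_h * k_w))
        out.append(row)
    return out


plane = [[float(4 * r + c) for c in range(4)] for r in range(4)]
pooled = dilated_avg_pool2d(plane, kernel=(2, 2), dilation=(2, 2))
```

Note that the height index uses the height stride and the width index uses the width stride; mixing the two only goes unnoticed when both strides are equal.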
18 changes: 17 additions & 1 deletion encoding/parallel.py
@@ -19,7 +19,7 @@
from torch.nn.parallel.replicate import replicate
from torch.nn.parallel.parallel_apply import parallel_apply

__all__ = ['AllReduce', 'Broadcast', 'ModelDataParallel',
__all__ = ['Reduce', 'AllReduce', 'Broadcast', 'ModelDataParallel',
'CriterionDataParallel', 'SelfDataParallel']

def nccl_all_reduce(inputs):
@@ -45,6 +45,22 @@ def comm_all_reduce(inputs):
results.append(result.clone().cuda(i))
return results

class Reduce(Function):
def forward(ctx, *inputs):
ctx.save_for_backward(*inputs)
if len(inputs) == 1:
return inputs[0]
return comm.reduce_add(inputs)

def backward(ctx, gradOutput):
inputs = tuple(ctx.saved_tensors)
if len(inputs) == 1:
return gradOutput
gradInputs = []
for i in range(len(inputs)):
with torch.cuda.device_of(inputs[i]):
gradInputs.append(gradOutput.cuda())
return tuple(gradInputs)

class AllReduce(Function):
"""Cross GPU all reduce autograd operation for calculate mean and
4 changes: 3 additions & 1 deletion experiments/recognition/README.md
@@ -1 +1,3 @@
- [Link to the Deep TEN pre-trained models and experiments](http://hangzh.com/PyTorch-Encoding/experiments/texture.html)
- [Link to the EncNet CIFAR experiments and pre-trained models](http://hangzh.com/PyTorch-Encoding/experiments/cifar.html)

- [Link to the Deep TEN experiments and pre-trained models](http://hangzh.com/PyTorch-Encoding/experiments/texture.html)
11 changes: 8 additions & 3 deletions experiments/recognition/option.py
@@ -24,15 +24,17 @@ def __init__(self):
help='number of classes (default: 10)')
parser.add_argument('--widen', type=int, default=4, metavar='N',
help='widen factor of the network (default: 4)')
parser.add_argument('--ncodes', type=int, default=32, metavar='N',
help='number of codewords in Encoding Layer (default: 32)')
parser.add_argument('--backbone', type=str, default='resnet50',
help='backbone name (default: resnet50)')
# training hyper params
parser.add_argument('--batch-size', type=int, default=128,
metavar='N', help='batch size for training (default: 128)')
parser.add_argument('--test-batch-size', type=int, default=256,
metavar='N', help='batch size for testing (default: 256)')
parser.add_argument('--epochs', type=int, default=300, metavar='N',
help='number of epochs to train (default: 300)')
parser.add_argument('--epochs', type=int, default=600, metavar='N',
help='number of epochs to train (default: 600)')
parser.add_argument('--start_epoch', type=int, default=1,
metavar='N', help='the epoch number to start (default: 1)')
# lr setting
@@ -65,4 +67,7 @@ def __init__(self):
self.parser = parser

def parse(self):
return self.parser.parse_args()
args = self.parser.parse_args()
if args.dataset == 'minc':
args.nclass = 23
return args
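
The dataset-dependent override added to `parse()` above can be tried in isolation with a reduced parser. This is a minimal sketch that keeps only the `--dataset` and `--nclass` options with the defaults shown in this diff; the helper names `build_parser` and `parse` and the optional `argv` parameter are assumptions for the example.

```python
import argparse


def build_parser():
    # Reduced parser: only the two options involved in the override.
    parser = argparse.ArgumentParser(description='option override sketch')
    parser.add_argument('--dataset', type=str, default='cifar10',
                        help='training dataset (default: cifar10)')
    parser.add_argument('--nclass', type=int, default=10, metavar='N',
                        help='number of classes (default: 10)')
    return parser


def parse(argv=None):
    args = build_parser().parse_args(argv)
    # MINC has 23 material classes, so the class count follows the dataset.
    if args.dataset == 'minc':
        args.nclass = 23
    return args


args = parse(['--dataset', 'minc'])
```

With this arrangement the override also wins over an explicit `--nclass` on the command line when the dataset is `minc`.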
2 changes: 1 addition & 1 deletion setup.py
@@ -32,7 +32,7 @@ def create_version_file():
with open(version_path, 'w') as f:
f.write("__version__ = '{}'\n".format(version))

version = '0.1.0'
version = '0.2.0'
try:
sha = subprocess.check_output(['git', 'rev-parse', 'HEAD'],
cwd=cwd).decode('ascii').strip()