adds architecture for grid k520, tesla k80 gpus

Walter1218 · May 23, 2017 · commit c9ace58
1 parent 271d6c5
Showing 1 changed file with 33 additions and 25 deletions.
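
The substantive change in this commit is a README table that pairs GPU models with the `-arch` value to set in `lib/setup.py`. As a rough illustration of how that table could be consulted, here is a minimal Python sketch. The dictionary entries are copied from the table in the diff; the `arch_flag` helper and its substring matching are hypothetical and not part of the repository, and the `-arch=sm_XX` form assumes the value is passed straight to `nvcc`.

```python
# GPU-to-architecture pairs copied from the table this commit adds to README.md.
GPU_ARCH = {
    "TitanX (Maxwell/Pascal)": "sm_52",
    "Grid K520 (AWS g2.2xlarge)": "sm_30",
    "Tesla K80 (AWS p2.xlarge)": "sm_37",
}


def arch_flag(gpu_name):
    """Return an nvcc-style -arch flag for a GPU listed in the table.

    Hypothetical helper: matching is a case-insensitive, space-insensitive
    substring check against the model name (the parenthesised note is ignored).
    """
    needle = gpu_name.lower().replace(" ", "")
    for model, arch in GPU_ARCH.items():
        key = model.split(" (")[0].lower().replace(" ", "")
        if key in needle:
            return "-arch=" + arch
    # GPUs that are not in the table have no recorded setting.
    raise KeyError("no -arch setting recorded for GPU: " + gpu_name)


if __name__ == "__main__":
    for name in ("Tesla K80", "GRID K520", "GeForce GTX TITAN X"):
        print(name, "->", arch_flag(name))
```

A GPU that is missing from the table raises `KeyError` rather than falling back to a guess, since the README only records settings that contributors have reported as working.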
diff --git a/README.md b/README.md
@@ -1,36 +1,36 @@
 # tf-faster-rcnn
 A Tensorflow implementation of faster RCNN detection framework by Xinlei Chen (xinleic@cs.cmu.edu). This repository is based on the python Caffe implementation of faster RCNN available [here](https://github.com/rbgirshick/py-faster-rcnn).
 
-**Note**: Several minor modifications are made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report [An Implementation of Faster RCNN with Study for Region Sampling](https://arxiv.org/pdf/1702.02138.pdf). If you are seeking to reproduce the results in the original paper, please use the [official code](https://github.com/ShaoqingRen/faster_rcnn) or maybe the [semi-official code](https://github.com/rbgirshick/py-faster-rcnn). For details about the faster RCNN architecture please refer to the paper [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](http://arxiv.org/pdf/1506.01497.pdf). 
+**Note**: Several minor modifications are made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report [An Implementation of Faster RCNN with Study for Region Sampling](https://arxiv.org/pdf/1702.02138.pdf). If you are seeking to reproduce the results in the original paper, please use the [official code](https://github.com/ShaoqingRen/faster_rcnn) or maybe the [semi-official code](https://github.com/rbgirshick/py-faster-rcnn). For details about the faster RCNN architecture please refer to the paper [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](http://arxiv.org/pdf/1506.01497.pdf).
 
 ### Detection Performance
 The current code supports **VGG16** and **Resnet V1** models. We mainly tested it on plain VGG16 and Resnet101 (thank you @philokey!) architecture. As the baseline, we report numbers using a single model on a single convolution layer, so no multi-scale, no multi-stage bounding box regression, no skip-connection, no extra input is used. The only data augmentation technique is left-right flipping during training following the original Faster RCNN. All models are released.
 
 With VGG16 (``conv5_3``):
   - Train on VOC 2007 trainval and test on VOC 2007 test, **71.2**.
   - Train on VOC 2007+2012 trainval and test on VOC 2007 test ([R-FCN](https://github.com/daijifeng001/R-FCN) schedule), **75.3**.
-  - Train on COCO 2014 [trainval35k](https://github.com/rbgirshick/py-faster-rcnn/tree/master/models) and test on [minival](https://github.com/rbgirshick/py-faster-rcnn/tree/master/models) (900k/1190k), **29.5**. 
-  
+  - Train on COCO 2014 [trainval35k](https://github.com/rbgirshick/py-faster-rcnn/tree/master/models) and test on [minival](https://github.com/rbgirshick/py-faster-rcnn/tree/master/models) (900k/1190k), **29.5**.
+
 With Resnet101 (last ``conv4``):
-  - Train on VOC 2007 trainval and test on VOC 2007 test, **75.2**. 
+  - Train on VOC 2007 trainval and test on VOC 2007 test, **75.2**.
   - Train on VOC 2007+2012 trainval and test on VOC 2007 test (R-FCN schedule), **79.3**.
-  - Train on COCO 2014 trainval35k and test on minival (900k/1190k), **34.1**. 
-  
+  - Train on COCO 2014 trainval35k and test on minival (900k/1190k), **34.1**.
+
 More Resnets:
-  - Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), **31.6**. 
-  - Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), **35.2**. 
-  
+  - Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), **31.6**.
+  - Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), **35.2**.
+
 Approximate [FPN](https://arxiv.org/abs/1612.03144) *baseline* [setup](https://github.com/endernewton/tf-faster-rcnn/blob/master/experiments/cfgs/res101-lg.yml):
-  - Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), **33.4**. 
-  - Train Resnet101 on COCO 2014 trainval35k and test on minival (1000k/1390k), **36.0**. 
-  - Train Resnet152 on COCO 2014 trainval35k and test on minival (1000k/1390k), **37.2**. 
-  
+  - Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), **33.4**.
+  - Train Resnet101 on COCO 2014 trainval35k and test on minival (1000k/1390k), **36.0**.
+  - Train Resnet152 on COCO 2014 trainval35k and test on minival (1000k/1390k), **37.2**.
+
 **Note**:
   - Due to the randomness in GPU training with Tensorflow, especially for VOC, the best numbers are reported (with 2-3 attempts) here. According to my experience, for COCO you can almost always get a very close number (within ~0.2%) despite the randomness.
-  - **All** the numbers are obtained with a different testing scheme without selecting region proposals using non-maximal suppression (TEST.MODE top), the default and original testing scheme (TEST.MODE nms) will likely result in slightly worse performance (see [report](https://arxiv.org/pdf/1702.02138.pdf), for COCO it drops 0.X AP). 
+  - **All** the numbers are obtained with a different testing scheme without selecting region proposals using non-maximal suppression (TEST.MODE top), the default and original testing scheme (TEST.MODE nms) will likely result in slightly worse performance (see [report](https://arxiv.org/pdf/1702.02138.pdf), for COCO it drops 0.X AP).
   - Since we keep the small proposals (\< 16 pixels width/height), our performance is especially good for small objects.
   - For other minor modifications, please check the [report](https://arxiv.org/pdf/1702.02138.pdf). Notable ones include using ``crop_and_resize``, and excluding ground truth boxes in RoIs during training.
-  - For COCO, we find the performance improving with more iterations (VGG16 350k/490k: 26.9, 600k/790k: 28.3, 900k/1190k: 29.5), and potentially better performance can be achieved with even more iterations. 
+  - For COCO, we find the performance improving with more iterations (VGG16 350k/490k: 26.9, 600k/790k: 28.3, 900k/1190k: 29.5), and potentially better performance can be achieved with even more iterations.
   - For Resnets, we fix the first block (total 4) when fine-tuning the network, and only use ``crop_and_resize`` to resize the RoIs (7x7) without max-pool (which I find useless especially for COCO). The final feature maps are average-pooled for classification and regression. All batch normalization parameters are fixed. Weight decay is set to Resnet101 default 1e-4. Learning rate for biases is not doubled.
   - For approximate [FPN](https://arxiv.org/abs/1612.03144) baseline setup we simply resize the image with 800 pixels, add 32^2 anchors, and take 1000 proposals during testing.
   - Check out [here](http://ladoga.graphics.cs.cmu.edu/xinleic/tf-faster-rcnn/)/[here](http://gs11655.sp.cs.cmu.edu/xinleic/tf-faster-rcnn/)/[here](https://drive.google.com/open?id=0B1_fAEgxdnvJSmF3YUlZcHFqWTQ) for the latest models, including longer COCO VGG16 models and Resnet ones.
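
The notes above contrast two test-time proposal selection schemes: `TEST.MODE top`, which keeps proposals without non-maximal suppression, and `TEST.MODE nms`, which applies it. The following pure-Python sketch shows the general difference between the two ideas. It is an illustration under assumptions, not the repository's implementation: the function names, the IoU threshold, the box format `(x1, y1, x2, y2)`, and the continuous-area convention are all chosen here for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_top(scores, top_n):
    """'top'-style selection: keep the top_n highest-scoring proposals, no NMS."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_n]


def select_nms(boxes, scores, iou_thresh, top_n):
    """'nms'-style selection: greedy NMS, keeping at most top_n survivors."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
        if len(keep) == top_n:
            break
    return keep


# Two heavily overlapping boxes and one separate box (made-up example data).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

print(select_top(scores, 2))              # [0, 1]: both overlapping boxes kept
print(select_nms(boxes, scores, 0.5, 2))  # [0, 2]: box 1 suppressed by box 0
```

In the example, the first two boxes have an IoU of about 0.68, so the NMS variant drops the lower-scoring one while the top variant keeps both.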
@@ -55,10 +55,18 @@ Additional features not mentioned in the [report](https://arxiv.org/pdf/1702.021
 2. Update your -arch in setup script to match your GPU
   ```Shell
   cd tf-faster-rcnn/lib
-  # Check the GPU architecture
+  # Change the GPU architecture (-arch) if necessary
   vim setup.py
   ```
-  **Note**: I only have access to TitanX (both Maxwell and Pascal) and sm_52 seems to work fine on those machines. You are welcome to contribute the settings on your end if you have made the code work properly on other GPUs.
+
+  | GPU model  | Architecture |
+  | ------------- | ------------- |
+  | TitanX (Maxwell/Pascal)  | sm_52  |
+  | Grid K520 (AWS g2.2xlarge)  | sm_30  |
+  | Tesla K80 (AWS p2.xlarge)   | sm_37  |
+
+  **Note**: You are welcome to contribute the settings on your end if you have made the code work properly on other GPUs.
+
 
 3. Build the Cython modules
   ```Shell
@@ -77,7 +85,7 @@ Additional features not mentioned in the [report](https://arxiv.org/pdf/1702.021
 ### Setup data
 Please follow the instructions of py-faster-rcnn [here](https://github.com/rbgirshick/py-faster-rcnn#beyond-the-demo-installation-for-training-and-testing-models) to set up VOC and COCO datasets (Part of COCO is done). The steps involve downloading data and optionally creating softlinks in the ``data`` folder. Since faster RCNN does not rely on pre-computed proposals, it is safe to ignore the steps that set up proposals.
 
-If you find it useful, the ``data/cache`` folder created on my side is also shared [here](http://ladoga.graphics.cs.cmu.edu/xinleic/tf-faster-rcnn/cache.tgz). 
+If you find it useful, the ``data/cache`` folder created on my side is also shared [here](http://ladoga.graphics.cs.cmu.edu/xinleic/tf-faster-rcnn/cache.tgz).
 
 ### Demo and Test with pre-trained models
 1. Download pre-trained model
@@ -103,16 +111,16 @@ If you find it useful, the ``data/cache`` folder created on my side is also shar
   ```Shell
   # at repository root
   GPU_ID=0
-  CUDA_VISIBLE_DEVICES=${GPU_ID} ./tools/demo.py 
+  CUDA_VISIBLE_DEVICES=${GPU_ID} ./tools/demo.py
   ```
-  **Note**: Resnet101 testing probably requires several gigabytes of memory, so if you encounter memory capacity issues, please install it with CPU support only. Refer to [Issue 25](https://github.com/endernewton/tf-faster-rcnn/issues/25). 
+  **Note**: Resnet101 testing probably requires several gigabytes of memory, so if you encounter memory capacity issues, please install it with CPU support only. Refer to [Issue 25](https://github.com/endernewton/tf-faster-rcnn/issues/25).
 
 4. Test with pre-trained Resnet101 models
   ```Shell
   GPU_ID=0
   ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101
   ```
-  **Note**: If you cannot get the reported numbers (78.7 on my side), then probably the NMS function is compiled improperly, refer to [Issue 5](https://github.com/endernewton/tf-faster-rcnn/issues/5). 
+  **Note**: If you cannot get the reported numbers (78.7 on my side), then probably the NMS function is compiled improperly, refer to [Issue 5](https://github.com/endernewton/tf-faster-rcnn/issues/5).
 
 ### Train your own model
 1. Download pre-trained models and weights. The current code supports VGG16 and Resnet V1 models. Pre-trained models are provided by slim; you can get the pre-trained models [here](https://github.com/tensorflow/models/tree/master/slim#pre-trained-models) and set them in the ``data/imagenet_weights`` folder. For example for VGG16 model, you can set up like:
@@ -151,7 +159,7 @@ If you find it useful, the ``data/cache`` folder created on my side is also shar
   tensorboard --logdir=tensorboard/vgg16/voc_2007_trainval/ --port=7001 &
   tensorboard --logdir=tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ --port=7002 &
   ```
-  
+
 4. Test and evaluate
   ```Shell
   ./experiments/scripts/test_faster_rcnn.sh [GPU_ID] [DATASET] [NET]
@@ -162,7 +170,7 @@ If you find it useful, the ``data/cache`` folder created on my side is also shar
   ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16
   ./experiments/scripts/test_faster_rcnn.sh 1 coco res101
   ```
-  
+
 5. You can use ``tools/reval.sh`` for re-evaluation
 
 
@@ -196,7 +204,7 @@ If you find this implementation or the analysis conducted in our report helpful,
         Journal = {arXiv preprint arXiv:1702.02138},
         Year = {2017}
     }
-    
+
 For convenience, here is the faster RCNN citation:
 
     @inproceedings{renNIPS15fasterrcnn,
@@ -226,7 +234,7 @@ VGG16 COCO 2015 test-dev (900k/1190k):
  Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.451
  Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.591
  ```
- 
+
 VGG16 COCO 2015 test-std (900k/1190k):
  ```
  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.295