For the Chinese version, please refer to my CSDN blog.
This is a note on how to use tf-faster-rcnn to train your own model on VOC or other datasets.
My machine and library versions: GTX 1060, miniconda 4.5.4, CUDA 9.0, cuDNN 7.1.4, tensorflow-gpu 1.8.0.
This step refers to https://github.com/endernewton/tf-faster-rcnn
My versions: opencv 3.4.1, cython 0.28.4 and easydict 1.7.
git clone https://github.com/endernewton/tf-faster-rcnn.git
cd tf-faster-rcnn/lib
make clean
make
cd ..
git clone https://github.com/pdollar/coco.git
cd coco/PythonAPI
make
cd ../..
cd data
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
tar xvf VOCdevkit_08-Jun-2007.tar
ln -s VOCdevkit VOCdevkit2007
cd ..
Google Drive: link
Baidu Cloud: link
Download the model, put it into the directory tf-faster-rcnn/, and execute the following command.
tar xvf voc_0712_80k-110k.tgz
NET=res101
TRAIN_IMDB=voc_2007_trainval+voc_2012_trainval
mkdir -p output/$NET/$TRAIN_IMDB
cd output/$NET/$TRAIN_IMDB
ln -s ../../../voc_2007_trainval+voc_2012_trainval ./default
cd ../../..
Modify tf-faster-rcnn/lib/datasets/voc_eval.py, line 121.
# save
print('Saving cached annotations to {:s}'.format(cachefile))
with open(cachefile, 'w') as f: ---> with open(cachefile, 'wb') as f:
pickle.dump(recs, f)
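The change from 'w' to 'wb' is required because in Python 3 pickle produces bytes, so the cache file must be opened in binary mode. A minimal sketch (recs and the file path are placeholders, not the script's actual data):

```python
import os
import pickle
import tempfile

# Placeholder annotation cache, shaped like what voc_eval.py stores.
recs = {'000001': [{'name': 'car', 'bbox': [10, 20, 100, 200]}]}
cachefile = os.path.join(tempfile.gettempdir(), 'annots_demo.pkl')

# 'wb' is mandatory in Python 3: pickle writes bytes, and text mode 'w'
# would raise TypeError on pickle.dump.
with open(cachefile, 'wb') as f:
    pickle.dump(recs, f)

# Reading back likewise needs binary mode 'rb'.
with open(cachefile, 'rb') as f:
    loaded = pickle.load(f)
```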
Run the command after modification.
GPU_ID=0
CUDA_VISIBLE_DEVICES=$GPU_ID ./tools/demo.py
You can use this model to predict on your own dataset (the categories should be included in the VOC dataset). Of course, you may not be satisfied with the mean IoU (only 0.20~0.60).
mkdir -p data/imagenet_weights
cd data/imagenet_weights
wget -v http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz
tar -xzvf resnet_v1_101_2016_08_28.tar.gz
mv resnet_v1_101.ckpt res101.ckpt
cd ../..
To speed up the training process, I changed ITERS in the shell file to 300 here.
pascal_voc)
TRAIN_IMDB="voc_2007_trainval"
TEST_IMDB="voc_2007_test"
STEPSIZE="[50000]"
ITERS=70000 ---> 300
ANCHORS="[8,16,32]"
RATIOS="[0.5,1,2]"
;;
pascal_voc)
TRAIN_IMDB="voc_2007_trainval"
TEST_IMDB="voc_2007_test"
ITERS=70000 ---> 300
ANCHORS="[8,16,32]"
RATIOS="[0.5,1,2]"
;;
Because the test script is automatically executed inside the train script, and the test pass (4952 images) takes some time, we keep only the first 200 lines of data/VOCdevkit2007/VOC2007/ImageSets/Main/test.txt.
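One way to do the trimming is with head, keeping a backup of the full list. This is a sketch assuming the VOCdevkit2007 layout above; the placeholder-id line only exists so the snippet runs standalone and is unnecessary on a real setup:

```shell
MAIN=data/VOCdevkit2007/VOC2007/ImageSets/Main
mkdir -p "$MAIN"                                  # no-op if the dataset is already in place
# Placeholder ids purely for illustration; skip this line on a real dataset.
[ -f "$MAIN/test.txt" ] || seq -f "%06g" 1 4952 > "$MAIN/test.txt"
cp "$MAIN/test.txt" "$MAIN/test_full.txt"         # back up the full list first
head -n 200 "$MAIN/test_full.txt" > "$MAIN/test.tmp" && mv "$MAIN/test.tmp" "$MAIN/test.txt"
wc -l < "$MAIN/test.txt"                          # should now report 200
```

Restore test_full.txt when you want the full 4952-image evaluation back.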
./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc res101
Now you can train your own model on the VOC dataset.
This is a dataset for a car detection competition. There are two types of files in this dataset (see the dataset for details). First, decompress the downloaded dataset and put it into a new directory tf-faster-rcnn/dataset.
Run the following command in the directory tf-faster-rcnn/.
rm -rf data/VOCdevkit2007/VOC2007/*
The directory stores the Annotations files (image_name.xml) for all images in the training set.
Here is a notebook showing how to generate the .xml files.
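For orientation (this is an illustrative sketch, not the notebook's code), a minimal VOC-style .xml annotation can be built with the standard library's ElementTree; the helper name and fields below are assumptions based on the VOC format:

```python
import xml.etree.ElementTree as ET

def make_voc_annotation(filename, width, height, objects):
    """Build a minimal VOC-style annotation string.

    objects: list of (class_name, (xmin, ymin, xmax, ymax)) tuples.
    """
    root = ET.Element('annotation')
    ET.SubElement(root, 'filename').text = filename
    size = ET.SubElement(root, 'size')
    for tag, val in [('width', width), ('height', height), ('depth', 3)]:
        ET.SubElement(size, tag).text = str(val)
    for name, (xmin, ymin, xmax, ymax) in objects:
        obj = ET.SubElement(root, 'object')
        ET.SubElement(obj, 'name').text = name
        ET.SubElement(obj, 'difficult').text = '0'
        bnd = ET.SubElement(obj, 'bndbox')
        for tag, val in zip(('xmin', 'ymin', 'xmax', 'ymax'),
                            (xmin, ymin, xmax, ymax)):
            ET.SubElement(bnd, tag).text = str(val)
    return ET.tostring(root, encoding='unicode')

xml_text = make_voc_annotation('000001.jpg', 1280, 720,
                               [('car', (100, 200, 400, 500))])
```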
- train.txt: all image names in the training set (without .jpg suffix).
- trainval.txt: all image names in the training set and validation set.
- val.txt: all image names in the validation set.
- test.txt: all image names in the test set (the test script evaluates the mean IoU based on it, so you should use all image names of the validation set if the test set doesn't have .xml files).
Note: test.txt and val.txt are the same (no .xml file in the test set), simply copy val.txt and rename it.
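The four lists above can be generated with a simple split. A sketch under stated assumptions: the placeholder ids and the 80/20 split ratio are mine, and out_dir stands in for data/VOCdevkit2007/VOC2007/ImageSets/Main; in a real run image_names would come from listing JPEGImages/:

```python
import os
import random

# Placeholder image names (no .jpg suffix), standing in for the real dataset.
image_names = ['%06d' % i for i in range(1, 101)]
random.seed(0)
random.shuffle(image_names)

# Assumed 80/20 train/val split.
n_val = int(0.2 * len(image_names))
val = sorted(image_names[:n_val])
train = sorted(image_names[n_val:])

out_dir = 'ImageSets_Main_demo'
os.makedirs(out_dir, exist_ok=True)
# test.txt copies val.txt, since the test set here has no .xml files.
for name, ids in [('train', train), ('val', val),
                  ('trainval', sorted(train + val)), ('test', val)]:
    with open(os.path.join(out_dir, name + '.txt'), 'w') as f:
        f.write('\n'.join(ids) + '\n')
```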
The directory stores all image files (don't put the test images here if your test set doesn't have .xml files).
Here is a notebook showing how to generate the .txt and .jpg files.
Move Annotations, ImageSets/Main, and JPEGImages under dataset/ to the directory data/VOCdevkit2007/VOC2007/. After that, your dataset is ready.
- Modify the classes, line 36
self._classes = ('__background__',  # always index 0
                 'car')
- Remove the "- 1" operation if your dataset is 0-based, line 169
x1 = float(bbox.find('xmin').text)
y1 = float(bbox.find('ymin').text)
x2 = float(bbox.find('xmax').text)
y2 = float(bbox.find('ymax').text)
Modify the program, lines 105-124.
def _get_widths(self):
    return [PIL.Image.open(self.image_path_at(i)).size[0]
            for i in range(self.num_images)]

def _get_heights(self):
    return [PIL.Image.open(self.image_path_at(i)).size[1]
            for i in range(self.num_images)]

def append_flipped_images(self):
    num_images = self.num_images
    widths = self._get_widths()
    heights = self._get_heights()
    for i in range(num_images):
        boxes = self.roidb[i]['boxes'].copy()
        oldx1 = boxes[:, 0].copy()
        oldx2 = boxes[:, 2].copy()
        boxes[:, 0] = widths[i] - oldx2 - 1
        boxes[:, 2] = widths[i] - oldx1 - 1
        for ids in range(len(boxes)):
            if boxes[ids][2] < boxes[ids][0]:
                boxes[ids][0] = 0
        assert (boxes[:, 2] >= boxes[:, 0]).all()
        entry = {'boxes': boxes,
                 'gt_overlaps': self.roidb[i]['gt_overlaps'],
                 'gt_classes': self.roidb[i]['gt_classes'],
                 'flipped': True}
        self.roidb.append(entry)
    self._image_index = self._image_index * 2
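To see what the flip-and-clamp logic above does in isolation: on a 0-based pixel grid of width W, a horizontal flip maps x to W - x - 1, which swaps the roles of xmin and xmax; a malformed annotation with xmax < xmin would violate the box order after flipping and gets its xmin clamped to 0. A standalone sketch with made-up boxes:

```python
import numpy as np

W = 500  # hypothetical image width
boxes = np.array([[10, 0, 120, 50],     # well-formed box: xmin < xmax
                  [120, 0, 100, 50]])   # malformed annotation: xmax < xmin

# Same transform as append_flipped_images above.
oldx1 = boxes[:, 0].copy()
oldx2 = boxes[:, 2].copy()
boxes[:, 0] = W - oldx2 - 1
boxes[:, 2] = W - oldx1 - 1

# The malformed box flips to xmax < xmin, so xmin is clamped to 0.
for ids in range(len(boxes)):
    if boxes[ids][2] < boxes[ids][0]:
        boxes[ids][0] = 0
assert (boxes[:, 2] >= boxes[:, 0]).all()
```

After the clamp, the well-formed box becomes [379, 0, 489, 50] and the malformed one [0, 0, 379, 50], so the assertion that guards training no longer fires.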
Lines 27-30, int() ---> float().
obj_struct['bbox'] = [float(bbox.find('xmin').text),
float(bbox.find('ymin').text),
float(bbox.find('xmax').text),
float(bbox.find('ymax').text)]
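The reason for switching int() to float() here: the competition annotations may contain sub-pixel coordinates such as "12.5", which int() rejects outright. A quick check (the coordinate value is a made-up example):

```python
coord = "12.5"  # hypothetical sub-pixel coordinate from an annotation file

# int() cannot parse a decimal string and raises ValueError;
# float() handles both "12" and "12.5".
try:
    parsed = int(coord)
except ValueError:
    parsed = float(coord)
```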
- Remove old model
rm -rf output
- Clear cache
rm data/cache/voc_2007_test_gt_roidb.pkl
rm data/cache/voc_2007_trainval_gt_roidb.pkl
rm data/VOCdevkit2007/VOC2007/ImageSets/Main/test.txt_annots.pkl
- Update parameters
3.1 Modify stepsize and iterations in experiments/scripts/train_faster_rcnn.sh.
3.2 Modify the parameters of the model in experiments/cfgs.
3.3 Modify the training and test parameters in lib/model/config.py.
- Training
./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc res101
Note: Make sure that the ITERS in train_faster_rcnn.sh and test_faster_rcnn.sh are the same.
Use ./tools/demo.py to predict your test images. This is a demo example on my github. A simple explanation of the inputs:
- demo_net: classification network architecture.
- demo_ite: ITERS of the network.
- demo_dir: The test set directory.
- demo_vis: Whether to visualize the test image.
- write_csv: Whether to write the predicted boxes to the csv file.
- dataset: Select the dataset format.
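The write_csv option produces one row per predicted box. As a rough sketch only: the column names and row layout below are my assumptions, not the demo script's actual output format (check the script on github for the real one):

```python
import csv
import io

# Hypothetical detections: (image, class, score, xmin, ymin, xmax, ymax).
detections = [
    ('000001.jpg', 'car', 0.98, 120, 40, 360, 220),
    ('000002.jpg', 'car', 0.87, 15, 60, 200, 180),
]

buf = io.StringIO()  # stands in for the output .csv file
writer = csv.writer(buf)
writer.writerow(['image', 'class', 'score', 'xmin', 'ymin', 'xmax', 'ymax'])
for row in detections:
    writer.writerow(row)

csv_text = buf.getvalue()
```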
Note: To better observe the prediction performance, you can first put a few test images in data/demo and set demo_vis to True when executing. If you are satisfied with the visualization results, predict all the test images after modifying demo_dir and setting demo_vis to False (**visualizing many images at the same time is terrible, remember!**).
It is critical to understand all the procedures in this project. Only then can you train your network by modifying parameters, adding data augmentation and choosing different metrics.