# Object Detection Models: YOLO
Following the SSD chapter, we now implement another classic object detection algorithm: YOLOv2. YOLOv2 is the successor to YOLO and, like its predecessor, uses a purely convolutional network.
We will not repeat the dataset details here and jump straight into the implementation; if you are new to object detection, we recommend warming up with the SSD example first.
```{.python .input n=1}
import numpy as np
import mxnet as mx
from mxnet import gluon
from mxnet import nd
from mxnet.gluon import nn
from mxnet.gluon import Block, HybridBlock
from mxnet.gluon.model_zoo import vision
```
## YOLO v2
### Converting the raw convolutional output
Recall that the raw convolutional output is a (B, N, H, W) tensor, where B is the batch size, H and W are the spatial dimensions of the feature map, and N is the number of output channels. N is the relatively complex dimension and the one we care about here.
In short, for each anchor at every spatial location (x, y) we need (number of classes + 1 + 4) predictions: the class scores (one per positive class in the training set), one objectness score, and the 4 box offset predictions (x, y, w, h).
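To make this layout concrete, here is a minimal NumPy sketch (the shapes and counts are illustrative, not taken from the chapter's code) of how the channel count is derived and how the (B, N, H, W) output is rearranged into a per-anchor view:

```python
import numpy as np

num_class = 2      # e.g. one real class plus a dummy class
num_anchors = 2
stride = num_class + 1 + 4       # class scores + objectness + box offsets
channels = num_anchors * stride  # channels the conv layer must output

# a fake conv output in (B, N, H, W) layout
B, H, W = 1, 8, 8
conv_out = np.zeros((B, channels, H, W))

# move channels last, then split them into (anchor, stride)
x = conv_out.transpose(0, 2, 3, 1).reshape(B, H, W, num_anchors, stride)
print(channels, x.shape)
```

This is exactly the transpose/reshape that `yolo2_forward` below performs with `nd` operations.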
#### Converting the center coordinates
The center predictions are outputs of a sigmoid, so they lie between 0 and 1 and describe a position inside each grid cell; we need to convert them into coordinates relative to the whole image. We use arange to generate the monotonically increasing grid offsets, add the predictions, and obtain each object center's relative coordinates on the image.
```{.python .input}
def transform_center(xy):
"""Given x, y prediction after sigmoid(), convert to relative coordinates (0, 1) on image."""
b, h, w, n, s = xy.shape
offset_y = nd.tile(nd.arange(0, h, repeat=(w * n * 1), ctx=xy.context).reshape((1, h, w, n, 1)), (b, 1, 1, 1, 1))
# print(offset_y[0].asnumpy()[:, :, 0, 0])
offset_x = nd.tile(nd.arange(0, w, repeat=(n * 1), ctx=xy.context).reshape((1, 1, w, n, 1)), (b, h, 1, 1, 1))
# print(offset_x[0].asnumpy()[:, :, 0, 0])
x, y = xy.split(num_outputs=2, axis=-1)
x = (x + offset_x) / w
y = (y + offset_y) / h
return x, y
```
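As a sanity check on the offset logic, here is a tiny NumPy sketch (independent of MXNet, single anchor channel) of the same conversion: with a sigmoid output of 0.5 everywhere, each predicted center lands exactly in the middle of its cell.

```python
import numpy as np

h, w = 4, 4
pred = np.full((h, w), 0.5)                       # sigmoid output: middle of each cell
offset_x = np.tile(np.arange(w), (h, 1))          # column index of each cell
offset_y = np.tile(np.arange(h)[:, None], (1, w)) # row index of each cell
x_rel = (pred + offset_x) / w                     # relative x on the image
y_rel = (pred + offset_y) / h                     # relative y on the image
print(x_rel[0, 2], y_rel[3, 0])
```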
#### Converting width and height
The width and height predictions are passed through an exp function and represent ratios relative to the anchor sizes. We multiply exp() of the prediction by the anchor width/height and divide by the number of grid cells, which yields the object's width and height relative to the image.
```{.python .input}
def transform_size(wh, anchors):
"""Given w, h prediction after exp() and anchor sizes, convert to relative width/height (0, 1) on image"""
b, h, w, n, s = wh.shape
aw, ah = nd.tile(nd.array(anchors, ctx=wh.context).reshape((1, 1, 1, -1, 2)), (b, h, w, 1, 1)).split(num_outputs=2, axis=-1)
w_pred, h_pred = nd.exp(wh).split(num_outputs=2, axis=-1)
w_out = w_pred * aw / w
h_out = h_pred * ah / h
return w_out, h_out
```
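Again a small NumPy sketch (the anchor values are hypothetical) to illustrate the formula: a raw prediction of 0 reproduces the anchor size exactly, since exp(0) = 1.

```python
import numpy as np

w_grid = h_grid = 8             # number of grid cells along each axis
anchor_w, anchor_h = 3.3, 3.6   # anchor size in grid units (hypothetical values)
wh_pred = np.array([0.0, 0.0])  # raw network output before exp()
w_out = np.exp(wh_pred[0]) * anchor_w / w_grid  # width relative to the image
h_out = np.exp(wh_pred[1]) * anchor_h / h_grid  # height relative to the image
print(w_out, h_out)
```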
#### `yolo2_forward` is a convenience function: it splits the convolution channels apart, applies the transforms, and finally produces the detection boxes we need
```{.python .input n=2}
def yolo2_forward(x, num_class, anchor_scales):
"""Transpose/reshape/organize convolution outputs."""
stride = num_class + 5
# transpose and reshape, 4th dim is the number of anchors
x = x.transpose((0, 2, 3, 1))
x = x.reshape((0, 0, 0, -1, stride))
    # now x is (batch, h, w, num_anchors, stride), stride = num_class + 1 (objectness) + 4 (coordinates)
# class probs
cls_pred = x.slice_axis(begin=0, end=num_class, axis=-1)
# object score
score_pred = x.slice_axis(begin=num_class, end=num_class + 1, axis=-1)
score = nd.sigmoid(score_pred)
# center prediction, in range(0, 1) for each grid
xy_pred = x.slice_axis(begin=num_class + 1, end=num_class + 3, axis=-1)
xy = nd.sigmoid(xy_pred)
# width/height prediction
wh = x.slice_axis(begin=num_class + 3, end=num_class + 5, axis=-1)
# convert x, y to positions relative to image
x, y = transform_center(xy)
# convert w, h to width/height relative to image
w, h = transform_size(wh, anchor_scales)
# cid is the argmax channel
cid = nd.argmax(cls_pred, axis=-1, keepdims=True)
# convert to corner format boxes
half_w = w / 2
half_h = h / 2
left = nd.clip(x - half_w, 0, 1)
top = nd.clip(y - half_h, 0, 1)
right = nd.clip(x + half_w, 0, 1)
bottom = nd.clip(y + half_h, 0, 1)
output = nd.concat(*[cid, score, left, top, right, bottom], dim=4)
return output, cls_pred, score, nd.concat(*[xy, wh], dim=4)
```
### A function to generate YOLOv2 training targets
YOLOv2 matches ground-truth objects in a rather unusual way: each ground truth competes only among the anchors of its own grid cell, rather than against a global set of priors. Since the generated targets require no backpropagation, we can convert to numpy and use plain for loops to keep the matching logic readable (beware: converting to numpy breaks the autograd recording, so this trick is only safe when gradients are not needed). In practice, if speed becomes a problem, the same logic can be rewritten with mx.ndarray matrix operations.
Here we use a trick: a sample_weight matrix of per-element weights that adjusts the contribution of each element inside the loss. The weight matrix also lets us mask elements out entirely, which matters a great deal in object detection, because the vast majority of background regions should not predict any box.
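The masking idea can be sketched in a few lines of NumPy (the numbers are made up for illustration): a per-element weight of zero removes background elements from the loss entirely.

```python
import numpy as np

pred   = np.array([0.9, 0.2, 0.8, 0.1])
target = np.array([1.0, 0.0, 0.0, 0.0])
weight = np.array([1.0, 0.0, 0.0, 0.0])  # only the first element is a positive sample

masked_loss = np.abs(pred - target) * weight  # background elements contribute nothing
print(masked_loss.sum())
```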
```{.python .input n=3}
def corner2center(boxes, concat=True):
"""Convert left/top/right/bottom style boxes into x/y/w/h format"""
left, top, right, bottom = boxes.split(axis=-1, num_outputs=4)
x = (left + right) / 2
y = (top + bottom) / 2
width = right - left
height = bottom - top
if concat:
last_dim = len(x.shape) - 1
return nd.concat(*[x, y, width, height], dim=last_dim)
return x, y, width, height
def center2corner(boxes, concat=True):
"""Convert x/y/w/h style boxes into left/top/right/bottom format"""
x, y, w, h = boxes.split(axis=-1, num_outputs=4)
w2 = w / 2
h2 = h / 2
left = x - w2
top = y - h2
right = x + w2
bottom = y + h2
if concat:
last_dim = len(left.shape) - 1
return nd.concat(*[left, top, right, bottom], dim=last_dim)
return left, top, right, bottom
def yolo2_target(scores, boxes, labels, anchors, ignore_label=-1, thresh=0.5):
"""Generate training targets given predictions and labels."""
b, h, w, n, _ = scores.shape
anchors = np.reshape(np.array(anchors), (-1, 2))
#scores = nd.slice_axis(outputs, begin=1, end=2, axis=-1)
#boxes = nd.slice_axis(outputs, begin=2, end=6, axis=-1)
gt_boxes = nd.slice_axis(labels, begin=1, end=5, axis=-1)
target_score = nd.zeros((b, h, w, n, 1), ctx=scores.context)
target_id = nd.ones_like(target_score, ctx=scores.context) * ignore_label
target_box = nd.zeros((b, h, w, n, 4), ctx=scores.context)
sample_weight = nd.zeros((b, h, w, n, 1), ctx=scores.context)
    for b in range(scores.shape[0]):
# find the best match for each ground-truth
label = labels[b].asnumpy()
valid_label = label[np.where(label[:, 0] > -0.5)[0], :]
# shuffle because multi gt could possibly match to one anchor, we keep the last match randomly
np.random.shuffle(valid_label)
for l in valid_label:
gx, gy, gw, gh = (l[1] + l[3]) / 2, (l[2] + l[4]) / 2, l[3] - l[1], l[4] - l[2]
ind_x = int(gx * w)
ind_y = int(gy * h)
tx = gx * w - ind_x
ty = gy * h - ind_y
gw = gw * w
gh = gh * h
# find the best match using width and height only, assuming centers are identical
intersect = np.minimum(anchors[:, 0], gw) * np.minimum(anchors[:, 1], gh)
ovps = intersect / (gw * gh + anchors[:, 0] * anchors[:, 1] - intersect)
best_match = int(np.argmax(ovps))
target_id[b, ind_y, ind_x, best_match, :] = l[0]
target_score[b, ind_y, ind_x, best_match, :] = 1.0
tw = np.log(gw / anchors[best_match, 0])
th = np.log(gh / anchors[best_match, 1])
target_box[b, ind_y, ind_x, best_match, :] = mx.nd.array([tx, ty, tw, th])
sample_weight[b, ind_y, ind_x, best_match, :] = 1.0
# print('ind_y', ind_y, 'ind_x', ind_x, 'best_match', best_match, 't', tx, ty, tw, th, 'ovp', ovps[best_match], 'gt', gx, gy, gw/w, gh/h, 'anchor', anchors[best_match, 0], anchors[best_match, 1])
return target_id, target_score, target_box, sample_weight
```
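The width/height-only IoU matching inside `yolo2_target` can be illustrated in isolation. Here is a rough NumPy sketch (anchor and ground-truth sizes are made up); since all boxes are assumed to share the same center, the intersection is just the element-wise minimum of the widths times that of the heights:

```python
import numpy as np

anchors = np.array([[3.3, 3.6],    # small anchor (grid units)
                    [9.8, 8.2]])   # large anchor
gw, gh = 4.0, 4.0                  # ground-truth size in grid units

# centers assumed identical, so intersection = min widths * min heights
inter = np.minimum(anchors[:, 0], gw) * np.minimum(anchors[:, 1], gh)
union = gw * gh + anchors[:, 0] * anchors[:, 1] - inter
iou = inter / union
best_match = int(np.argmax(iou))   # the anchor closest in shape wins
print(best_match, iou[best_match])
```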
### `YOLO2Output` serves as the YOLOv2 output layer: it is essentially a HybridBlock wrapping a single convolutional layer that produces the final output
```{.python .input n=4}
class YOLO2Output(HybridBlock):
def __init__(self, num_class, anchor_scales, **kwargs):
super(YOLO2Output, self).__init__(**kwargs)
assert num_class > 0, "number of classes should > 0, given {}".format(num_class)
self._num_class = num_class
assert isinstance(anchor_scales, (list, tuple)), "list or tuple of anchor scales required"
assert len(anchor_scales) > 0, "at least one anchor scale required"
for anchor in anchor_scales:
assert len(anchor) == 2, "expected each anchor scale to be (width, height), provided {}".format(anchor)
self._anchor_scales = anchor_scales
out_channels = len(anchor_scales) * (num_class + 1 + 4)
with self.name_scope():
self.output = nn.Conv2D(out_channels, 1, 1)
def hybrid_forward(self, F, x, *args):
return self.output(x)
```
### Download and load the dataset
```{.python .input n=5}
from mxnet import gluon
root_url = ('https://apache-mxnet.s3-accelerate.amazonaws.com/'
'gluon/dataset/pikachu/')
data_dir = '../data/pikachu/'
dataset = {'train.rec': 'e6bcb6ffba1ac04ff8a9b1115e650af56ee969c8',
'train.idx': 'dcf7318b2602c06428b9988470c731621716c393',
'val.rec': 'd6c33f799b4d058e82f2cb5bd9a976f69d72d520'}
for k, v in dataset.items():
gluon.utils.download(root_url+k, data_dir+k, sha1_hash=v)
```
```{.python .input n=6}
from mxnet import image
from mxnet import nd
data_shape = 256
batch_size = 32
rgb_mean = nd.array([123, 117, 104])
rgb_std = nd.array([58.395, 57.12, 57.375])
def get_iterators(data_shape, batch_size):
class_names = ['pikachu', 'dummy']
num_class = len(class_names)
train_iter = image.ImageDetIter(
batch_size=batch_size,
data_shape=(3, data_shape, data_shape),
path_imgrec=data_dir+'train.rec',
path_imgidx=data_dir+'train.idx',
shuffle=True,
mean=True,
std=True,
rand_crop=1,
min_object_covered=0.95,
max_attempts=200)
val_iter = image.ImageDetIter(
batch_size=batch_size,
data_shape=(3, data_shape, data_shape),
path_imgrec=data_dir+'val.rec',
shuffle=False,
mean=True,
std=True)
return train_iter, val_iter, class_names, num_class
train_data, test_data, class_names, num_class = get_iterators(
data_shape, batch_size)
```
```{.python .input n=7}
batch = train_data.next()
print(batch)
```
```{.python .input n=8}
%matplotlib inline
import matplotlib as mpl
mpl.rcParams['figure.dpi']= 120
from matplotlib import pyplot as plt
def box_to_rect(box, color, linewidth=3):
"""convert an anchor box to a matplotlib rectangle"""
box = box.asnumpy()
return plt.Rectangle(
(box[0], box[1]), box[2]-box[0], box[3]-box[1],
fill=False, edgecolor=color, linewidth=linewidth)
_, figs = plt.subplots(3, 3, figsize=(6,6))
for i in range(3):
for j in range(3):
img, labels = batch.data[0][3*i+j], batch.label[0][3*i+j]
img = img.transpose((1, 2, 0)) * rgb_std + rgb_mean
img = img.clip(0,255).asnumpy()/255
fig = figs[i][j]
fig.imshow(img)
for label in labels:
rect = box_to_rect(label[1:5]*data_shape,'red',2)
fig.add_patch(rect)
fig.axes.get_xaxis().set_visible(False)
fig.axes.get_yaxis().set_visible(False)
plt.show()
```
### Loss functions
```{.python .input n=9}
sce_loss = gluon.loss.SoftmaxCrossEntropyLoss(from_logits=False)
l1_loss = gluon.loss.L1Loss()
```
#### Metrics
Here we take a shortcut and use a custom metric to record the raw loss values. When you cannot think of a perfectly fitting evaluation metric, simply watching whether the loss goes down works surprisingly well.
```{.python .input n=10}
from mxnet import metric
class LossRecorder(mx.metric.EvalMetric):
"""LossRecorder is used to record raw loss so we can observe loss directly
"""
def __init__(self, name):
super(LossRecorder, self).__init__(name)
def update(self, labels, preds=0):
"""Update metric with pure loss
"""
for loss in labels:
if isinstance(loss, mx.nd.NDArray):
loss = loss.asnumpy()
self.sum_metric += loss.sum()
self.num_inst += 1
obj_loss = LossRecorder('objectness_loss')
cls_loss = LossRecorder('classification_loss')
box_loss = LossRecorder('box_refine_loss')
```
#### Coarse-grained relative weights for the individual losses
```{.python .input}
positive_weight = 5.0
negative_weight = 0.1
class_weight = 1.0
box_weight = 5.0
```
### The network
We load a ResNet pretrained in the model zoo, take its intermediate feature-extraction layers, and build a new YOLOv2 output layer on top with suitable anchor sizes.
```{.python .input}
pretrained = vision.get_model('resnet18_v1', pretrained=True).features
net = nn.HybridSequential()
for i in range(len(pretrained) - 2):
net.add(pretrained[i])
# anchor scales, try adjust it yourself
scales = [[3.3004, 3.59034],
[9.84923, 8.23783]]
# use 2 classes, 1 as dummy class, otherwise softmax won't work
predictor = YOLO2Output(2, scales)
predictor.initialize()
net.add(predictor)
```
### As before, we need a GPU to speed up training
```{.python .input n=12}
from mxnet import init
from mxnet import gpu
ctx = gpu(0)
net.collect_params().reset_ctx(ctx)
trainer = gluon.Trainer(net.collect_params(),
'sgd', {'learning_rate': 1, 'wd': 5e-4})
```
```{.python .input n=13}
import time
from mxnet import autograd
for epoch in range(20):
# reset data iterators and metrics
train_data.reset()
cls_loss.reset()
obj_loss.reset()
box_loss.reset()
tic = time.time()
for i, batch in enumerate(train_data):
x = batch.data[0].as_in_context(ctx)
y = batch.label[0].as_in_context(ctx)
with autograd.record():
x = net(x)
output, cls_pred, score, xywh = yolo2_forward(x, 2, scales)
with autograd.pause():
tid, tscore, tbox, sample_weight = yolo2_target(score, xywh, y, scales, thresh=0.5)
# losses
loss1 = sce_loss(cls_pred, tid, sample_weight * class_weight)
score_weight = nd.where(sample_weight > 0,
nd.ones_like(sample_weight) * positive_weight,
nd.ones_like(sample_weight) * negative_weight)
loss2 = l1_loss(score, tscore, score_weight)
loss3 = l1_loss(xywh, tbox, sample_weight * box_weight)
loss = loss1 + loss2 + loss3
loss.backward()
trainer.step(batch_size)
# update metrics
cls_loss.update(loss1)
obj_loss.update(loss2)
box_loss.update(loss3)
print('Epoch %2d, train %s %.5f, %s %.5f, %s %.5f time %.1f sec' % (
epoch, *cls_loss.get(), *obj_loss.get(), *box_loss.get(), time.time()-tic))
```
### Preprocessing and prediction functions
```{.python .input n=14}
def process_image(fname):
with open(fname, 'rb') as f:
im = image.imdecode(f.read())
# resize to data_shape
data = image.imresize(im, data_shape, data_shape)
# minus rgb mean, divide std
data = (data.astype('float32') - rgb_mean) / rgb_std
    # convert to batch x channel x height x width
return data.transpose((2,0,1)).expand_dims(axis=0), im
def predict(x):
x = net(x)
output, cls_prob, score, xywh = yolo2_forward(x, 2, scales)
return nd.contrib.box_nms(output.reshape((0, -1, 6)))
```
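`nd.contrib.box_nms` removes overlapping duplicate detections. As a reference for what it does conceptually, here is a minimal greedy non-maximum suppression in plain NumPy (a sketch of the idea, not the actual implementation):

```python
import numpy as np

def box_iou(a, b):
    """IoU of two corner-format boxes (left, top, right, bottom)."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(int(i))
        order = [j for j in order if box_iou(boxes[i], boxes[j]) < thresh]
    return keep

boxes = np.array([[0.10, 0.1, 0.50, 0.5],
                  [0.12, 0.1, 0.52, 0.5],   # heavily overlaps the first box
                  [0.60, 0.6, 0.90, 0.9]])  # disjoint from both
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))  # the duplicate detection is suppressed
```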
### Load a Pikachu image again for testing
```{.python .input n=18}
x, im = process_image('../img/pikachu.jpg')
out = predict(x.as_in_context(ctx))
out.shape
out
```
### Display the results
```{.python .input n=17}
mpl.rcParams['figure.figsize'] = (6,6)
colors = ['blue', 'green', 'red', 'black', 'magenta']
def display(im, out, threshold=0.5):
plt.imshow(im.asnumpy())
for row in out:
row = row.asnumpy()
class_id, score = int(row[0]), row[1]
if class_id < 0 or score < threshold:
continue
color = colors[class_id%len(colors)]
box = row[2:6] * np.array([im.shape[0],im.shape[1]]*2)
rect = box_to_rect(nd.array(box), color, 2)
plt.gca().add_patch(rect)
text = class_names[class_id]
plt.gca().text(box[0], box[1],
'{:s} {:.2f}'.format(text, score),
bbox=dict(facecolor=color, alpha=0.5),
fontsize=10, color='white')
plt.show()
display(im, out[0], threshold=0.5)
```
## Summary
* TODO(@mli)
## Exercises
* Try changing the default anchor scales: adjust their sizes, or increase or decrease their number.
* Adjust the relative weights of the different loss terms and observe how this affects training.
* In object detection, valid targets make up only a tiny fraction of each batch. What is the problem with averaging inside the loss function, and what would be a better approach?
## Scan the QR code to reach the [discussion forum](https://discuss.gluon.ai/t/topic/6279)
![](../img/qr_yolo.svg)