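A Deep Dive into the Caffe-SSD Network

Dissecting the Caffe-SSD object detection network from its prototxt file

SSD network structure

[Figure: SSD300 network structure from the paper]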

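The figure above shows the SSD300 network structure from the paper. The base network at the front is part of VGG16, followed by several newly added convolutional layers. The feature maps of conv4_3, conv7 (fc7), conv8_2, conv9_2, conv10_2 and conv11_2 are used to generate default boxes (prior boxes) of different sizes and aspect ratios, and softmax classification and location regression are performed on all of these feature maps at the same time. In the Caffe prototxt the same six source layers are named conv4_3, fc7, conv6_2, conv7_2, conv8_2 and conv9_2.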
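To make the multi-scale idea concrete, the sketch below counts the default boxes that SSD300 produces. The feature map sizes (38, 19, 10, 5, 3, 1) and the number of boxes per location (4, 6, 6, 6, 4, 4) are assumptions taken from the standard SSD300 configuration in the paper; only the value 4 for conv4_3 is confirmed by the prototxt shown later in this article.

```python
# Assumed SSD300 settings: (layer name, feature map side, boxes per location).
# Layer names follow the figure in the paper.
FEATURE_MAPS = [
    ("conv4_3", 38, 4),
    ("conv7 (fc7)", 19, 6),
    ("conv8_2", 10, 6),
    ("conv9_2", 5, 6),
    ("conv10_2", 3, 4),
    ("conv11_2", 1, 4),
]


def count_default_boxes(feature_maps):
    """Return {layer: number of default boxes} for square feature maps."""
    return {name: side * side * per_cell for name, side, per_cell in feature_maps}


counts = count_default_boxes(FEATURE_MAPS)
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:12s} {n:5d}")
print("total", total)
```

Under these assumptions most of the boxes come from conv4_3, the highest-resolution feature map, which is the layer examined in detail below.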

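Generating and using prior boxes

Take the conv4_3 layer as an example. The diagram below shows how conv4_3 produces prior boxes and feeds the classification and regression branches.

[Figure: conv4_3 producing prior boxes, with the classification and regression paths]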

The layers related to conv4_3 in the prototxt file are as follows:

layer {
  name: "conv4_3_norm"  # L2 normalization with a learned per-channel scale
  type: "Normalize"
  bottom: "conv4_3"
  top: "conv4_3_norm"
  norm_param {
    across_spatial: false
    scale_filler {
      type: "constant"
      value: 20
    }
    channel_shared: false
  }
}
# -------------------------------------------------------------------------------------
# One convolution produces a [1, 4*num_priorbox, layer_height, layer_width] feature map
# for bbox regression, i.e. every location on conv4_3 gets one set of coordinate
# offsets [dxmin, dymin, dxmax, dymax] per prior box.
# -------------------------------------------------------------------------------------
layer {
  name: "conv4_3_norm_mbox_loc"
  type: "Convolution"
  bottom: "conv4_3_norm"
  top: "conv4_3_norm_mbox_loc"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 16  # 4 coordinates * 4 prior boxes
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
# -------------------------------------------------------------------------------------
# Permute is a layer that ships with the SSD version of Caffe; it reorders the
# dimensions of a Caffe blob. conv4_3_norm_mbox_loc_perm is defined below.
# The normal Caffe blob order is:
#   bottom blob = [batch_num, channel, height, width]
# After conv4_3_norm_mbox_loc_perm the blob becomes:
#   top blob = [batch_num, height, width, channel]
# -------------------------------------------------------------------------------------
layer {
  name: "conv4_3_norm_mbox_loc_perm"
  type: "Permute"
  bottom: "conv4_3_norm_mbox_loc"
  top: "conv4_3_norm_mbox_loc_perm"
  permute_param {
    order: 0
    order: 2
    order: 3
    order: 1
  }
}
# -------------------------------------------------------------------------------------
# Flatten reshapes n*h*w*c into n*(h*w*c) so that the outputs of the different
# source layers can be concatenated later.
# -------------------------------------------------------------------------------------
layer {
  name: "conv4_3_norm_mbox_loc_flat"
  type: "Flatten"
  bottom: "conv4_3_norm_mbox_loc_perm"
  top: "conv4_3_norm_mbox_loc_flat"
  flatten_param {
    axis: 1
  }
}
# -------------------------------------------------------------------------------------
# One convolution produces a [1, num_class*num_priorbox, layer_height, layer_width]
# feature map used for softmax classification. num_class counts the object classes
# plus background; num_class = 21 for SSD300 on PASCAL VOC.
# -------------------------------------------------------------------------------------
layer {
  name: "conv4_3_norm_mbox_conf"
  type: "Convolution"
  bottom: "conv4_3_norm"
  top: "conv4_3_norm_mbox_conf"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 84  # 21 classes * 4 prior boxes
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "conv4_3_norm_mbox_conf_perm"
  type: "Permute"
  bottom: "conv4_3_norm_mbox_conf"
  top: "conv4_3_norm_mbox_conf_perm"
  permute_param {
    order: 0
    order: 2
    order: 3
    order: 1
  }
}
layer {
  name: "conv4_3_norm_mbox_conf_flat"
  type: "Flatten"
  bottom: "conv4_3_norm_mbox_conf_perm"
  top: "conv4_3_norm_mbox_conf_flat"
  flatten_param {
    axis: 1
  }
}
# -------------------------------------------------------------------------------------
# Produces a prior box blob of size [1, 2, 4*num_priorbox*layer_height*layer_width].
# The two channels hold the 4 coordinates of every prior box and the 4 matching
# variances. The variances are the scaling factors applied to the offsets in
# bounding box regression.
# -------------------------------------------------------------------------------------
layer {
  name: "conv4_3_norm_mbox_priorbox"
  type: "PriorBox"
  bottom: "conv4_3_norm"
  bottom: "data"
  top: "conv4_3_norm_mbox_priorbox"
  prior_box_param {
    min_size: 30.0
    max_size: 60.0
    aspect_ratio: 2
    flip: true
    clip: false
    variance: 0.1
    variance: 0.1
    variance: 0.2
    variance: 0.2
    step: 8
    offset: 0.5
  }
}
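A minimal sketch of how the prior_box_param values above turn into boxes, assuming a 300x300 input and a 38x38 conv4_3 feature map (standard for SSD300, but not stated in the prototxt excerpt). The function name is made up for illustration, and the box order (min_size square, sqrt(min_size*max_size) square, then each aspect ratio followed by its flip) is an assumption that should be checked against the PriorBox layer source.

```python
import math


def conv4_3_prior_boxes(layer_h=38, layer_w=38, img_size=300,
                        min_size=30.0, max_size=60.0, aspect_ratios=(2.0,),
                        flip=True, step=8.0, offset=0.5):
    """Return normalized [xmin, ymin, xmax, ymax] prior boxes (hypothetical helper)."""
    # Box shapes (width, height) in pixels, shared by every feature map cell.
    shapes = [(min_size, min_size)]
    side = math.sqrt(min_size * max_size)
    shapes.append((side, side))
    for ar in aspect_ratios:
        ratios = [ar, 1.0 / ar] if flip else [ar]
        for r in ratios:
            shapes.append((min_size * math.sqrt(r), min_size / math.sqrt(r)))

    priors = []
    for row in range(layer_h):
        for col in range(layer_w):
            # Cell center in input-image pixels.
            cx = (col + offset) * step
            cy = (row + offset) * step
            for w, h in shapes:
                priors.append([(cx - w / 2) / img_size, (cy - h / 2) / img_size,
                               (cx + w / 2) / img_size, (cy + h / 2) / img_size])
    return priors


priors = conv4_3_prior_boxes()
print(len(priors))      # boxes for the whole layer
print(len(priors) * 4)  # length of one channel of the prior box blob
print(priors[0])        # first box; not clipped because clip is false
```

With aspect_ratio 2 and flip enabled, each cell gets 4 boxes, which matches the 4*4 and 21*4 output channels of the loc and conf convolutions.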

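On path (2) of the figure above, the network outputs [dxmin, dymin, dxmax, dymax], which corresponds to bbox in the code below. These offsets are then applied to the prior box as follows: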

// DecodeBBox() in bbox_util.cpp: each corner is shifted by an offset scaled by the prior's size
float prior_width = prior_bbox.xmax() - prior_bbox.xmin();
float prior_height = prior_bbox.ymax() - prior_bbox.ymin();
decode_bbox->set_xmin(prior_bbox.xmin() + prior_variance[0] * bbox.xmin() * prior_width);
decode_bbox->set_ymin(prior_bbox.ymin() + prior_variance[1] * bbox.ymin() * prior_height);
decode_bbox->set_xmax(prior_bbox.xmax() + prior_variance[2] * bbox.xmax() * prior_width);
decode_bbox->set_ymax(prior_bbox.ymax() + prior_variance[3] * bbox.ymax() * prior_height);
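The snippet above shifts the four corners directly. The MultiBoxLoss layer shown later in this article sets code_type: CENTER_SIZE, where the four outputs are read as center offsets and log-scale size factors instead. The sketch below renders both forms in plain Python. The function names are made up for illustration, and the center/size formulas follow the encoding described in the SSD paper, so treat them as an assumption to verify against DecodeBBox.

```python
import math


def decode_corner_size(prior, variance, bbox):
    """Mirror of the C++ snippet: shift each corner by a size-scaled offset."""
    pw = prior[2] - prior[0]
    ph = prior[3] - prior[1]
    return [
        prior[0] + variance[0] * bbox[0] * pw,
        prior[1] + variance[1] * bbox[1] * ph,
        prior[2] + variance[2] * bbox[2] * pw,
        prior[3] + variance[3] * bbox[3] * ph,
    ]


def decode_center_size(prior, variance, bbox):
    """Assumed CENTER_SIZE form: bbox is read as [dcx, dcy, dw, dh]."""
    pw = prior[2] - prior[0]
    ph = prior[3] - prior[1]
    pcx = (prior[0] + prior[2]) / 2.0
    pcy = (prior[1] + prior[3]) / 2.0
    cx = pcx + variance[0] * bbox[0] * pw
    cy = pcy + variance[1] * bbox[1] * ph
    w = pw * math.exp(variance[2] * bbox[2])
    h = ph * math.exp(variance[3] * bbox[3])
    return [cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0]


prior = [0.1, 0.2, 0.5, 0.6]
variance = [0.1, 0.1, 0.2, 0.2]
print(decode_corner_size(prior, variance, [1.0, 0.0, 0.0, 0.0]))
print(decode_center_size(prior, variance, [0.0, 0.0, 0.0, 0.0]))
print(decode_center_size(prior, variance, [1.0, 0.0, 0.0, 0.0]))
```

In both forms an all-zero prediction returns the prior box unchanged, and the variances control how far a unit prediction can move or resize the box.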

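Training and loss computation

Matching strategy

During training, ground truth boxes and prior boxes are paired as follows:
1. First, for each ground truth box, find the default box with the highest IoU. This guarantees that every ground truth box is matched to a default box of its own.
2. SSD then tries to pair each remaining unmatched default box with any ground truth box. If the Jaccard overlap between the two exceeds a threshold (0.5 for SSD300), they are considered a match, so one ground truth box can end up with several default boxes.
A default box that is matched to a ground truth box is a positive; a default box that is not matched to any ground truth box is a negative.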
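The two steps above can be sketched in plain Python. This is a simplified illustration, not the code of the MultiBoxLoss layer: the function names are hypothetical, boxes are [xmin, ymin, xmax, ymax] lists, and step 1 is implemented as a greedy search for the globally best remaining pair.

```python
def iou(a, b):
    """Jaccard overlap of two [xmin, ymin, xmax, ymax] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_priors(gt_boxes, priors, threshold=0.5):
    """Return, for each prior, the index of its matched GT box or -1."""
    overlaps = [[iou(g, p) for p in priors] for g in gt_boxes]
    match = [-1] * len(priors)

    # Step 1: give every GT box its best still-unmatched prior.
    unmatched_gt = set(range(len(gt_boxes)))
    while unmatched_gt:
        best = max(
            ((overlaps[g][p], g, p)
             for g in unmatched_gt
             for p in range(len(priors)) if match[p] == -1),
            default=None,
        )
        if best is None or best[0] <= 0.0:
            break
        _, g, p = best
        match[p] = g
        unmatched_gt.remove(g)

    # Step 2: remaining priors match any GT box whose overlap exceeds the threshold.
    for p in range(len(priors)):
        if match[p] != -1:
            continue
        best_overlap, best_gt = max(
            ((overlaps[g][p], g) for g in range(len(gt_boxes))),
            default=(0.0, -1),
        )
        if best_overlap > threshold:
            match[p] = best_gt
    return match


gt_boxes = [[0.1, 0.1, 0.5, 0.5], [0.6, 0.6, 0.8, 0.8]]
priors = [[0.1, 0.1, 0.5, 0.5], [0.12, 0.1, 0.5, 0.5],
          [0.6, 0.6, 0.9, 0.9], [0.3, 0.3, 0.7, 0.7]]
result = match_priors(gt_boxes, priors)
print(result)
```

In this toy input the third prior overlaps the second ground truth box by less than 0.5, yet it is still matched in step 1, because every ground truth box must receive a prior. The second prior is picked up in step 2, and the fourth stays negative.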

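Loss function

The SSD loss has two parts, a confidence loss and a location loss. N is the number of prior boxes matched to a ground truth (GT) box, and the parameter α adjusts the balance between the confidence loss and the location loss. By default α = 1.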
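Written out in the notation of the SSD paper, the two parts look as follows. Here $x_{ij}^{p}$ indicates that prior box $i$ is matched to ground truth box $j$ of class $p$, $l$ is the predicted box, $g$ the ground truth box, $d$ the prior box and $c$ the class scores. The formulas are transcribed from the paper as a reference and are assumed to correspond to the figures in this section.

```latex
L(x, c, l, g) = \frac{1}{N}\left( L_{conf}(x, c) + \alpha \, L_{loc}(x, l, g) \right)

L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}}
    x_{ij}^{k} \, \mathrm{smooth}_{L1}\!\left( l_i^{m} - \hat{g}_j^{m} \right)

\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad
\hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad
\hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad
\hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}

L_{conf}(x, c) = - \sum_{i \in Pos}^{N} x_{ij}^{p} \log \hat{c}_i^{p}
                 - \sum_{i \in Neg} \log \hat{c}_i^{0},
\qquad
\hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}
```

The location term only sums over positive prior boxes, while the confidence term also includes the negatives that are kept for training, with class 0 as background.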

[Figure: SSD loss formulas]

Implementation

layer {
  name: "mbox_loc"  # concatenate the location predictions of all source layers
  type: "Concat"
  bottom: "conv4_3_norm_mbox_loc_flat"
  bottom: "fc7_mbox_loc_flat"
  bottom: "conv6_2_mbox_loc_flat"
  bottom: "conv7_2_mbox_loc_flat"
  bottom: "conv8_2_mbox_loc_flat"
  bottom: "conv9_2_mbox_loc_flat"
  top: "mbox_loc"
  concat_param {
    axis: 1
  }
}
layer {
  name: "mbox_conf"  # concatenate the class predictions of all source layers
  type: "Concat"
  bottom: "conv4_3_norm_mbox_conf_flat"
  bottom: "fc7_mbox_conf_flat"
  bottom: "conv6_2_mbox_conf_flat"
  bottom: "conv7_2_mbox_conf_flat"
  bottom: "conv8_2_mbox_conf_flat"
  bottom: "conv9_2_mbox_conf_flat"
  top: "mbox_conf"
  concat_param {
    axis: 1
  }
}
layer {
  name: "mbox_priorbox"  # concatenate the prior boxes of all source layers
  type: "Concat"
  bottom: "conv4_3_norm_mbox_priorbox"
  bottom: "fc7_mbox_priorbox"
  bottom: "conv6_2_mbox_priorbox"
  bottom: "conv7_2_mbox_priorbox"
  bottom: "conv8_2_mbox_priorbox"
  bottom: "conv9_2_mbox_priorbox"
  top: "mbox_priorbox"
  concat_param {
    axis: 2  # prior box blobs are [1, 2, N], so they are joined along the last axis
  }
}
layer {
  name: "mbox_loss"  # compute the loss
  type: "MultiBoxLoss"
  bottom: "mbox_loc"
  bottom: "mbox_conf"
  bottom: "mbox_priorbox"
  bottom: "label"
  top: "mbox_loss"
  include {
    phase: TRAIN
  }
  propagate_down: true
  propagate_down: true
  propagate_down: false
  propagate_down: false
  loss_param {
    normalization: VALID
  }
  multibox_loss_param {
    loc_loss_type: SMOOTH_L1
    conf_loss_type: SOFTMAX
    loc_weight: 1.0
    # Number of classes including background; it must agree with the num_output
    # of the mbox_conf convolutions.
    num_classes: 5
    share_location: true
    match_type: PER_PREDICTION
    overlap_threshold: 0.5
    use_prior_for_matching: true
    background_label_id: 0
    use_difficult_gt: true
    # Negatives usually far outnumber positives, which makes the network focus
    # too much on negatives and destabilizes the loss. Hard negative mining
    # balances the two: the negatives are sorted by score and only the
    # highest-scoring ones are used for training, keeping pos:neg at 1:3.
    neg_pos_ratio: 3.0
    neg_overlap: 0.5
    code_type: CENTER_SIZE
    ignore_cross_boundary_bbox: false
    mining_type: MAX_NEGATIVE
  }
}
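As a rough illustration of what neg_pos_ratio and neg_overlap do, the sketch below selects hard negatives from a list of prior boxes. It is not the layer's actual code: the function name and its inputs (a match list with -1 for unmatched priors, a per-prior confidence loss and each prior's best overlap with any ground truth box) are assumptions made for the example.

```python
def mine_hard_negatives(match, conf_loss, max_overlap,
                        neg_pos_ratio=3.0, neg_overlap=0.5):
    """Return the indices of the negatives kept for training (hypothetical helper)."""
    num_pos = sum(1 for m in match if m != -1)
    # Only unmatched priors that overlap every GT box by less than neg_overlap
    # are eligible as negatives.
    candidates = [i for i, m in enumerate(match)
                  if m == -1 and max_overlap[i] < neg_overlap]
    # Keep at most neg_pos_ratio negatives per positive.
    num_neg = min(int(num_pos * neg_pos_ratio), len(candidates))
    # Hardest first: the negatives the classifier currently gets most wrong.
    candidates.sort(key=lambda i: conf_loss[i], reverse=True)
    return sorted(candidates[:num_neg])


match = [0, -1, -1, -1, -1, -1]
conf_loss = [0.2, 0.1, 2.0, 0.5, 1.5, 0.05]
max_overlap = [0.8, 0.1, 0.2, 0.45, 0.3, 0.0]
negatives = mine_hard_negatives(match, conf_loss, max_overlap)
print(negatives)
```

With one positive and a ratio of 3, only the three negatives with the largest confidence loss are kept; the easy negatives contribute nothing to the loss.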

References
[1] CNN目标检测(三):SSD详解 (CNN Object Detection, Part 3: SSD Explained)
[2] SSD paper