EfficientDet: Scalable and Efficient Object Detection

GitHub

https://github.com/google/automl/tree/master/efficientdet


https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch


CVPR 2020. A classic, must-read, milestone work.


The paper proposes EfficientDet, a fast and accurate object detection framework. Its main contributions are the BiFPN and a compound scaling strategy. The largest model reaches 52.2 AP on the COCO test set with only 52M parameters, being 4x-9x smaller than prior best detectors while using 13x-42x fewer FLOPs.



BiFPN (bi-directional feature pyramid network):



(a) The traditional FPN structure has only a top-down feature-fusion path.


(b) PANet adds a bottom-up path on top of the top-down one, fusing features in both directions, at the cost of more parameters.


(c) NAS-FPN is an FPN topology found by neural architecture search; the searched connections use fewer parameters than PANet, but its accuracy is slightly lower.


(d) The proposed BiFPN fuses features along both top-down and bottom-up paths and stacks this bidirectional block multiple times. In addition, the original feature of the corresponding level is added on top of the fused feature.


BiFPN design principles:


  1. If a node has only one input edge and performs no feature fusion, it contributes little to a network whose purpose is fusing features, so it is removed, which simplifies the bidirectional network.


if a node has only one input edge with no feature fusion, then it will have less contribution to feature network that aims at fusing different features. This leads to a simplified bidirectional network;


  2. An extra edge is added from the original input feature to the output node of the same level, fusing more features at minimal extra computational cost.



we add an extra edge from the original input to output node if they are at the same level, in order to fuse more features without adding much cost;


  3. Unlike PANet, which performs only a single top-down and a single bottom-up pass, BiFPN repeats this bidirectional fusion multiple times.



unlike PANet [23] that only has one top-down and one bottom-up path, we treat each bidirectional (top-down & bottom-up) path as one feature network layer, and repeat the same layer multiple times to enable more high-level feature fusion.


Weighted Feature Fusion:


Different input features have different resolutions and usually contribute unequally to the output feature. The paper therefore adds a learnable weight for each input, letting the network itself balance the importance of the different inputs.



since different input features are at different resolutions, they usually contribute to the output feature unequally. To address this issue, we propose to add an additional weight for each input, and let the network to learn the importance of each input feature.


There are three ways to weight the inputs:


  1. Unbounded fusion


Each input is simply multiplied by an arbitrary scalar weight: O = Σ_i (w_i · I_i). Since such weights are unbounded, they can destabilize training, so weight normalization is needed to bound the range of each weight and keep training stable.



since the scalar weight is unbounded, it could potentially cause training instability. Therefore, we resort to weight normalization to bound the value range of each weight


  2. Softmax-based fusion


The weights are passed through a softmax, O = Σ_i (e^{w_i} / Σ_j e^{w_j}) · I_i, which naturally bounds each normalized weight to [0, 1]. The drawback is that the softmax itself is relatively expensive to compute.



  3. Fast normalized fusion



This is a cheaper variant of the softmax-based approach. A ReLU is applied to each w_i to guarantee w_i ≥ 0, and the inputs are fused as

O = Σ_i (w_i / (ε + Σ_j w_j)) · I_i

where ε = 0.0001 is a very small constant added for numerical stability.

Since it avoids the exponentials of softmax, it is more efficient, running about 30% faster than the softmax version, and each normalized weight still falls in the range [0, 1].
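
As a minimal sketch (the module name and interface are illustrative, not from the official code), fast normalized fusion can be written in PyTorch as:

import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Fuses n same-shaped feature maps with learnable non-negative weights."""

    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps  # the paper's small epsilon (0.0001) for numerical stability

    def forward(self, inputs):
        w = torch.relu(self.weights)   # ReLU guarantees w_i >= 0
        w = w / (w.sum() + self.eps)   # normalized weights lie in [0, 1]
        return sum(wi * x for wi, x in zip(w, inputs))

In a BiFPN, a node on the top-down path fuses two such inputs, while intermediate bottom-up nodes fuse three (the extra same-level edge from design rule 2).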


 


Network architecture:



The network uses EfficientNet as the backbone. Five feature levels {P3, P4, P5, P6, P7} are tapped for feature fusion and prediction, and these five branches pass through multiple stacked BiFPN layers.
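
To make the topology concrete, here is a hedged sketch of a single BiFPN layer over the five levels, reusing the FastNormalizedFusion module sketched above; the depthwise separable convolution + BN + activation that the paper applies after every fusion node is omitted for brevity, and all names are illustrative:

import torch.nn.functional as F
from torch import nn

class BiFPNLayerSketch(nn.Module):
    """One bidirectional (top-down + bottom-up) fusion pass over [P3..P7]."""

    def __init__(self):
        super().__init__()
        # top-down nodes each fuse 2 inputs
        self.td_fuse = nn.ModuleList([FastNormalizedFusion(2) for _ in range(4)])
        # bottom-up nodes P4-P6 fuse 3 inputs (extra same-level edge, rule 2);
        # the P7 output node fuses 2
        self.out_fuse = nn.ModuleList(
            [FastNormalizedFusion(3) for _ in range(3)] + [FastNormalizedFusion(2)]
        )

    @staticmethod
    def resize(x, ref):
        # nearest-neighbor resize to match the spatial size of `ref`
        return F.interpolate(x, size=ref.shape[-2:], mode="nearest")

    def forward(self, feats):
        p3, p4, p5, p6, p7 = feats
        # top-down path
        p6_td = self.td_fuse[0]([p6, self.resize(p7, p6)])
        p5_td = self.td_fuse[1]([p5, self.resize(p6_td, p5)])
        p4_td = self.td_fuse[2]([p4, self.resize(p5_td, p4)])
        p3_out = self.td_fuse[3]([p3, self.resize(p4_td, p3)])
        # bottom-up path, each node also taking the original same-level input
        p4_out = self.out_fuse[0]([p4, p4_td, self.resize(p3_out, p4)])
        p5_out = self.out_fuse[1]([p5, p5_td, self.resize(p4_out, p5)])
        p6_out = self.out_fuse[2]([p6, p6_td, self.resize(p5_out, p6)])
        p7_out = self.out_fuse[3]([p7, self.resize(p6_out, p7)])
        return [p3_out, p4_out, p5_out, p6_out, p7_out]

Stacking this layer D_bifpn times (see the compound scaling rules below) gives the full feature network.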


 


Compound Scaling:


EfficientDet uses a single compound coefficient φ to jointly scale the backbone, the BiFPN width and depth, the class/box prediction networks, and the input resolution.



uses a simple compound coefficient φ to jointly scale up all dimensions of backbone network, BiFPN network, class/box network, and resolution.


Using grid search over the candidate scale factors {1.2, 1.25, 1.3, 1.35, 1.4, 1.45}, the paper picks 1.35 as the best width scaling factor.


Scaling rule for the BiFPN width and depth:

W_bifpn = 64 · (1.35^φ),  D_bifpn = 3 + φ

Scaling rule for the prediction (class/box) networks:

W_pred = W_bifpn,  D_box = D_class = 3 + ⌊φ/3⌋

Scaling rule for the input resolution:

R_input = 512 + 128 · φ
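
As a sketch (my own helper, not the official implementation), these rules reduce to a few lines. Note that the released models deviate from the resolution formula at larger φ, as the input_sizes list in the test code below shows:

def efficientdet_config(phi):
    """Compound-scaled dimensions for a given coefficient phi (paper, Sec. 4.2)."""
    return {
        'bifpn_width': int(64 * (1.35 ** phi)),  # W_bifpn = 64 * 1.35^phi
        'bifpn_depth': 3 + phi,                  # D_bifpn = 3 + phi
        'box_class_depth': 3 + phi // 3,         # D_box = D_class = 3 + floor(phi/3)
        'input_size': 512 + 128 * phi,           # R_input = 512 + 128 * phi
    }

# phi = 0 recovers EfficientDet-D0: a 64-wide, 3-deep BiFPN at 512x512 input
print(efficientdet_config(0))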



 


Experimental results:

On COCO test-dev, the EfficientDet family consistently dominates prior detectors on the accuracy/efficiency trade-off; the largest model, EfficientDet-D7, reaches 52.2 AP.



Results on segmentation tasks:

The paper also evaluates the EfficientNet backbone + BiFPN on Pascal VOC semantic segmentation, reporting higher mIOU than DeepLabV3+ while using far fewer FLOPs.



 


Test code:


import time
import torch
from torch.backends import cudnn
from backbone import EfficientDetBackbone
import cv2
import os
import numpy as np
from efficientdet.utils import BBoxTransform, ClipBoxes
from utils.utils import preprocess, preprocess_, invert_affine, postprocess


class EFFICIENTDET():
    def __init__(self):
        compound_coef = 0
        force_input_size = None  # set None to use default size

        # replace this part with your project's anchor config
        anchor_ratios = [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]
        anchor_scales = [2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]

        self.threshold = 0.2
        self.iou_threshold = 0.2
        self.use_cuda = True
        self.use_float16 = False
        cudnn.fastest = True
        cudnn.benchmark = True

        self.obj_list = ['cz', 'cw', 'zz', 'zw', 'dz', 'dw', 'zq']

        # tf bilinear interpolation is different from any other's, just make do
        input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]
        self.input_size = input_sizes[compound_coef] if force_input_size is None else force_input_size

        # build the model and load the trained weights for inference only
        self.model = EfficientDetBackbone(compound_coef=compound_coef, num_classes=len(self.obj_list),
                                          ratios=anchor_ratios, scales=anchor_scales)
        self.model.load_state_dict(torch.load('logs/style_dataset/efficientdet-d0_82_6321.pth'))
        self.model.requires_grad_(False)
        self.model.eval()

        if self.use_cuda:
            self.model = self.model.cuda()
        if self.use_float16:
            self.model = self.model.half()

    def display(self, preds, imgs, name, imshow=True, imwrite=False):
        # draw boxes, class names and scores on each image
        for i in range(len(imgs)):
            if len(preds[i]['rois']) == 0:
                continue
            for j in range(len(preds[i]['rois'])):
                (x1, y1, x2, y2) = preds[i]['rois'][j].astype(int)
                cv2.rectangle(imgs[i], (x1, y1), (x2, y2), (255, 255, 0), 2)
                obj = self.obj_list[preds[i]['class_ids'][j]]
                score = float(preds[i]['scores'][j])
                cv2.putText(imgs[i], '{}, {:.3f}'.format(obj, score),
                            (x1, y1 + 30), cv2.FONT_HERSHEY_SIMPLEX, 1.5,
                            (0, 0, 255), 2)
            if imshow:
                cv2.imshow('img', imgs[i])
                cv2.waitKey(0)
            if imwrite:
                cv2.imwrite(os.path.join('meina', name), imgs[i])

    def process(self, image, image_name="default.jpg"):
        ori_imgs, framed_imgs, framed_metas = preprocess_([image], max_size=self.input_size)
        if self.use_cuda:
            x = torch.stack([torch.from_numpy(fi).cuda() for fi in framed_imgs], 0)
        else:
            x = torch.stack([torch.from_numpy(fi) for fi in framed_imgs], 0)
        x = x.to(torch.float32 if not self.use_float16 else torch.float16).permute(0, 3, 1, 2)

        t1 = time.time()
        with torch.no_grad():
            features, regression, classification, anchors = self.model(x)
            regressBoxes = BBoxTransform()
            clipBoxes = ClipBoxes()
            out = postprocess(x,
                              anchors, regression, classification,
                              regressBoxes, clipBoxes,
                              self.threshold, self.iou_threshold)
        out = invert_affine(framed_metas, out)
        self.display(out, ori_imgs, image_name, imshow=False, imwrite=True)
        t2 = time.time()
        tact_time = (t2 - t1)
        print(f'{tact_time} seconds, {1 / tact_time} FPS, @batch_size 1')


def test_images():
    efficientDet = EFFICIENTDET()
    # ipath = 'datasets/style_dataset/val2020/'
    ipath = 'datasets/give_/'
    for name in os.listdir(ipath):
        img_path = os.path.join(ipath, name)
        image = cv2.imread(img_path, 1)
        efficientDet.process(image, name)


if __name__ == "__main__":
    test_images()

Summary:


  1. EfficientDet can be used for both detection and segmentation tasks.
  2. Bidirectional feature fusion strategy: BiFPN.
  3. Compound scaling strategy.
  4. A detection framework with both high speed and high accuracy.
  5. Training is fairly fast; the loss can drop to around 1 within a single epoch.