EfficientDet: Scalable and Efficient Object Detection






BiFPN (bi-directional feature pyramid network):






  1. If a node has only one input edge and performs no feature fusion, it contributes less to a feature network that aims to fuse different features. Removing such nodes leads to a simplified bidirectional network.

  2. An extra edge is added from the original input to the output node when they are at the same level, in order to fuse more features without adding much cost.

  3. Unlike PANet [23], which has only one top-down and one bottom-up path, each bidirectional (top-down and bottom-up) path is treated as one feature network layer, and the same layer is repeated multiple times to enable more high-level feature fusion.
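The three design choices above can be sketched as a small topology example. This is only an illustration under simplifying assumptions: features are plain floats, resizing and convolutions are omitted, and the names `bifpn_layer`, `bifpn`, and `mean_fuse` are hypothetical helpers, not part of any EfficientDet implementation.

```python
def mean_fuse(features):
    """Placeholder fusion: an unweighted mean of the incoming features."""
    return sum(features) / len(features)


def bifpn_layer(inputs, fuse=mean_fuse):
    """One bidirectional layer over consecutive levels, e.g. {3: P3, ..., 7: P7}.

    Needs at least three consecutive levels. Resizing between levels is
    omitted, so features are assumed to be directly combinable.
    """
    levels = sorted(inputs)
    bottom, top = levels[0], levels[-1]

    # Top-down path. The coarsest level has only one input edge, so it gets
    # no intermediate node and its input is passed through unchanged.
    td = {top: inputs[top]}
    for lvl in reversed(levels[1:-1]):
        td[lvl] = fuse([inputs[lvl], td[lvl + 1]])

    # Bottom-up path. The finest level fuses its input with the level above.
    out = {bottom: fuse([inputs[bottom], td[bottom + 1]])}
    for lvl in levels[1:-1]:
        # Extra same-level edge: the original input is fused again here.
        out[lvl] = fuse([inputs[lvl], td[lvl], out[lvl - 1]])
    out[top] = fuse([inputs[top], out[top - 1]])
    return out


def bifpn(inputs, num_layers, fuse=mean_fuse):
    """Repeat the same bidirectional layer several times."""
    feats = dict(inputs)
    for _ in range(num_layers):
        feats = bifpn_layer(feats, fuse)
    return feats


features = {3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0}
print(bifpn(features, num_layers=3))
```

In this sketch the middle output nodes take three inputs (original input, intermediate node, and the finer output), while the top and bottom levels take two.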

Weighted Feature Fusion:


Since different input features are at different resolutions, they usually contribute to the output feature unequally. To address this issue, the paper adds an additional weight for each input and lets the network learn the importance of each input feature.


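  1. Unbounded fusion: $O = \sum_i w_i \cdot I_i$, where $w_i$ is a learnable weight.

Since the scalar weight is unbounded, it could potentially cause training instability. Therefore, the paper resorts to weight normalization to bound the value range of each weight.

  2. Softmax-based fusion: $O = \sum_i \frac{e^{w_i}}{\sum_j e^{w_j}} \cdot I_i$

  3. Fast normalized fusion: $O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} \cdot I_i$

Here $w_i \ge 0$ is ensured by applying a ReLU to each weight, and $\epsilon = 0.0001$ is a very small value that keeps the normalization numerically stable.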
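The three fusion variants can be compared in a few lines of plain Python. This is a minimal sketch, assuming each feature is a flat list of floats with the same length; the function names are illustrative and do not come from any library.

```python
import math


def _weighted_sum(coeffs, features):
    """Element-wise sum of the features, each scaled by its coefficient."""
    length = len(features[0])
    return [sum(c * f[k] for c, f in zip(coeffs, features)) for k in range(length)]


def unbounded_fusion(weights, features):
    """O = sum_i w_i * I_i, with no constraint on the weights."""
    return _weighted_sum(weights, features)


def softmax_fusion(weights, features):
    """O = sum_i softmax(w)_i * I_i, so the coefficients lie in (0, 1)."""
    m = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return _weighted_sum([e / total for e in exps], features)


def fast_normalized_fusion(weights, features, eps=1e-4):
    """O = sum_i w_i / (eps + sum_j w_j) * I_i, with w_i >= 0 via ReLU."""
    relu = [max(w, 0.0) for w in weights]
    total = eps + sum(relu)
    return _weighted_sum([w / total for w in relu], features)


feats = [[1.0, 2.0], [3.0, 4.0]]
print(unbounded_fusion([1.0, 1.0], feats))
print(softmax_fusion([1.0, 1.0], feats))
print(fast_normalized_fusion([1.0, 1.0], feats))
```

The fast normalized variant avoids the exponentials of the softmax while still keeping each coefficient between 0 and 1.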




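The network uses EfficientNet as its backbone. Branches are taken from the five levels {P3, P4, P5, P6, P7} for feature fusion and prediction, and the BiFPN operation is applied to these five branches multiple times.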
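To get a feel for the five levels, the sketch below computes the spatial size of each feature map, assuming level Pi has a resolution of 1/2^i of the input image. The helper name `pyramid_sizes` is hypothetical.

```python
def pyramid_sizes(input_size, levels=(3, 4, 5, 6, 7)):
    """Side length of each feature map, assuming a stride of 2**level."""
    # Ceiling division, so inputs that are not multiples of the stride still work.
    return {f"P{lvl}": -(-input_size // 2 ** lvl) for lvl in levels}


print(pyramid_sizes(512))
```

For a 512 x 512 input this gives side lengths from 64 at P3 down to 4 at P7.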


Compound Scaling:


EfficientDet uses a simple compound coefficient φ to jointly scale up all dimensions of the backbone network, BiFPN network, class/box network, and resolution.

Using a grid search over the candidate scaling factors {1.2, 1.25, 1.3, 1.35, 1.4, 1.45}, the paper selects 1.35 as the best scaling factor for the BiFPN width.
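A rough sketch of how one coefficient φ could drive all dimensions is shown below. The base values (64 channels, a BiFPN depth of 3, a head depth of 3, a 512-pixel input with 128-pixel steps) follow the scaling rules of the paper as best recalled here and should be treated as assumptions to check against the version you are reading. The published configurations also round the widths and deviate from the linear resolution rule for the larger models, as the `input_sizes` list in the code of this post shows.

```python
def compound_scaling(phi, width_factor=1.35, base_bifpn_depth=3):
    """Raw (unrounded) dimensions for a compound coefficient phi.

    All base constants are assumptions taken from the paper's scaling rules.
    """
    return {
        "bifpn_width": 64 * width_factor ** phi,   # channels grow exponentially
        "bifpn_depth": base_bifpn_depth + phi,     # layers grow linearly
        "head_depth": 3 + phi // 3,                # class/box network layers
        "input_resolution": 512 + 128 * phi,       # pixels per side
    }


for phi in range(4):
    print(phi, compound_scaling(phi))
```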









    import os
    import time

    import cv2
    import torch
    from torch.backends import cudnn

    from backbone import EfficientDetBackbone
    from efficientdet.utils import BBoxTransform, ClipBoxes
    from utils.utils import preprocess_, invert_affine, postprocess


    class EFFICIENTDET:
        def __init__(self):
            compound_coef = 0
            force_input_size = None  # set None to use default size

            # replace this part with your project's anchor config
            anchor_ratios = [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]
            anchor_scales = [2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]

            self.threshold = 0.2      # minimum confidence score
            self.iou_threshold = 0.2  # IoU threshold used for NMS
            self.use_cuda = True
            self.use_float16 = False
            cudnn.fastest = True
            cudnn.benchmark = True

            self.obj_list = ['cz', 'cw', 'zz', 'zw', 'dz', 'dw', 'zq']

            # tf bilinear interpolation is different from any other's, just make do
            input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]
            self.input_size = input_sizes[compound_coef] if force_input_size is None else force_input_size

            self.model = EfficientDetBackbone(compound_coef=compound_coef, num_classes=len(self.obj_list),
                                              ratios=anchor_ratios, scales=anchor_scales)
            # Load on the CPU first; the model is moved to the GPU below if requested.
            self.model.load_state_dict(
                torch.load('logs/style_dataset/efficientdet-d0_82_6321.pth', map_location='cpu'))
            self.model.requires_grad_(False)
            self.model.eval()

            if self.use_cuda:
                self.model = self.model.cuda()
            if self.use_float16:
                self.model = self.model.half()

        def display(self, preds, imgs, name, imshow=True, imwrite=False):
            for i in range(len(imgs)):
                if len(preds[i]['rois']) == 0:
                    continue

                for j in range(len(preds[i]['rois'])):
                    # np.int was removed from recent NumPy releases; use the built-in int.
                    x1, y1, x2, y2 = preds[i]['rois'][j].astype(int).tolist()
                    cv2.rectangle(imgs[i], (x1, y1), (x2, y2), (255, 255, 0), 2)
                    obj = self.obj_list[preds[i]['class_ids'][j]]
                    score = float(preds[i]['scores'][j])

                    cv2.putText(imgs[i], '{}, {:.3f}'.format(obj, score),
                                (x1, y1 + 30), cv2.FONT_HERSHEY_SIMPLEX, 1.5,
                                (0, 0, 255), 2)

                if imshow:
                    cv2.imshow('img', imgs[i])
                    cv2.waitKey(0)

                if imwrite:
                    cv2.imwrite(os.path.join('meina', name), imgs[i])

        def process(self, image, image_name="default.jpg"):
            # preprocess_ takes already-loaded images rather than file paths
            ori_imgs, framed_imgs, framed_metas = preprocess_([image], max_size=self.input_size)

            if self.use_cuda:
                x = torch.stack([torch.from_numpy(fi).cuda() for fi in framed_imgs], 0)
            else:
                x = torch.stack([torch.from_numpy(fi) for fi in framed_imgs], 0)

            # NHWC -> NCHW
            x = x.to(torch.float32 if not self.use_float16 else torch.float16).permute(0, 3, 1, 2)

            t1 = time.time()
            with torch.no_grad():
                features, regression, classification, anchors = self.model(x)

                regressBoxes = BBoxTransform()
                clipBoxes = ClipBoxes()

                out = postprocess(x,
                                  anchors, regression, classification,
                                  regressBoxes, clipBoxes,
                                  self.threshold, self.iou_threshold)

            # map boxes back to the coordinates of the original image
            out = invert_affine(framed_metas, out)
            self.display(out, ori_imgs, image_name, imshow=False, imwrite=True)
            t2 = time.time()

            tact_time = t2 - t1
            print(f'{tact_time} seconds, {1 / tact_time} FPS, @batch_size 1')


    def test_images():
        efficientDet = EFFICIENTDET()
        # ipath = 'datasets/style_dataset/val2020/'
        ipath = 'datasets/give_/'
        for name in os.listdir(ipath):
            img_path = os.path.join(ipath, name)
            image = cv2.imread(img_path, 1)
            efficientDet.process(image, name)


    if __name__ == "__main__":
        test_images()
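The `threshold` and `iou_threshold` values in the script are passed to `postprocess`, whose internals are not shown in this post. A common way to use two such values is score filtering followed by non-maximum suppression (NMS). The sketch below illustrates that idea in plain Python; it is class-agnostic, uses hypothetical helper names, and is an assumption about the general technique rather than a copy of what `postprocess` does.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, score_threshold=0.2, iou_threshold=0.2):
    """Return indices of boxes kept after score filtering and greedy NMS."""
    # Drop low-confidence boxes, then visit the rest from highest score down.
    order = sorted((i for i, s in enumerate(scores) if s > score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
print(filter_detections(boxes, scores))
```

With an IoU threshold as low as 0.2, even moderately overlapping boxes are merged into a single detection, which suits scenes where objects rarely overlap.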


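  1. EfficientDet can be used for both detection and segmentation tasks.
  2. A bidirectional feature fusion strategy, BiFPN.
  3. A compound scaling strategy.
  4. A detection framework with both high speed and high accuracy.
  5. Training is fairly fast: the loss drops to about 1 after a single epoch.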