Paper: https://arxiv.org/pdf/1506.02640.pdf

The code accompanying this post is on GitHub: https://github.com/shankezh/DL_HotNet_Tensorflow

If you are interested in machine learning and not content to treat deep-learning models as black boxes, and you want to understand why training can fit a good model, see my earlier blog posts, where I derive several classic machine-learning cases mathematically and build a simple neural-network framework in Python to deepen understanding: https://blog.csdn.net/shankezh/article/category/7279585
For project help or job opportunities, please email: cloud_happy@163.com
--------------------- 
This post reproduces the paper in code.

Paper Reading
Key Points
1. Existing detection systems repurpose classifiers to perform detection, e.g. DPM and R-CNN;

2. YOLO instead frames object detection as a regression problem; with this system you only need to look at an image once to predict which objects are present and where they are;

3. The system divides the input image into an S×S grid; if the center of an object falls inside a grid cell, that cell is responsible for detecting the object;

4. Each grid cell predicts B bounding boxes (bboxes) and a confidence score for each; the score reflects how confident the model is that the box contains an object and how accurate it believes the prediction to be; the confidence score is defined as Pr(Object) * IOU;

5. If no object exists in a cell, the confidence score should be 0; otherwise we want the confidence score to equal the IOU between the predicted box and the ground-truth box;

6. Each bbox consists of five predictions: x, y, w, h and confidence; the (x, y) coordinates represent the center of the box relative to the bounds of its grid cell; the width and height are predicted relative to the whole image; the confidence represents the IOU between the predicted box and the ground-truth box;

7. Each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object), conditioned on the cell containing an object;

8. Only one set of class probabilities is predicted per grid cell, regardless of the number of boxes B;

9. At test time, the conditional class probabilities are multiplied by the individual box confidence scores; see the formula in the figure below:

Here Pr(Object) is either 1, meaning the cell contains an object, or 0, meaning it does not;
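To make point 9 concrete, here is a tiny numeric sketch (the function name and values are mine, for illustration): the class-specific score for a box is the product of the conditional class probability, Pr(Object), and the IOU, which collapses to Pr(Class_i) * IOU:

```python
def class_specific_confidence(p_class_given_obj, p_obj, iou):
    """Test-time score: Pr(Class_i | Object) * Pr(Object) * IOU = Pr(Class_i) * IOU."""
    return p_class_given_obj * p_obj * iou

# a cell that contains an object (Pr(Object) = 1)
score = class_specific_confidence(0.8, 1.0, 0.6)   # ~0.48
# an empty cell (Pr(Object) = 0) scores 0 for every class
empty = class_specific_confidence(0.8, 0.0, 0.6)   # 0.0
```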

10. To evaluate YOLO on PASCAL VOC, the authors use S=7 and B=2. PASCAL VOC has 20 labelled classes, so C=20; the final prediction is therefore a tensor of shape 7×7×30, following the formula S × S × (B * 5 + C);
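The output-size formula can be checked directly:

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, number of VOC classes
output_size = S * S * (B * 5 + C)
print(output_size)  # 1470 values, reshaped to a 7x7x30 tensor
```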

11. In the network design, the convolutional layers extract features from the image, while the fully connected (fc) layers predict the output probabilities and coordinates;

12. The network is inspired by the GoogLeNet image-classification model: 24 convolutional layers followed by 2 fc layers, using 1×1 reduction layers followed by 3×3 convolutions in place of GoogLeNet's inception modules; see the full design in the figure below:

The authors pretrained the classification network on ImageNet at 224×224 resolution, then doubled the input resolution to 448×448 for detection;

Training

1. Pretrain on ImageNet;

2. Pretraining uses the first 20 convolutional layers, followed by an average-pooling layer and a fully connected layer;

3. Because prior work showed that adding both convolutional and fully connected layers to a pretrained network can improve performance, four convolutional layers and two fully connected layers are appended, with randomly initialized weights; and because detection often requires fine-grained visual information, the network's input resolution is increased from 224×224 to 448×448;

4. The final layer predicts both class probabilities and bbox coordinates; the bbox width and height are normalized by the image width and height, so they fall between 0 and 1; the bbox x and y coordinates are parameterized as offsets from a particular grid-cell location, so they are also bounded between 0 and 1;
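A minimal sketch of this parameterization (the helper name and the rounding convention are mine; 448 and S=7 are the paper's detection settings):

```python
def encode_box(x_c, y_c, w, h, img_size=448, S=7):
    """Encode an absolute-pixel box (center x/y, width, height) into YOLO
    targets: x, y become offsets inside the responsible cell in [0, 1],
    w, h become fractions of the whole image."""
    cell = img_size / S
    col, row = int(x_c // cell), int(y_c // cell)  # the responsible grid cell
    x = x_c / cell - col   # offset of the center within the cell
    y = y_c / cell - row
    return row, col, (x, y, w / img_size, h / img_size)

row, col, target = encode_box(224.0, 224.0, 112.0, 112.0)
# the image center lands in cell (3, 3) with offsets (0.5, 0.5),
# and a 112-px box becomes w = h = 0.25
```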

5. The final layer uses a linear activation; all other layers use the leaky rectified linear activation (Leaky ReLU), shown below:
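The activation figure is not reproduced here; the function itself is simply:

```python
def leaky_relu(x, alpha=0.1):
    """phi(x) = x for x > 0, otherwise alpha * x (the paper uses alpha = 0.1)."""
    return x if x > 0 else alpha * x

print(leaky_relu(2.0))    # 2.0
print(leaky_relu(-2.0))   # -0.2
```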

6. The model output is optimized with sum-squared error because it is easy to optimize, but it does not align perfectly with the goal of maximizing average precision; it weights localization error equally with classification error, which is not ideal; moreover, many grid cells in every image contain no object, and pushing their confidence scores toward zero produces gradients that can overpower the gradient from the cells that do contain objects, making the model unstable and causing training to diverge early;

7. To fix the problem in point 6, the authors increase the loss from bbox coordinate predictions and decrease the loss from confidence predictions of boxes that contain no objects, using two parameters: λcoord = 5 and λnoobj = 0.5;

8. Sum-squared error also weights errors in large boxes and small boxes equally, but the same absolute deviation matters much more for a small box than for a large one; to partially address this, the authors predict the square root of the bbox width and height instead of the width and height directly;
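A quick numeric sketch of why the square root helps (the values are invented for illustration): the same absolute width error yields a larger loss term for a small box than for a large one:

```python
import math

def width_term(pred_w, true_w):
    # squared error on sqrt(w) rather than on w itself
    return (math.sqrt(pred_w) - math.sqrt(true_w)) ** 2

small = width_term(0.15, 0.10)  # a 0.05 deviation on a small box
large = width_term(0.95, 0.90)  # the same deviation on a large box
print(small > large)  # True: the small box is penalized more
```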

9. The loss is designed as follows:

Here X(superscript # subscript) stands in for variables with superscripts and subscripts: 1(obj # i) denotes that an object appears in cell i; 1(obj # ij) denotes that the j-th bbox predictor in cell i is responsible for that prediction;
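Since the loss figure is not reproduced above, here is my transcription of the multi-part loss from the paper, written with the notation just described:

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}^{\text{obj}}_{ij}
  \left[(x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2\right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}^{\text{obj}}_{ij}
  \left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
 &+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}^{\text{obj}}_{ij}\left(C_i-\hat{C}_i\right)^2
  + \lambda_{\text{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}^{\text{noobj}}_{ij}\left(C_i-\hat{C}_i\right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}^{\text{obj}}_{i} \sum_{c \in \text{classes}}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
```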

10. The network is trained for 135 epochs on VOC 2007 and 2012 with a batch size of 64, momentum 0.9 and weight decay 0.0005; the learning-rate schedule is: slowly raise the rate from 0.001 to 0.01 over the first epochs, train at 0.01 up to epoch 75, then at 0.001 for 30 epochs, and finally at 0.0001 for the last 30 epochs;
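The schedule can be sketched as a piecewise function (the warmup length is my assumption; the paper only says the rate is raised slowly over the first epochs):

```python
def yolo_lr(epoch, warmup=5):
    """Learning-rate schedule from the paper: ramp 1e-3 -> 1e-2,
    then 1e-2 until epoch 75, 1e-3 until 105, 1e-4 until 135."""
    if epoch < warmup:
        return 1e-3 + (1e-2 - 1e-3) * epoch / warmup  # linear warmup
    if epoch < 75:
        return 1e-2
    if epoch < 105:
        return 1e-3
    return 1e-4

print(yolo_lr(0), yolo_lr(50), yolo_lr(100), yolo_lr(120))
# 0.001 0.01 0.001 0.0001
```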

11. To avoid overfitting, dropout and data augmentation are used: a dropout layer with rate 0.5 after the first fc layer; for augmentation, random scaling and translations of up to 20% of the original image size, plus random adjustment of the image's saturation in HSV color space by up to a factor of 1.5;
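A sketch of sampling those augmentation parameters (the helper is mine; actually applying the transform to pixels and adjusting the labels accordingly is omitted):

```python
import random

def augment_params(img_w, img_h, jitter=0.2, sat_range=1.5):
    """Sample a random scaling/translation of up to 20% of the image size
    and a random HSV saturation factor of up to 1.5."""
    dx = random.uniform(-jitter, jitter) * img_w        # translation, px
    dy = random.uniform(-jitter, jitter) * img_h
    scale = random.uniform(1 - jitter, 1 + jitter)      # scale factor
    sat = random.uniform(1 / sat_range, sat_range)      # saturation factor
    return dx, dy, scale, sat
```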

12. Large objects, or objects near cell borders, can be localized by multiple cells; non-maximum suppression (NMS) is used to remove these duplicate detections;
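A minimal greedy NMS sketch in plain Python (the corner box format and the 0.5 threshold are my choices for illustration):

```python
def iou(a, b):
    """IOU of two boxes in (x1, y1, x2, y2) corner format."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-10)

def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it by more
    than `thresh`, and repeat on what remains."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 too heavily
```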

Limitations of YOLO

1. YOLO imposes strong spatial constraints on bbox predictions, since each grid cell predicts only two boxes and a single class; this limits how many nearby objects the model can predict;

2. The model also uses relatively coarse features to predict bboxes, since the architecture contains multiple downsampling layers;

TensorFlow implementation:

Model code (covering both the pretraining network and the detection network)

model.py:

import tensorflow as tf
import tensorflow.contrib.slim as slim
import net.Detection.YOLOV1.config as cfg
import numpy as np
 
 
class YOLO_Net(object):
    def __init__(self,is_pre_training=False,is_training = True):
        self.classes = cfg.VOC07_CLASS
        self.pre_train_num = cfg.PRE_TRAIN_NUM
        self.det_cls_num = len(self.classes)
        self.image_size = cfg.DET_IMAGE_SIZE
        self.cell_size = cfg.CELL_SIZE
        self.boxes_per_cell = cfg.PER_CELL_CHECK_BOXES
        self.output_size = (self.cell_size * self.cell_size) * ( 5 * self.boxes_per_cell + self.det_cls_num)
        self.scale = 1.0 * self.image_size / self.cell_size
        self.boundary1 = self.cell_size * self.cell_size * self.det_cls_num
        self.boundary2 = self.boundary1 + self.cell_size * self.cell_size * self.boxes_per_cell
        self.object_scale = cfg.OBJ_CONFIDENCE_SCALE
        self.no_object_scale = cfg.NO_OBJ_CONFIDENCE_SCALE
        self.class_scale = cfg.CLASS_SCALE
        self.coord_scale = cfg.COORD_SCALE
        self.learning_rate = 0.0001
        self.batch_size = cfg.BATCH_SIZE
        self.keep_prob = cfg.KEEP_PROB
        self.pre_training = is_pre_training
 
        self.offset = np.transpose(
            np.reshape(
                np.array(
                    [np.arange(self.cell_size)]*self.cell_size*self.boxes_per_cell
                ),(self.boxes_per_cell,self.cell_size,self.cell_size)
            ),(1,2,0)
        )
 
        self.bn_params = cfg.BATCH_NORM_PARAMS
        self.is_training = tf.placeholder(tf.bool)
        if self.pre_training:
            self.images = tf.placeholder(tf.float32, [None, 224, 224, 3], name='images')
        else:
            self.images = tf.placeholder(tf.float32, [None, self.image_size, self.image_size, 3], name='images')
 
        self.logits = self.build_network(self.images,is_training=self.is_training)
 
        if is_training:
            if self.pre_training:
                self.labels = tf.placeholder(tf.float32, [None,self.pre_train_num])
                self.classify_loss(self.logits,self.labels)
                self.total_loss = tf.losses.get_total_loss()
                self.evalution = self.classify_evalution(self.logits,self.labels)
                print('pretraining network')
            else:
                self.labels = tf.placeholder(tf.float32, [None,self.cell_size,self.cell_size,5+self.det_cls_num])
                self.det_loss_layer(self.logits,self.labels)
                self.total_loss = tf.losses.get_total_loss()
                tf.summary.scalar('total_loss', self.total_loss)
                print('detection network')
 
 
 
 
    def build_network(self, images,is_training = True,scope = 'yolov1'):
        net = images
        with tf.variable_scope(scope):
            with slim.arg_scope([slim.conv2d, slim.fully_connected],
                                weights_regularizer=slim.l2_regularizer(0.00004)):
                with slim.arg_scope([slim.conv2d],
                                    weights_initializer=slim.xavier_initializer(),
                                    normalizer_fn=slim.batch_norm,
                                    activation_fn=slim.nn.leaky_relu,
                                    normalizer_params=self.bn_params):
                    with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training):
                        net = slim.conv2d(net, 64, [7, 7], stride=2, padding='SAME', scope='layer1')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool1')
 
                        net = slim.conv2d(net, 192, [3, 3], stride=1, padding='SAME', scope='layer2')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool2')
 
                        net = slim.conv2d(net, 128, [1, 1], stride=1, padding='SAME', scope='layer3_1')
                        net = slim.conv2d(net, 256, [3, 3], stride=1, padding='SAME', scope='layer3_2')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer3_3')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer3_4')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool3')
 
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_1')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_2')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_3')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_4')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_5')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_6')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_7')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_8')
                        net = slim.conv2d(net, 512, [1, 1], stride=1, padding='SAME', scope='layer4_9')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer4_10')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool4')
 
                        net = slim.conv2d(net, 512, [1, 1], stride=1, padding='SAME', scope='layer5_1')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer5_2')
                        net = slim.conv2d(net, 512, [1, 1], stride=1, padding='SAME', scope='layer5_3')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer5_4')
 
                        if self.pre_training:
                            net = slim.avg_pool2d(net, [7, 7], stride=1, padding='VALID', scope='clssify_avg5')
                            net = slim.flatten(net)
                            net = slim.fully_connected(net, self.pre_train_num, activation_fn=slim.nn.leaky_relu,
                                                       scope='classify_fc1')
                            return net
 
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer5_5')
                        net = slim.conv2d(net, 1024, [3, 3], stride=2, padding='SAME', scope='layer5_6')
 
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer6_1')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer6_2')
 
                        net = slim.flatten(net)
 
                        net = slim.fully_connected(net, 1024, activation_fn=slim.nn.leaky_relu, scope='fc1')
                        net = slim.dropout(net, 0.5)
                        net = slim.fully_connected(net, 4096, activation_fn=slim.nn.leaky_relu, scope='fc2')
                        net = slim.dropout(net, 0.5)
                        net = slim.fully_connected(net, self.output_size, activation_fn=None, scope='fc3')
                        # N, 7,7,30
                        # net = tf.reshape(net,[-1,S,S,B*5+C])
            return net
 
    def classify_loss(self,logits,labels):
        with tf.name_scope('classify_loss') as scope:
            _loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels)
            mean_loss = tf.reduce_mean(_loss)
            tf.losses.add_loss(mean_loss)
            tf.summary.scalar(scope + 'classify_mean_loss', mean_loss)
 
    def classify_evalution(self,logits,labels):
        with tf.name_scope('classify_evaluation') as scope:
            correct_pre = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
            accurary = tf.reduce_mean(tf.cast(correct_pre, 'float'))
            # tf.summary.scalar(scope + 'accuracy:', accurary)
        return accurary
 
 
    '''
    @:param predicts shape->[N,7x7x30]
    @:param labels   shape->[N,7,7,25]  <==> [N, h-axis, w-axis, 25] ==> [N,7,7,25 (1: responsible-for-detection flag, 2-5: coordinates, 6-25: one-hot class)]
    '''
 
    def det_loss_layer(self, predicts, labels, scope='det_loss'):
        with tf.variable_scope(scope):
            predict_classes = tf.reshape(predicts[:, :self.boundary1],
                                         [-1, 7, 7, 20])  # class predictions -> [batch_size, cell_size, cell_size, num_cls]
            predict_scale = tf.reshape(predicts[:, self.boundary1:self.boundary2],
                                       [-1, 7, 7, 2])  # confidence predictions -> [batch_size, cell_size, cell_size, boxes_per_cell]
            predict_boxes = tf.reshape(predicts[:,self.boundary2:],
                                       [-1, 7, 7, 2, 4])  # coordinate predictions -> [batch_size, cell_size, cell_size, boxes_per_cell, 4]

            response = tf.reshape(labels[:, :, :, 0], [-1, 7, 7, 1])  # label confidence: whether this cell is responsible for a detection
            boxes = tf.reshape(labels[:, :, :, 1:5], [-1, 7, 7, 1, 4])  # label coordinates
            boxes = tf.tile(boxes,
                            [1, 1, 1, 2, 1]) / self.image_size  # the net predicts 2 boxes per cell, so tile the label to match and normalize to YOLO form
            classes = labels[:, :, :, 5:]  # label classes
 
            offset = tf.constant(self.offset, dtype=tf.float32)
            offset = tf.reshape(offset, [1, 7, 7, 2])
            offset = tf.tile(offset, [tf.shape(boxes)[0], 1, 1, 1])
            predict_boxes_tran = tf.stack([
                1. * (predict_boxes[:, :, :, :, 0] + offset) / self.cell_size,
                1. * (predict_boxes[:, :, :, :, 1] + tf.transpose(offset, (0, 2, 1, 3))) / self.cell_size,
                tf.square(predict_boxes[:, :, :, :, 2]),
                tf.square(predict_boxes[:, :, :, :, 3])
            ], axis=-1)
            # predict_boxes_tran = tf.transpose(predict_boxes_tran,[1,2,3,4,0])
 
            iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)
            object_mask = tf.reduce_max(iou_predict_truth, 3, keep_dims=True)
            object_mask = tf.cast((iou_predict_truth >= object_mask), tf.float32) * response
            no_object_mask = tf.ones_like(object_mask, dtype=tf.float32) - object_mask
            boxes_tran = tf.stack([
                1. * boxes[:, :, :, :, 0] * 7 - offset,
                1. * boxes[:, :, :, :, 1] * 7 - tf.transpose(offset, (0, 2, 1, 3)),
                tf.sqrt(boxes[:, :, :, :, 2]),
                tf.sqrt(boxes[:, :, :, :, 3])
            ], axis=-1)
 
            # classification loss
            class_delta = response * (predict_classes - classes)
            class_loss = tf.reduce_mean(tf.reduce_sum(tf.square(class_delta), axis=[1, 2, 3]),
                                        name='class_loss') * self.class_scale
 
            # object (confidence) loss
            object_delta = object_mask * (predict_scale - iou_predict_truth)
            object_loss = tf.reduce_mean(tf.reduce_sum(tf.square(object_delta), axis=[1, 2, 3]),
                                         name='object_loss') * self.object_scale
 
            # no-object loss
            no_object_delta = no_object_mask * predict_scale
            no_object_loss = tf.reduce_mean(tf.reduce_sum(tf.square(no_object_delta), axis=[1, 2, 3]),
                                            name='no_object_loss') * self.no_object_scale
 
            # coordinate loss
            coord_mask = tf.expand_dims(object_mask, 4)
            boxes_delta = coord_mask * (predict_boxes - boxes_tran)
            coord_loss = tf.reduce_mean(tf.reduce_sum(tf.square(boxes_delta), axis=[1, 2, 3, 4]),
                                        name='coord_loss') * self.coord_scale
            tf.losses.add_loss(class_loss)
            tf.losses.add_loss(object_loss)
            tf.losses.add_loss(no_object_loss)
            tf.losses.add_loss(coord_loss)
 
            tf.summary.scalar('class_loss', class_loss)
            tf.summary.scalar('object_loss', object_loss)
            tf.summary.scalar('noobject_loss', no_object_loss)
            tf.summary.scalar('coord_loss', coord_loss)
 
            tf.summary.histogram('boxes_delta_x', boxes_delta[:, :, :, :, 0])
            tf.summary.histogram('boxes_delta_y', boxes_delta[:, :, :, :, 1])
            tf.summary.histogram('boxes_delta_w', boxes_delta[:, :, :, :, 2])
            tf.summary.histogram('boxes_delta_h', boxes_delta[:, :, :, :, 3])
            tf.summary.histogram('iou', iou_predict_truth)
 
 
    def calc_iou(self, boxes1, boxes2, scope='iou'):
        """calculate ious
               Args:
                 boxes1: 4-D tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4]  ====> (x_center, y_center, w, h)
                 boxes2: 1-D tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4] ===> (x_center, y_center, w, h)
               Return:
                 iou: 3-D tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
               """
        with tf.variable_scope(scope):
            boxes1 = tf.stack([boxes1[:, :, :, :, 0] - boxes1[:, :, :, :, 2] / 2.0,
                               boxes1[:, :, :, :, 1] - boxes1[:, :, :, :, 3] / 2.0,
                               boxes1[:, :, :, :, 0] + boxes1[:, :, :, :, 2] / 2.0,
                               boxes1[:, :, :, :, 1] + boxes1[:, :, :, :, 3] / 2.0], axis=-1)
            # boxes1 = tf.transpose(boxes1, [1, 2, 3, 4, 0])
 
            boxes2 = tf.stack([boxes2[:, :, :, :, 0] - boxes2[:, :, :, :, 2] / 2.0,
                               boxes2[:, :, :, :, 1] - boxes2[:, :, :, :, 3] / 2.0,
                               boxes2[:, :, :, :, 0] + boxes2[:, :, :, :, 2] / 2.0,
                               boxes2[:, :, :, :, 1] + boxes2[:, :, :, :, 3] / 2.0], axis=-1)
            # boxes2 = tf.transpose(boxes2, [1, 2, 3, 4, 0])
 
            lu = tf.maximum(boxes1[:, :, :, :, :2], boxes2[:, :, :, :, :2])
            rd = tf.minimum(boxes1[:, :, :, :, 2:], boxes2[:, :, :, :, 2:])
 
            intersection = tf.maximum(0.0, rd - lu)
            inter_square = intersection[:, :, :, :, 0] * intersection[:, :, :, :, 1]
 
            square1 = (boxes1[:, :, :, :, 2] - boxes1[:, :, :, :, 0]) * \
                      (boxes1[:, :, :, :, 3] - boxes1[:, :, :, :, 1])
            square2 = (boxes2[:, :, :, :, 2] - boxes2[:, :, :, :, 0]) * \
                      (boxes2[:, :, :, :, 3] - boxes2[:, :, :, :, 1])
 
            union_square = tf.maximum(square1 + square2 - inter_square, 1e-10)
 
        return tf.clip_by_value(inter_square / union_square, 0.0, 1.0)
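As a sanity check on the slicing performed in det_loss_layer above, a single flat 1470-dim prediction splits into the three tensors like this (NumPy sketch):

```python
import numpy as np

# boundary1 = 7*7*20 = 980, boundary2 = 980 + 7*7*2 = 1078, total = 1470
S, B, C = 7, 2, 20
b1 = S * S * C
b2 = b1 + S * S * B

pred = np.zeros(S * S * (B * 5 + C), dtype=np.float32)
classes = pred[:b1].reshape(S, S, C)        # class probabilities
confidence = pred[b1:b2].reshape(S, S, B)   # per-box confidences
boxes = pred[b2:].reshape(S, S, B, 4)       # x, y, w, h per box

print(classes.shape, confidence.shape, boxes.shape)
```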

Training (covering both classification pretraining and detection training):

solver.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Created by solver on 19-5-6
import tensorflow as tf
from net.Detection.YOLOV1.model import YOLO_Net
import net.Detection.YOLOV1.config as cfg
import tensorflow.contrib.slim as slim
from net.Detection.YOLOV1.voc07_img import Pascal_voc
from coms.learning_rate import CLR_EXP_RANGE
from coms.utils import  isHasGpu,isLinuxSys
import time,os
from coms.pre_process import get_cifar10_batch
import net.Detection.YOLOV1.voc07_tfrecord as VOC07RECORDS
 
class Solver(object):
    def __init__(self,net,data,tf_records=False):
        self.net = net
        self.data = data
        self.tf_records = tf_records
        self.batch_size = cfg.BATCH_SIZE
        self.clr = CLR_EXP_RANGE()
        self.log_dir = cfg.LOG_DIR
        self.model_cls_dir = cfg.CLS_MODEL_DIR
        self.model_det_dir = cfg.DET_MODEL_DIR
        self.learning_rate = tf.placeholder(tf.float32)
        self.re_train = True
        tf.summary.scalar('learning_rate',self.learning_rate)
        self.optimizer = self.optimizer_bn(lr=self.learning_rate,loss=self.net.total_loss)
        if isHasGpu():
            gpu_option = tf.GPUOptions(allow_growth=True)
            config = tf.ConfigProto(allow_soft_placement=True,gpu_options=gpu_option)
        else:
            config = tf.ConfigProto(allow_soft_placement=True)
        self.sess = tf.Session(config=config)
        self.sess.run(tf.global_variables_initializer())
 
        self.summary_op = tf.summary.merge_all()
        n_time = time.strftime("%Y-%m-%d %H-%M", time.localtime())
        self.writer = tf.summary.FileWriter(os.path.join(self.log_dir, n_time),self.sess.graph)
        self.saver = tf.train.Saver(max_to_keep=4)
 
 
    def train_classify(self):
        self.set_classify_params()
        max_acc = 0.
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=self.sess, coord=coord)
        for epoch in range(cfg.EPOCH):
            for step in range(1,cfg.ITER_STEP+1):
                learning_rate_val = self.clr.calc_lr(step,cfg.ITER_STEP+1,0.001,0.01,gamma=0.9998)
                train_img_batch, train_label_batch = self.sess.run([self.train_img_batch,self.train_label_batch])
                feed_dict_train = {self.net.images:train_img_batch, self.net.labels:train_label_batch, self.net.is_training:True,self.learning_rate:learning_rate_val}
                _, summary_op, batch_train_loss, batch_train_acc = self.sess.run([self.optimizer, self.summary_op,self.net.total_loss,self.net.evalution],feed_dict=feed_dict_train)
 
                global_step = int(epoch * cfg.ITER_STEP + step + 1)
                print("epoch %d , step %d train end ,loss is : %f ,accuracy is %f ... ..." % (epoch, step, batch_train_loss, batch_train_acc))
                train_summary = tf.Summary(
                    value=[tf.Summary.Value(tag='train_loss', simple_value=batch_train_loss)
                        , tf.Summary.Value(tag='train_batch_accuracy', simple_value=batch_train_acc)
                        , tf.Summary.Value(tag='learning_rate', simple_value=learning_rate_val)])
                self.writer.add_summary(train_summary,global_step=global_step)
                self.writer.add_summary(summary_op,global_step=global_step)
                self.writer.flush()
 
 
                if step % 100 == 0:
                    print('test sets evaluation start ...')
                    ac_iter = int(10000 / self.batch_size)  # the CIFAR-10 test set has 10,000 images
                    ac_sum = 0.
                    loss_sum = 0.
                    for ac_count in range(ac_iter):
                        batch_test_img, batch_test_label = self.sess.run([self.test_img_batch, self.test_label_batch])
                        feed_dict_test = {self.net.images: batch_test_img,self.net.labels: batch_test_label,self.net.is_training: False,self.learning_rate:learning_rate_val}
                        test_loss, test_accuracy = self.sess.run([self.net.total_loss, self.net.evalution],feed_dict=feed_dict_test)
 
                        ac_sum += test_accuracy
                        loss_sum += test_loss
                    ac_mean = ac_sum / ac_iter
                    loss_mean = loss_sum / ac_iter
                    print('epoch {} , step {} , accuracy is {}'.format(str(epoch), str(step), str(ac_mean)))
                    test_summary = tf.Summary(
                        value=[tf.Summary.Value(tag='test_loss', simple_value=loss_mean)
                            , tf.Summary.Value(tag='test_accuracy', simple_value=ac_mean)])
                    self.writer.add_summary(test_summary, global_step=global_step)
                    self.writer.flush()
 
                    if ac_mean >= max_acc:
                        max_acc = ac_mean
                        self.saver.save(self.sess, self.model_cls_dir + '/' + 'cifar10_{}_step_{}.ckpt'.format(str(epoch),str(step)), global_step=step)
                        print('max accuracy has reaching ,save model successful ...')
        print('train network task was run over')
 
 
    def set_classify_params(self):
        self.train_img_batch,self.train_label_batch = get_cifar10_batch(is_train=True,batch_size=self.batch_size,num_cls=cfg.PRE_TRAIN_NUM,img_prob=[224,224,3])
        self.test_img_batch,self.test_label_batch = get_cifar10_batch(is_train=False,batch_size=self.batch_size,num_cls=cfg.PRE_TRAIN_NUM,img_prob=[224,224,3])
 
    def train_detector(self):
        self.set_detector_params()
        for epoch in range(cfg.EPOCH):
            for step in range(1,cfg.ITER_STEP+1):
                global_step = int(epoch * cfg.ITER_STEP + step + 1)
                learning_rate_val = self.clr.calc_lr(step,cfg.ITER_STEP+1,0.0001,0.0005,gamma=0.9998)
                if self.tf_records:
                    train_images, train_labels = self.sess.run(self.train_next_elements)
                else:
                    train_images, train_labels = self.data.next_batch(self.gt_labels_train, self.batch_size)
                feed_dict_train = {self.net.images:train_images,self.net.labels:train_labels,self.learning_rate:learning_rate_val,self.net.is_training:True}
                _,summary_str,train_loss = self.sess.run([self.optimizer,self.summary_op,self.net.total_loss],feed_dict=feed_dict_train)
                print("epoch %d , step %d train end ,loss is : %f  ... ..." % (epoch, step, train_loss))
                self.writer.add_summary(summary_str,global_step)
 
                if step % 50 ==0:
                    print('test sets start ...')
                    # test sets sum :4962
                    sum_loss = 0.
                    # test_iter = int (4962 / self.batch_size)
                    test_iter = 10  # average the loss over 10 batches
                    for _ in range(test_iter):
                        if self.tf_records:
                            test_images, test_labels = self.sess.run(self.test_next_elements)
                        else:
                            test_images,test_labels = self.data.next_batch(self.gt_labels_test,self.batch_size)
                        feed_dict_test = {self.net.images:test_images,self.net.labels:test_labels,self.net.is_training:False}
                        loss_iter = self.sess.run(self.net.total_loss,feed_dict=feed_dict_test)
                        sum_loss += loss_iter
 
                    mean_loss = sum_loss/test_iter
                    print('epoch {} , step {} , test loss is {}'.format(str(epoch), str(step), str(mean_loss)))
                    test_summary = tf.Summary(
                        value=[tf.Summary.Value(tag='test_loss', simple_value=mean_loss)])
                    self.writer.add_summary(test_summary, global_step=global_step)
                    self.writer.flush()
 
            self.saver.save(self.sess,self.model_det_dir+'/' + 'det_voc07_{}_step_{}.ckpt'.format(str(epoch),str(step)), global_step=step)
            print('save model successful ...')
 
    def set_detector_params(self):
        if self.tf_records:
            train_records_path = r'/home/ws/DataSets/pascal_VOC/VOC07/tfrecords' + '/trainval.tfrecords'
            test_records_path = r'/home/ws/DataSets/pascal_VOC/VOC07/tfrecords' + '/test.tfrecords'
            train_datasets = VOC07RECORDS.DataSets(record_path=train_records_path,batch_size=self.batch_size)
            train_gen = train_datasets.transform(shuffle=True)
            train_iterator = train_gen.make_one_shot_iterator()
            self.train_next_elements = train_iterator.get_next()
            test_datasets = VOC07RECORDS.DataSets(record_path=test_records_path, batch_size=self.batch_size)
            test_gen = test_datasets.transform(shuffle=True)
            test_iterator = test_gen.make_one_shot_iterator()
            self.test_next_elements = test_iterator.get_next()
        else:
            self.gt_labels_train = self.data.prepare('train')
            self.gt_labels_test = self.data.prepare('test')
        if self.re_train:
            self.load_det_model()
        else:
            self.load_pre_train_model()
 
 
    def load_pre_train_model(self):
        net_vars = slim.get_model_variables()
        model_file = tf.train.latest_checkpoint(self.model_cls_dir)
        reader = tf.train.NewCheckpointReader(model_file)
        model_vars = reader.get_variable_to_shape_map()
        exclude = ['yolov1/classify_fc1/weights', 'yolov1/classify_fc1/biases']
 
        vars_restore_map = {}
        for var in net_vars:
            if var.op.name in model_vars and var.op.name not in exclude:
                vars_restore_map[var.op.name] = var
 
        self.saver = tf.train.Saver(vars_restore_map,max_to_keep=4)
        self.saver.restore(self.sess, model_file)
        self.saver = tf.train.Saver(var_list=net_vars,max_to_keep=4)
 
    def load_det_model(self):
        # self.saver = tf.train.Saver(max_to_keep=4)
        net_vars = slim.get_model_variables()
        self.saver = tf.train.Saver(net_vars,max_to_keep=4)
 
        model_file = tf.train.latest_checkpoint(self.model_det_dir)
        self.saver.restore(self.sess, model_file)
 
 
 
    # training op that also runs the batch-norm update ops
    def optimizer_bn(self,lr, loss, mom=0.9, fun='mm'):
        with tf.name_scope('optimzer_bn'):
            update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
            with tf.control_dependencies([tf.group(*update_ops)]):
                optim = tf.train.MomentumOptimizer(learning_rate=lr, momentum=mom)
                train_op = slim.learning.create_train_op(loss, optim)
        return train_op
 
 
 
def train_classify():
    yolov1 = YOLO_Net(is_pre_training=True)
 
    sovler = Solver(net= yolov1,data=0)
    print('start ...')
    sovler.train_classify()
 
def train_detector():
    yolov1 = YOLO_Net(is_pre_training=False)
    pasvoc07 = Pascal_voc()
    sovler = Solver(net=yolov1,data=pasvoc07)
    print('start train ...')
    sovler.train_detector()
 
def train_detector_with_records():
    yolov1 = YOLO_Net(is_pre_training=False)
    sovler = Solver(net=yolov1,data=0,tf_records=True)
    print('start train ...')
    sovler.train_detector()
 
if __name__ == '__main__':
    train_detector_with_records()

The detector and classifier test code is on GitHub, so I won't paste it here. YOLO completes recognition and localization in a single shot, and forcing the network to learn this way is the most "neural-network-like" style; but without question, since each cell predicts a fixed number of boxes, closely spaced targets risk being missed and recall drops. YOLOv3 later improved on these problems.

Personal training details:

Pretraining used the CIFAR-10 dataset, which is small and low-resolution, so the detection training was significantly affected; in particular, the classes in CIFAR-10 and VOC07 largely do not match.

Detection training used the VOC07 dataset; across all runs it trained for over a thousand epochs, with the batch size changed among 32, 64, 96 and 128 when resuming from checkpoints; the screenshots here show only the very first run, not the later resumed runs.

My network follows YOLOv1, the only difference being the addition of BN to speed up training.

Classification training screenshots:

Detection training screenshots:

Detection result screenshots:

Conclusion:

This run's detection quality is mediocre, with quite a few false positives and misses.

YOLO's one-stage design seems odd to people used to the two-stage R-CNN school, and many struggle with it, especially when writing code; I suggest studying other people's implementations, as I did before I could write this one. The most important part is how YOLO labels are built, which differs greatly from other detectors. The mediocre detection quality here owes a lot to my pretraining dataset, but also to real design weaknesses in YOLOv1 itself, which later papers in the series correct. In short, good results rest on a good dataset, a good model, and good training technique.

All the code is on my GitHub: the YOLO model, training (pretraining and detection training), inference (detection and classification), and YOLO label creation (an image version and a tfrecords version). It should be about the most complete TensorFlow YOLOv1 implementation you will find online.

The trained weights are uploaded to Baidu Cloud for anyone who wants to try them; the link is: https://pan.baidu.com/s/1BdMZYvkiYT9Fts0dLIgrog

Extraction code: 0rmi

Code references:

【1】https://github.com/TowardsNorth/yolo_v1_tensorflow_guiyu

【2】https://github.com/leeyoshinari/YOLO_v1