Table of Contents

0 Preface

1 Environment Setup

1.1 Install Python packages

1.2 Download detail-api

1.3 Run prepare_pcontext.py

1.4 Run prepare_ade20k.py

2 Training the Model

3 Testing the Model

3.1 Download the models

3.2 Test encnet_jpu_res50_pcontext.pth.tar

3.2.1 test [single-scale] (single scale: pixAcc=0.7898, mIoU=0.5105)

3.2.2 test [multi-scale] (multi scale: pixAcc=0.7964, mIoU=0.5210)

3.2.3 predict [single-scale] (single scale)

4 Errors and Solutions

4.1 detail-api compile error

4.2 Missing model files

4.3 AttributeError: 'NoneType' object has no attribute 'run_slave'

References


0 Preface

        Full title: FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation, from a team at the Shenyang Institute of Automation.

        Paper: https://arxiv.org/abs/1903.11816

        GitHub: https://github.com/wuhuikai/FastFCN

        My machine: RTX 3070, CUDA 11.0, torch-1.7.1+cu110, Python 3.7

        Next post in this series: Deep Learning (8): Running, Testing and Predicting with FastFCN, Part 2 (biter0088's blog on CSDN)

1 Environment Setup

        The environment the authors tested on:

PyTorch >= 1.1.0 (Note: The code is tested in the environment with python=3.6, cuda=9.0)

# master branch; I cloned the March 2022 state, and the author may push changes later
git clone https://github.com/wuhuikai/FastFCN.git 
cd FastFCN

1.1 Install Python packages

        Create a file requirements.txt and install the remaining packages.

        Note: activate the Python environment first: source activate yolov5py37

nose
tqdm
scipy
cython
requests
scikit-image
pip install -r requirements.txt

        (The list I originally used also named python3-dev, libevent-dev and cPython. The first two are system packages that pip cannot install; get them with sudo apt install python3-dev libevent-dev. cPython is redundant once cython is listed.)

1.2 Download detail-api

        Clone it into the FastFCN directory:

git clone https://github.com/zhanghang1989/detail-api

        and comment out the clone step in /xx/FastFCN/scripts/prepare_pcontext.py as follows (repo_url stays defined so the except branch below does not raise a NameError):

def install_pcontext_api():
    repo_url = "https://github.com/zhanghang1989/detail-api"
    #os.system("git clone " + repo_url)  # already cloned manually above
    os.system("cd detail-api/PythonAPI/ && python setup.py install")
    shutil.rmtree('detail-api')
    try:
        import detail
    except Exception:
        print("Installing PASCAL Context API failed, please install it manually %s"%(repo_url))

        Note: when prepare_pcontext.py runs, detail-api is installed and the cloned detail-api folder (the shutil.rmtree target above) is deleted.
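
        The script's own try/except only prints a warning on failure, so a quick check afterwards confirms the install actually succeeded:

try:
    import detail
    print("detail-api imported OK from", detail.__file__)
except ImportError as exc:
    print("detail-api is not installed:", exc)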

1.3 Run prepare_pcontext.py

        The file lives at /xx/FastFCN/scripts/prepare_pcontext.py and prepares the VOC2010 dataset:

python -m scripts.prepare_pcontext

        It downloads the VOC2010 data into the following layout:

#VOC2010 dataset
Official site: http://host.robots.ox.ac.uk/pascal/VOC/voc2010/index.html
 
.
└── VOCdevkit     #root directory
    └── VOC2010   #datasets are split by year; only 2010 is downloaded here (2007 and other years also exist)
        ├── Annotations        #XML files, one per image in JPEGImages, describing the image's contents
        ├── ImageSets          #txt files; each line holds one image name, suffixed with ±1 to mark positive/negative samples
        │   ├── Action
        │   ├── Layout
        │   ├── Main
        │   └── Segmentation
        ├── JPEGImages         #source images
        ├── SegmentationClass  #semantic segmentation masks: the class of every pixel
        └── SegmentationObject #instance segmentation masks: which object every pixel belongs to
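
        A quick sanity check that the download landed where expected (paths as created above):

import os

root = os.path.expanduser('~/.encoding/data/VOCdevkit/VOC2010')
for sub in ('JPEGImages', 'Annotations', 'SegmentationClass'):
    path = os.path.join(root, sub)
    print(sub, len(os.listdir(path)) if os.path.isdir(path) else 'missing')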


         After the download finishes, detail-api is compiled and installed, and the folder cloned in 1.2 is deleted. So once the terminal reports that detail-api installed successfully, the following two lines serve no further purpose and can be commented out:

    os.system("cd detail-api/PythonAPI/ && python setup.py install")
    shutil.rmtree('detail-api')

        Note: running prepare_pcontext.py again re-downloads the entire VOC2010 dataset; this is a bug (once the packages are installed and the data is downloaded, simply don't run it again). If the first download succeeded but some other error forces you to re-run prepare_pcontext.py to prepare the data and packages, you can comment out the following lines:

if __name__ == '__main__':
    args = parse_args()
    #mkdir(os.path.expanduser('~/.encoding/data'))
    #if args.download_dir is not None:
    #    if os.path.isdir(_TARGET_DIR):
    #        os.remove(_TARGET_DIR)
        # make symlink
    #    os.symlink(args.download_dir, _TARGET_DIR)
    #else:
    #    download_ade(_TARGET_DIR, overwrite=False)
    install_pcontext_api()
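
        A less destructive workaround for the re-download bug is to guard the download with an existence check. A sketch that assumes parse_args, download_ade and _TARGET_DIR keep the names used in the script above:

import os

if __name__ == '__main__':
    args = parse_args()
    voc_root = os.path.join(_TARGET_DIR, 'VOCdevkit', 'VOC2010')
    if not os.path.isdir(voc_root):   # skip the download when the data is already in place
        download_ade(_TARGET_DIR, overwrite=False)
    install_pcontext_api()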

1.4 Run prepare_ade20k.py

        The file lives at /xxx/FastFCN/scripts/prepare_ade20k.py and prepares the ADEChallengeData2016 dataset.

python -m scripts.prepare_ade20k
(yolov5py37) meng@meng:~/deeplearning/FastFCN$ python -m scripts.prepare_ade20k
Downloading /home/meng/.encoding/data/downloads/ADEChallengeData2016.zip from http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip...
944710KB [05:23, 2923.61KB/s]                                                                                                                                                                                                                                                      
Downloading /home/meng/.encoding/data/downloads/release_test.zip from http://data.csail.mit.edu/places/ADEchallenge/release_test.zip...
100%|██████████| 206856/206856 [04:29<00:00, 766.68KB/s]
(yolov5py37) meng@meng:~/deeplearning/FastFCN$ 

2 Training the Model

        Before training, apply the fixes from 4.2 and 4.3.

        Reference: FastFCN/encnet_res50_pcontext.sh at master · wuhuikai/FastFCN · GitHub

        The reference command for training the encnet_res50 model:

#train
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.train --dataset pcontext \
    --model encnet --jpu [JPU|JPU_X] --aux --se-loss \
    --backbone resnet50 --checkname encnet_res50_pcontext

        I ran:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.train --dataset pcontext     --model encnet --jpu JPU --aux --se-loss     --backbone resnet50 --checkname encnet_res50_pcontext

        Training starts, but fails with RuntimeError: CUDA out of memory. Lowering --batch-size or --crop-size (both appear in the Namespace printout in 3.2.3) should reduce memory use, but I set training aside for now.

3 Testing the Model

3.1 Download the models

       Download the authors' pretrained model files from https://github.com/wuhuikai/FastFCN#pcontext. (The bash file listed beside each model on that page contains the commands for training, prediction and FPS measurement.)

        Place the downloaded files in /home/meng/.encoding/models/, the path passed to --resume in the commands below.

3.2 Test encnet_jpu_res50_pcontext.pth.tar

3.2.1 test [single-scale] (single scale: pixAcc=0.7898, mIoU=0.5105)

#reference command from GitHub
#test [single-scale]
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
    --model encnet --jpu [JPU|JPU_X] --aux --se-loss \
    --backbone resnet50 --resume {MODEL} --split val --mode testval

        I ran:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
    --model encnet --jpu JPU --aux --se-loss \
    --backbone resnet50 --resume /home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar --split val --mode testval

        Pixel accuracy pixAcc = 0.7898, mean intersection-over-union mIoU = 0.5105; the test took about 10 minutes.
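
        For reference, the two metrics can be computed from a class confusion matrix as follows (a generic sketch of the definitions, not FastFCN's exact implementation):

import numpy as np

def pixacc_miou(conf):
    # conf[i, j] = pixels of ground-truth class i predicted as class j
    tp = np.diag(conf).astype(float)
    pixacc = tp.sum() / conf.sum()               # correct pixels / all pixels
    iou = tp / (conf.sum(1) + conf.sum(0) - tp)  # per-class intersection over union
    return pixacc, np.nanmean(iou)

print(pixacc_miou(np.array([[50, 2], [3, 45]])))  # -> (0.95, ~0.9045)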

3.2.2 test [multi-scale] (multi scale: pixAcc=0.7964, mIoU=0.5210)

#reference command from GitHub
#test [multi-scale]
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
    --model encnet --jpu [JPU|JPU_X] --aux --se-loss \
    --backbone resnet50 --resume {MODEL} --split val --mode testval --ms

        I ran:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
    --model encnet --jpu JPU --aux --se-loss \
    --backbone resnet50 --resume /home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar --split val --mode testval --ms

        The test took 1 hour 19 minutes: pixAcc = 0.7964, mIoU = 0.5210.

        test [multi-scale] differs from test [single-scale] only by the --ms flag. In test.py, --ms changes the list of evaluation scales; that list is passed through to base.py, where the following multi-scale computation runs:

        for scale in self.scales:
            long_size = int(math.ceil(self.base_size * scale))  # math.ceil(): smallest integer >= its argument
            if h > w:
                height = long_size
                width = int(1.0 * w * long_size / h + 0.5)  # (apparently) sets the new height/width from the original h:w aspect ratio
                short_size = width
            else:
                width = long_size
                height = int(1.0 * h * long_size / w + 0.5)
                short_size = height
            # resize image to current size
            cur_img = resize_image(image, height, width, **self.module._up_kwargs)
            if long_size <= crop_size:  # the if/else branches guarantee pad_img's height and width are both >= crop_size
                pad_img = pad_image(cur_img, self.module.mean,
                                    self.module.std, crop_size)
                outputs = module_inference(self.module, pad_img, self.flip)
                outputs = crop_image(outputs, 0, height, 0, width)
            else:
                if short_size < crop_size:
                    # pad if needed
                    pad_img = pad_image(cur_img, self.module.mean,
                                        self.module.std, crop_size)
                else:
                    pad_img = cur_img
                _,_,ph,pw = pad_img.size()
                assert(ph >= height and pw >= width)
                # grid forward and normalize
                h_grids = int(math.ceil(1.0 * (ph-crop_size)/stride)) + 1
                w_grids = int(math.ceil(1.0 * (pw-crop_size)/stride)) + 1
                with torch.cuda.device_of(image):
                    outputs = image.new().resize_(batch,self.nclass,ph,pw).zero_().cuda()
                    count_norm = image.new().resize_(batch,1,ph,pw).zero_().cuda()
                # grid evaluation
                for idh in range(h_grids):
                    for idw in range(w_grids):
                        h0 = idh * stride
                        w0 = idw * stride
                        h1 = min(h0 + crop_size, ph)
                        w1 = min(w0 + crop_size, pw)
                        crop_img = crop_image(pad_img, h0, h1, w0, w1)
                        # pad if needed
                        pad_crop_img = pad_image(crop_img, self.module.mean,
                                                 self.module.std, crop_size)
                        output = module_inference(self.module, pad_crop_img, self.flip)
                        outputs[:,:,h0:h1,w0:w1] += crop_image(output,
                            0, h1-h0, 0, w1-w0)
                        count_norm[:,:,h0:h1,w0:w1] += 1
                assert((count_norm==0).sum()==0)
                outputs = outputs / count_norm
                outputs = outputs[:,:,:height,:width]

            score = resize_image(outputs, h, w, **self.module._up_kwargs)
            scores += score

        return scores

         Note that base.py also defines scores beforehand:

        with torch.cuda.device_of(image):
            scores = image.new().resize_(batch,self.nclass,h,w).zero_().cuda()
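
        To make the resize arithmetic concrete, here is a standalone rerun of the first few lines of the loop for a 375x500 image, with base_size=520 as in the Namespace printout in 3.2.3; the scale list here is the common PyTorch-Encoding default and may differ in your copy:

import math

base_size, h, w = 520, 500, 375                  # h, w of image 2008_000064 used later
for scale in (0.5, 0.75, 1.0, 1.25, 1.5, 1.75):  # assumed default scale list
    long_size = int(math.ceil(base_size * scale))
    if h > w:
        height = long_size
        width = int(1.0 * w * long_size / h + 0.5)
    else:
        width = long_size
        height = int(1.0 * h * long_size / w + 0.5)
    print('scale %.2f -> %d x %d (h x w)' % (scale, height, width))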

        For intuition on single-scale versus multi-scale evaluation, see the Zhihu answer linked in the references at the end of this post.

3.2.3 predict [single-scale] (single scale)

#reference command from GitHub
#predict [single-scale]
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
    --model encnet --jpu [JPU|JPU_X] --aux --se-loss \
    --backbone resnet50 --resume {MODEL} --split val --mode test

        I ran:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
    --model encnet --jpu JPU --aux --se-loss \
    --backbone resnet50 --resume /home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar --split val --mode test

        The output:

(yolov5py37) meng@meng:~/deeplearning/FastFCN$ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
>     --model encnet --jpu JPU --aux --se-loss \
>     --backbone resnet50 --resume /home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar --split val --mode test
Namespace(aux=True, aux_weight=0.2, backbone='resnet50', base_size=520, batch_size=16, checkname='default', crop_size=480, cuda=True, dataset='pcontext', dilated=False, epochs=80, ft=False, jpu='JPU', lateral=False, lr=0.001, lr_scheduler='poly', mode='test', model='encnet', model_zoo=None, momentum=0.9, ms=False, no_cuda=False, no_val=False, resume='/home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar', save_folder='experiments/segmentation/results', se_loss=True, se_weight=0.2, seed=1, split='val', start_epoch=0, test_batch_size=16, train_split='train', weight_decay=0.0001, workers=16)
loading annotations into memory...
JSON root keys:dict_keys(['info', 'images', 'annos_segmentation', 'annos_occlusion', 'annos_boundary', 'categories', 'parts'])
Done (t=3.22s)
creating index...
index created! (t=2.42s)
mask_file: /home/meng/.encoding/data/VOCdevkit/VOC2010/val.pth
=> loaded checkpoint '/home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar' (epoch 79)

        Find the predictions under the save_folder printed above (experiments/segmentation/results) and compare them with the originals in /home/meng/.encoding/data/VOCdevkit/VOC2010/JPEGImages.

        Take image 2008_000064 as an example.

        Its annotation file, 2008_000064.xml:

<annotation>
	<folder>VOC2010</folder>
	<filename>2008_000064.jpg</filename>
	<source>
		<database>The VOC2008 Database</database>
		<annotation>PASCAL VOC2008</annotation>
		<image>flickr</image>
	</source>
	<size>
		<width>375</width>
		<height>500</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>aeroplane</name>
		<pose>Frontal</pose>
		<truncated>1</truncated>
		<occluded>0</occluded>
		<bndbox>
			<xmin>1</xmin>
			<ymin>152</ymin>
			<xmax>375</xmax>
			<ymax>461</ymax>
		</bndbox>
		<difficult>0</difficult>
	</object>
</annotation>
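
        To eyeball a single result, something like the following works; the <image_id>.png file name inside save_folder is an assumption, so adjust it to whatever test.py actually wrote:

from PIL import Image

orig = Image.open('/home/meng/.encoding/data/VOCdevkit/VOC2010/JPEGImages/2008_000064.jpg')
pred = Image.open('experiments/segmentation/results/2008_000064.png')  # assumed name
print(orig.size, pred.size)  # both should be (375, 500), matching the XML above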


4 Errors and Solutions

4.1 detail-api compile error

        error: command 'gcc' failed with exit status 1
Installing PASCAL Context API failed, please install it manually https://github.com/zhanghang1989/detail-api

        The first time I ran prepare_pcontext.py, compiling detail-api failed with the log below. The exc_type/exc_value/exc_traceback errors are a known symptom of C sources generated by an old Cython being compiled against Python 3.7, which moved those PyThreadState fields; following 1.1 (which installs a recent cython) and 1.2 fixed the problem for me.

gcc -pthread -B /home/meng/anaconda3/envs/yolov5py37/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include -I../common -I/home/meng/anaconda3/envs/yolov5py37/include/python3.7m -c detail/_mask.c -o build/temp.linux-x86_64-3.7/detail/_mask.o
In file included from /home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1969:0,
                 from /home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                 from /home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from detail/_mask.c:461:
/home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it with " \
  ^~~~~~~
detail/_mask.c: In function ‘__Pyx_PyCFunction_FastCall’:
detail/_mask.c:12772:13: error: too many arguments to function ‘(PyObject * (*)(PyObject *, PyObject * const*, Py_ssize_t))meth’
     return (*((__Pyx_PyCFunctionFast)meth)) (self, args, nargs, NULL);
            ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
detail/_mask.c: In function ‘__Pyx__ExceptionSave’:
detail/_mask.c:14254:21: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
     *type = tstate->exc_type;
                     ^~~~~~~~
                     curexc_type
detail/_mask.c:14255:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
     *value = tstate->exc_value;
                      ^~~~~~~~~
                      curexc_value
detail/_mask.c:14256:19: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
     *tb = tstate->exc_traceback;
                   ^~~~~~~~~~~~~
                   curexc_traceback
detail/_mask.c: In function ‘__Pyx__ExceptionReset’:
detail/_mask.c:14263:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
     tmp_type = tstate->exc_type;
                        ^~~~~~~~
                        curexc_type
detail/_mask.c:14264:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
     tmp_value = tstate->exc_value;
                         ^~~~~~~~~
                         curexc_value
detail/_mask.c:14265:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
     tmp_tb = tstate->exc_traceback;
                      ^~~~~~~~~~~~~
                      curexc_traceback
detail/_mask.c:14266:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
     tstate->exc_type = type;
             ^~~~~~~~
             curexc_type
detail/_mask.c:14267:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
     tstate->exc_value = value;
             ^~~~~~~~~
             curexc_value
detail/_mask.c:14268:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
     tstate->exc_traceback = tb;
             ^~~~~~~~~~~~~
             curexc_traceback
detail/_mask.c: In function ‘__Pyx__GetException’:
detail/_mask.c:14323:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
     tmp_type = tstate->exc_type;
                        ^~~~~~~~
                        curexc_type
detail/_mask.c:14324:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
     tmp_value = tstate->exc_value;
                         ^~~~~~~~~
                         curexc_value
detail/_mask.c:14325:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
     tmp_tb = tstate->exc_traceback;
                      ^~~~~~~~~~~~~
                      curexc_traceback
detail/_mask.c:14326:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
     tstate->exc_type = local_type;
             ^~~~~~~~
             curexc_type
detail/_mask.c:14327:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
     tstate->exc_value = local_value;
             ^~~~~~~~~
             curexc_value
detail/_mask.c:14328:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
     tstate->exc_traceback = local_tb;
             ^~~~~~~~~~~~~
             curexc_traceback
error: command 'gcc' failed with exit status 1
Installing PASCAL Context API failed, please install it manually https://github.com/zhanghang1989/detail-api


4.2 Missing model files

        Error: RuntimeError: Failed downloading url https://hangzh.s3.amazonaws.com/encoding/models/resnet50-ebb6acbb.zip

        Opening the URL from the error message (https://hangzh.s3.amazonaws.com/encoding/models/resnet50-ebb6acbb.zip) fails over every network route I tried; presumably the author deleted the model files.

        I asked on GitHub, and the author posted download links for three models:

https://drive.google.com/drive/folders/1YFv8JR5IYol2_kDHPMXfUkjXt4Z6t_rh

        Place the downloaded files in /home/meng/.encoding/models/ (the folder used by --resume in 3.2).
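
        Before re-running anything, a minimal check that a downloaded file is a readable PyTorch checkpoint (the encnet checkpoint from 3.2 serves as the example here; the key names are typical, not guaranteed):

import torch

ckpt = torch.load('/home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar',
                  map_location='cpu')
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
# the test log in 3.2.3 shows this one carries at least an epoch counter ('epoch 79')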

4.3 AttributeError: 'NoneType' object has no attribute 'run_slave'

Cause (from the author's reply on GitHub):

The reason is that you're not using multiple GPUs. Change SynBN to regular BN if you want to train on one GPU.

That is: the error occurs when training on a single GPU; in that case, replace SyncBN with regular BN.

        (1) Modify line 54 of /FastFCN/experiments/segmentation/train.py

        (2) Modify line 111 of /FastFCN/experiments/segmentation/train.py

        (3) Remove line 132 of /FastFCN/experiments/segmentation/train.py (the screenshots of these edits were lost; see the sketch below)
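
        The gist of the change, sketched with names that are assumptions about the March 2022 train.py rather than verified line contents:

import torch.nn as nn

# (1) line ~54: construct the model with plain BatchNorm2d instead of the
# synchronized variant
norm_layer = nn.BatchNorm2d

# (2) line ~111 and (3) line ~132: drop the SyncBN-specific multi-GPU wrapping
# of the model and criterion so everything runs on one device; it is the SyncBN
# master/slave pipe that raises 'NoneType' object has no attribute 'run_slave'.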

References

A blogger's roundup of some of the official PyTorch-pretrained ResNets:

https://blog.csdn.net/sgfmby1994/article/details/103876681

My question on GitHub:

RuntimeError: Failed downloading url https://hangzh.s3.amazonaws.com/encoding/models/resnet50-ebb6acbb.zip · Issue #108 · wuhuikai/FastFCN · GitHub

Switching multi-GPU to single-GPU: how to Change SynBN to regular BN ? · Issue #12 · wuhuikai/FastFCN · GitHub

Analysis of the Pascal VOC dataset:

A detailed analysis of the Pascal VOC dataset (blog by 持久决心 on CSDN)

Zhihu, on understanding multi-scale versus single-scale:

How to understand multi scale and single scale in deep learning? - Zhihu