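This article draws on the PyTorch official tutorial (Chinese edition): http://pytorch123.com/FirstSection/PyTorchIntro/

PyTorch documentation (Chinese): https://pytorch-cn.readthedocs.io/zh/latest/package_references/Tensor/

PyTorch documentation (English): https://pytorch.org/docs/stable/tensors.html

Books: 《深度学习之PyTorch物体检测实战》 (PyTorch object detection in practice) and 《动手学深度学习》 (Dive into Deep Learning)
This is my first time working with PyTorch, and up-to-date tutorials are hard to find online, so let's start from the official materials!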

The following modules are imported by default:

import os
import json
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
import torchvision
from torchvision import models
from torch.utils.data import Dataset
from torchvision import transforms
from torch.utils.data import DataLoader
import visdom
# from tensorboardX import SummaryWriter
from torch.utils.tensorboard import SummaryWriter

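Image augmentation

Image augmentation applies a series of random changes to training images to produce similar but distinct training samples, which enlarges the training set. Another way to see it: randomly perturbing training samples reduces the model's reliance on particular attributes and so improves its ability to generalize.

To get deterministic results at prediction time, augmentation is usually applied only to training samples; augmentations that involve random operations are not used during prediction.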

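Data loading

First, a recap of the data loading covered earlier:

1. Subclass the abstract class torch.utils.data.Dataset and implement __len__() and __getitem__() to make a dataset indexable and iterable. __len__() reports the dataset size (optional), while __getitem__() supports integer indexing (required). Public datasets such as ImageNet, CIFAR10 and MNIST can also be loaded through torchvision.datasets.

2. Data transforms and augmentation: torchvision.transforms
torchvision.transforms makes it easy to scale, crop, randomly flip and pad images and to normalize tensors. The transforms operate on a PIL Image or a Tensor. To apply several transforms, chain them with transforms.Compose. In practice the transforms are usually built into the Dataset class.

3. Wrap the dataset in torch.utils.data.DataLoader
After the first two steps each transformed sample can be fetched individually. Wrapping the dataset in torch.utils.data.DataLoader (by instantiating it, not by subclassing it) adds batching, shuffling and similar features. A DataLoader instance is iterable, and iterating over it yields the batches used for training.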

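Here a traffic-light dataset used previously serves as the example:
The dataset stores images and annotations in the img and json folders respectively, with file names 0.jpg, 1.jpg… 0.json, 1.json…
The loading code is as follows: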

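class MyData(Dataset):
    def __init__(self, img_path, annotation_path, transforms=None):
        # Store the dataset locations and the optional transform pipeline
        self.annotation_path = annotation_path
        self.img_path = img_path
        self.transforms = transforms

    def __len__(self):
        # Assumes the folder holds only the images 0.jpg, 1.jpg, ...
        return len(os.listdir(self.img_path))

    def __getitem__(self, index):
        # Close the annotation file after reading instead of leaking the handle
        with open(os.path.join(self.annotation_path, str(index) + '.json')) as f:
            annotation = json.load(f)
        img = Image.open(os.path.join(self.img_path, str(index) + '.jpg'))
        # plt.imshow(img)
        if self.transforms:
            img = self.transforms(img)

        return img, annotation

dataset = MyData('D:/Download/Dataset/traffic_light/train/img', 'D:/Download/Dataset/traffic_light/train/json',
                transforms=transforms.Compose([
                    transforms.Resize(240), # scale the shorter side to 240, keeping the aspect ratio
                    transforms.RandomHorizontalFlip(), # flip left-right with probability 0.5
                    transforms.ToTensor(), # convert the PIL image to a Tensor and scale values to [0.0, 1.0]
                    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]) # standardize each channel with mean 0.5 and std 0.5
                ]))
# num_workers sets the number of worker subprocesses (not threads) used to load data.
# Enabling it raised an error on my machine; a common cause on Windows is running the
# DataLoader outside an `if __name__ == '__main__':` guard.
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)  # , num_workers=2)
data_iter = iter(dataloader)
for step in range(1000):
    # Note: next() raises StopIteration if the dataset yields fewer than 1000 batches
    data = next(data_iter)
    # data can now be used to train the network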
API

transforms.Compose

transforms.Compose(transforms)

Composes several transforms together.

  • transforms (list of Transform objects) – list of transforms to compose.
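transform = transforms.Compose([
                    # transforms.Resize(32), # scale the shorter side to 32, keeping the aspect ratio
                    transforms.RandomHorizontalFlip(), # flip left-right with probability 0.5
                    transforms.ToTensor(), # convert the PIL image to a Tensor and scale values to [0.0, 1.0]
                    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]) # standardize each channel with mean 0.5 and std 0.5
                ])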

Transforms on PIL Image

transforms.ColorJitter

transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)

Randomly change the brightness, contrast and saturation of an image.

brightness (float or tuple of python:float (min, max)) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non negative numbers.

contrast (float or tuple of python:float (min, max)) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non negative numbers.

saturation (float or tuple of python:float (min, max)) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non negative numbers.

hue (float or tuple of python:float (min, max)) – How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0<= hue <= 0.5 or -0.5 <= min <= max <= 0.5.

transforms.CenterCrop

transforms.CenterCrop(size)

Crops the given PIL Image at the center.

size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.

transforms.RandomCrop

transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')

Crop the given PIL Image at a random location.

size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.

padding (int or sequence, optional) – Optional padding on each border of the image. Default is None. If a sequence of length 4 is provided, it is used to pad left, top, right, bottom borders respectively. If a sequence of length 2 is provided, it is used to pad left/right, top/bottom borders, respectively.

pad_if_needed (boolean) – It will pad the image if smaller than the desired size to avoid raising an exception. Since cropping is done after padding, the padding seems to be done at a random offset.

fill – Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant

padding_mode – Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.

constant: pads with a constant value, this value is specified with fill

edge: pads with the last value on the edge of the image

reflect: pads with reflection of image (without repeating the last value on the edge)

padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2]

symmetric: pads with reflection of image (repeating the last value on the edge)

padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]
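The four padding modes can be checked on the one-dimensional example from the descriptions above. This sketch uses NumPy's `np.pad`, whose modes carry the same names; treating it as a stand-in for what torchvision does to an image row is an assumption made for illustration.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

# Pad 2 elements on both sides with each mode.
constant = np.pad(row, 2, mode='constant', constant_values=0).tolist()
edge = np.pad(row, 2, mode='edge').tolist()
reflect = np.pad(row, 2, mode='reflect').tolist()      # edge value not repeated
symmetric = np.pad(row, 2, mode='symmetric').tolist()  # edge value repeated

print(constant)   # [0, 0, 1, 2, 3, 4, 0, 0]
print(edge)       # [1, 1, 1, 2, 3, 4, 4, 4]
print(reflect)    # [3, 2, 1, 2, 3, 4, 3, 2]
print(symmetric)  # [2, 1, 1, 2, 3, 4, 4, 3]
```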

transforms.RandomResizedCrop

transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=2)

Crop the given PIL Image to random size and aspect ratio.

A crop of random size (default: of 0.08 to 1.0) of the original size and a random aspect ratio (default: of 3/4 to 4/3) of the original aspect ratio is made. This crop is finally resized to given size. This is popularly used to train the Inception networks.

size – expected output size of each edge; may be an int or a tuple

scale – range of size of the origin size cropped

ratio – range of aspect ratio of the origin aspect ratio cropped, i.e. the width-to-height ratio of the cropped region

interpolation – Default: PIL.Image.BILINEAR

transforms.RandomHorizontalFlip

transforms.RandomHorizontalFlip(p=0.5)

Horizontally flip the given image randomly with a given probability. The image can be a PIL Image or a torch Tensor, in which case it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions

p (float) – probability of the image being flipped. Default value is 0.5

transforms.RandomVerticalFlip

transforms.RandomVerticalFlip(p=0.5)

transforms.RandomRotation

transforms.RandomRotation(degrees, resample=False, expand=False, center=None, fill=None)

Rotate the image by angle.

degrees (sequence or float or int) – Range of degrees to select from. If degrees is a number instead of sequence like (min, max), the range of degrees will be (-degrees, +degrees).

resample ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}, optional) – An optional resampling filter. See filters for more information. If omitted, or if the image has mode “1” or “P”, it is set to PIL.Image.NEAREST.

expand (bool, optional) – Optional expansion flag. If true, expands the output to make it large enough to hold the entire rotated image. If false or omitted, make the output image the same size as the input image. Note that the expand flag assumes rotation around the center and no translation.

center (2-tuple, optional) – Optional center of rotation. Origin is the upper left corner. Default is the center of the image.

fill (n-tuple or int or float) – Pixel fill value for area outside the rotated image. If int or float, the value is used for all bands respectively. Defaults to 0 for all bands. This option is only available for pillow>=5.2.0.

transforms.Resize

transforms.Resize(size, interpolation=2)

Resize the input PIL Image to the given size.

size (sequence or int) – Desired output size. If size is a sequence like (h, w), output size will be matched to this. If size is an int, the shorter edge of the image is scaled to size and the aspect ratio is preserved.

interpolation (int, optional) – Desired interpolation. Default is PIL.Image.BILINEAR

Transforms on torch.*Tensor

transforms.Normalize

transforms.Normalize(mean, std, inplace=False)

Given mean: (mean[1],...,mean[n]) and std: (std[1],...,std[n]) for n channels, this transform will normalize each channel of the input torch.*Tensor i.e., output[channel] = (input[channel] - mean[channel]) / std[channel]

__call__(tensor)

tensor (Tensor) – Tensor image of size (C, H, W) to be normalized.

Conversion Transforms

transforms.ToPILImage

transforms.ToPILImage(mode=None)

Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape H x W x C to a PIL Image while preserving the value range.

mode (PIL.Image mode) – color space and pixel depth of input data (optional). If mode is None (default) there are some assumptions made about the input data:

If the input has 4 channels, the mode is assumed to be RGBA.
If the input has 3 channels, the mode is assumed to be RGB.
If the input has 2 channels, the mode is assumed to be LA.
If the input has 1 channel, the mode is determined by the data type (i.e int, float, short).

transforms.ToTensor

torchvision.transforms.ToTensor

Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8

In the other cases, tensors are returned without scaling.

Functional Transforms

Functional transforms give you fine-grained control of the transformation pipeline. As opposed to the transformations above, functional transforms don’t contain a random number generator for their parameters. That means you have to specify/generate all parameters, but you can reuse the functional transform.

Common image augmentation methods

First load an image to use as the test sample:

vis = visdom.Visdom(env='image')
image = Image.open('./cat.jpg')

vis.image(transforms.ToTensor()(image), win='original image')

Then define a function that runs an augmentation several times so its effect can be inspected:

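def apply(original_img, transforms=None, title=None, nrow=4, num=8):
    """
    Apply an augmentation pipeline to the image `num` times and display all results.
    """
    # Note: inside this function `transforms` is the pipeline passed in, not the torchvision module.
    # Each call re-samples the random parameters; stack the results into one (num, C, H, W) batch.
    imgs = torch.stack([transforms(original_img) for _ in range(num)])

    vis.images(imgs, nrow=nrow, opts=dict(title=title))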

Flipping

  • Horizontal flip
transform = transforms.Compose([
                    transforms.Resize(300),
                    transforms.RandomHorizontalFlip(),
                    transforms.ToTensor(), 
                ])
apply(image, transforms=transform, title='random horizontal flip')

  • Vertical flip
transform = transforms.Compose([
                    transforms.Resize(300),
                    transforms.RandomVerticalFlip(),
                    transforms.ToTensor(), 
                ])
apply(image, transforms=transform, title='random vertical flip')

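Cropping

Random cropping makes objects appear at different positions and at different scales in the image, which reduces the model's sensitivity to where the target is located.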

transform = transforms.Compose([
                    transforms.RandomResizedCrop(size=300, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333)),
                    transforms.ToTensor(), 
                ])
apply(image, transforms=transform, title='random resized crop')

Changing colors

Brightness

transform = transforms.Compose([
                    transforms.Resize(300),
                    transforms.ColorJitter(brightness=0.5, contrast=0, saturation=0, hue=0), # brightness factor sampled from [0.5, 1.5], i.e. 50%-150%
                    transforms.ToTensor(), 
                ])
apply(image, transforms=transform, title='brightness')

Contrast

transform = transforms.Compose([
                    transforms.Resize(300),
                    transforms.ColorJitter(brightness=0, contrast=0.5, saturation=0, hue=0), # contrast factor sampled from [0.5, 1.5], i.e. 50%-150%
                    transforms.ToTensor(), 
                ])
apply(image, transforms=transform, title='contrast')

Saturation

transform = transforms.Compose([
                    transforms.Resize(300),
                    transforms.ColorJitter(brightness=0, contrast=0, saturation=0.5, hue=0), # saturation factor sampled from [0.5, 1.5], i.e. 50%-150%
                    transforms.ToTensor(), 
                ])
apply(image, transforms=transform, title='saturation')

Hue

transform = transforms.Compose([
                    transforms.Resize(300),
                    transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0.5), # hue factor sampled from [-0.5, 0.5]
                    transforms.ToTensor(), 
                ])
apply(image, transforms=transform, title='hue')

Rotation

transform = transforms.Compose([
                    transforms.Resize(300),
                    transforms.RandomRotation(degrees=5, expand=False, fill=None),
                    transforms.ToTensor(), 
                ])
apply(image, transforms=transform, title='rotate')
