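This article draws on the Chinese edition of the official PyTorch tutorials: http://pytorch123.com/FirstSection/PyTorchIntro/
PyTorch Chinese documentation: https://pytorch-cn.readthedocs.io/zh/latest/package_references/Tensor/
PyTorch English documentation: https://pytorch.org/docs/stable/tensors.html
The books 《深度学习之PyTorch物体检测实战》 (PyTorch Object Detection in Practice) and 《动手学深度学习》 (Dive into Deep Learning)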
This is my first time using PyTorch, and up-to-date tutorials are hard to find online, so I am starting from the official material.
The following modules are assumed to be imported throughout:
import os
import json
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
import torchvision
from torchvision import models
from torch.utils.data import Dataset
from torchvision import transforms
from torch.utils.data import DataLoader
import visdom
# from tensorboardX import SummaryWriter
from torch.utils.tensorboard import SummaryWriter
Image augmentation
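Image augmentation applies a series of random changes to the training images. This produces samples that are similar to the originals but not identical, which enlarges the training set. There is a second way to view it: randomly changing the training samples reduces the model's reliance on particular attributes, such as an object's exact position or the image's brightness, and this improves generalization.
To get deterministic results at prediction time, augmentation is usually applied only to the training samples. Augmentations that involve random operations are not used during prediction.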
Data loading
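First, a recap of the data loading covered earlier:
1. Subclass torch.utils.data.Dataset
Dataset is an abstract class. Implementing __len__() and __getitem__() is enough to get a dataset that can be iterated over. __getitem__() supports integer indexing and is required. __len__() returns the dataset size and is optional, although DataLoader's default sampler expects it. Public datasets such as ImageNet, CIFAR10 and MNIST can also be loaded directly with torchvision.datasets.
2. Transform and augment the data with torchvision.transforms
torchvision.transforms makes it easy to resize, crop, randomly flip and pad images, and to normalize tensors. Depending on the transform, the input is a PIL Image or a Tensor. To chain several transforms, combine them with transforms.Compose. In practice, the composed transform is usually passed to the Dataset class and applied inside it.
3. Wrap the dataset with torch.utils.data.DataLoader
After the first two steps, each transformed sample can already be retrieved. Wrapping the Dataset in torch.utils.data.DataLoader adds batching, shuffling and similar operations. A DataLoader is an iterable, so iterating over an instance yields the batches used for training.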
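The three steps can be seen end to end on a toy dataset. This is a sketch assuming only torch is installed. SquaresDataset is a hypothetical class in which sample i is the pair (i, i**2), and the lambda stands in for a real transform.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is (tensor([i]), i ** 2)."""

    def __init__(self, n, transform=None):
        self.n = n
        self.transform = transform  # step 2: the transform is stored in the Dataset

    def __len__(self):
        return self.n  # dataset size, used by DataLoader's default sampler

    def __getitem__(self, index):
        x = torch.tensor([float(index)])
        if self.transform:
            x = self.transform(x)
        return x, index ** 2


# Step 3: DataLoader groups samples into batches (no shuffling here, so the order is predictable)
loader = DataLoader(SquaresDataset(10, transform=lambda t: t / 10),
                    batch_size=4, shuffle=False)
batches = list(loader)
for xs, ys in batches:
    print(xs.shape, ys)
```

With 10 samples and batch_size=4, the loader yields batches of 4, 4 and 2. The default collate function stacks the tensors and turns the integer labels into a tensor.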
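Take the traffic-light dataset I used before as an example.
Its images and annotations are stored in the img and json folders respectively, with file names 0.jpg, 1.jpg, … and 0.json, 1.json, …
The loading code is as follows: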
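class MyData(Dataset):
    def __init__(self, img_path, annotation_path, transforms=None):
        # Store the dataset directories and the (optional) transform pipeline
        self.annotation_path = annotation_path
        self.img_path = img_path
        self.transforms = transforms

    def __len__(self):
        # One sample per image file in the img folder
        return len(os.listdir(self.img_path))

    def __getitem__(self, index):
        # Files are named 0.json / 0.jpg, 1.json / 1.jpg, ...
        with open(os.path.join(self.annotation_path, f'{index}.json')) as f:
            annotation = json.load(f)
        # convert('RGB') guarantees 3 channels, matching the 3-value Normalize below
        img = Image.open(os.path.join(self.img_path, f'{index}.jpg')).convert('RGB')
        # plt.imshow(img)
        if self.transforms:
            img = self.transforms(img)
        return img, annotation


dataset = MyData('D:/Download/Dataset/traffic_light/train/img', 'D:/Download/Dataset/traffic_light/train/json',
                 transforms=transforms.Compose([
                     transforms.Resize(240),  # scale the shorter edge to 240, keeping the aspect ratio
                     transforms.RandomHorizontalFlip(),  # flip left-right with probability 0.5
                     transforms.ToTensor(),  # PIL Image -> float Tensor (C, H, W) with values in [0.0, 1.0]
                     transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])  # per channel (x - 0.5) / 0.5: [0, 1] -> [-1, 1]
                 ]))
# Resize(240) keeps the aspect ratio, so the default collate can only stack a batch
# if all images share the same aspect ratio.
# num_workers sets how many worker subprocesses load data. Adding it raised an error on my
# machine; on Windows, multi-process loading needs the script's entry code to sit under
# an `if __name__ == '__main__':` guard.
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)  # , num_workers=2)
num_epochs = 10  # illustrative value
for epoch in range(num_epochs):
    # One pass over dataloader yields shuffled batches until the dataset is exhausted
    for imgs, annotations in dataloader:
        pass  # use imgs and annotations to train the network here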
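DataLoader's default collate function stacks the images, and it also tries to merge the annotation dicts field by field. That merge fails when, for example, each JSON file lists a different number of boxes. A common workaround is a custom collate_fn that keeps the annotations as a plain list. The sketch below assumes this case. The function name keep_annotations_collate and the 'boxes' key are hypothetical, not the real format of this dataset.

```python
import torch
from torch.utils.data import DataLoader


def keep_annotations_collate(batch):
    """Stack the images, but return the annotations unchanged as a list."""
    imgs = torch.stack([img for img, _ in batch])
    annotations = [ann for _, ann in batch]
    return imgs, annotations


# Hypothetical samples: (image tensor, annotation dict with a variable number of boxes)
samples = [
    (torch.zeros(3, 8, 8), {'boxes': [[0, 0, 4, 4]]}),
    (torch.ones(3, 8, 8), {'boxes': [[1, 1, 2, 2], [3, 3, 6, 6]]}),
]
loader = DataLoader(samples, batch_size=2, collate_fn=keep_annotations_collate)
imgs, annotations = next(iter(loader))
print(imgs.shape, len(annotations))
```

With MyData, the same function could be passed as DataLoader(dataset, batch_size=4, shuffle=True, collate_fn=keep_annotations_collate).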
API
transforms.Compose
transforms.Compose(transforms)
Composes several transforms together.
transforms (list of Transform objects) – list of transforms to compose.
transform = transforms.Compose([
    # transforms.Resize(32),  # scale the shorter edge to 32, keeping the aspect ratio
    transforms.RandomHorizontalFlip(),  # flip left-right with probability 0.5
    transforms.ToTensor(),  # PIL Image -> float Tensor with values in [0.0, 1.0]
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])  # per channel (x - 0.5) / 0.5
])
Transforms on PIL Image
transforms.ColorJitter
transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
Randomly change the brightness, contrast, saturation and hue of an image.
brightness (float or tuple of float (min, max)) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non-negative numbers.
contrast (float or tuple of float (min, max)) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative numbers.
saturation (float or tuple of float (min, max)) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non-negative numbers.
hue (float or tuple of float (min, max)) – How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
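The interval rules above can be written out directly. The sketch below is my own small re-implementation for illustration. jitter_range is a hypothetical helper, not part of torchvision. It mirrors how a single float argument is turned into the interval the factor is drawn from.

```python
import random


def jitter_range(value, center=1.0, clip_at_zero=True):
    """Interval that a single float ColorJitter argument is turned into (illustrative)."""
    low, high = center - value, center + value
    if clip_at_zero:
        low = max(0.0, low)  # brightness/contrast/saturation factors cannot be negative
    return low, high


brightness = jitter_range(0.5)                             # (0.5, 1.5)
contrast = jitter_range(2.0)                               # (0.0, 3.0)
hue = jitter_range(0.1, center=0.0, clip_at_zero=False)    # (-0.1, 0.1)

# ColorJitter draws a fresh factor uniformly from its interval on every call
factor = random.uniform(*brightness)
print(brightness, contrast, hue, factor)
```

A factor of 1.0 leaves brightness, contrast or saturation unchanged, and a hue shift of 0 leaves hue unchanged. So a value of 0 for any argument disables that jitter.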
transforms.CenterCrop
transforms.CenterCrop(size)
Crops the given PIL Image at the center.
size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.
transforms.RandomCrop
transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')
Crop the given PIL Image at a random location.
size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.
padding (int or sequence, optional) – Optional padding on each border of the image. Default is None. If a sequence of length 4 is provided, it is used to pad left, top, right, bottom borders respectively. If a sequence of length 2 is provided, it is used to pad left/right, top/bottom borders, respectively.
pad_if_needed (boolean) – It will pad the image if smaller than the desired size to avoid raising an exception. Since cropping is done after padding, the padding seems to be done at a random offset.
fill – Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant.
padding_mode – Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.
- constant: pads with a constant value, this value is specified with fill
- edge: pads with the last value on the edge of the image
- reflect: pads with reflection of image (without repeating the last value on the edge). For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2]
- symmetric: pads with reflection of image (repeating the last value on the edge). For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]
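The four padding modes can be checked on the 1-D example from the documentation. This sketch uses NumPy's np.pad as a stand-in rather than calling torchvision. np.pad's constant, edge, reflect and symmetric modes behave as described above.

```python
import numpy as np

row = np.array([1, 2, 3, 4])
for mode in ['constant', 'edge', 'reflect', 'symmetric']:
    print(mode, np.pad(row, 2, mode=mode).tolist())
# constant [0, 0, 1, 2, 3, 4, 0, 0]
# edge [1, 1, 1, 2, 3, 4, 4, 4]
# reflect [3, 2, 1, 2, 3, 4, 3, 2]
# symmetric [2, 1, 1, 2, 3, 4, 4, 3]
```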
transforms.RandomResizedCrop
transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=2)
Crop the given PIL Image to random size and aspect ratio.
A crop of random size (default: of 0.08 to 1.0) of the original size and a random aspect ratio (default: of 3/4 to 4/3) of the original aspect ratio is made. This crop is finally resized to given size. This is popularly used to train the Inception networks.
size – expected output size of each edge; either an int or a tuple
scale – range of size of the origin size cropped, as a fraction of the original image area
ratio – range of aspect ratio of the origin aspect ratio cropped, i.e. the width-to-height ratio of the crop region
interpolation – Default: PIL.Image.BILINEAR
transforms.RandomHorizontalFlip
transforms.RandomHorizontalFlip(p=0.5)
Horizontally flip the given image randomly with a given probability. The image can be a PIL Image or a torch Tensor, in which case it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.
p (float) – probability of the image being flipped. Default value is 0.5.
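transforms.RandomVerticalFlip
transforms.RandomVerticalFlip(p=0.5)
Vertically flip the given image randomly with a given probability. It accepts the same inputs as RandomHorizontalFlip, and p (default 0.5) is the probability of the image being flipped.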
transforms.RandomRotation
transforms.RandomRotation(degrees, resample=False, expand=False, center=None, fill=None)
Rotate the image by angle.
degrees (sequence or float or int) – Range of degrees to select from. If degrees is a number instead of sequence like (min, max), the range of degrees will be (-degrees, +degrees).
resample ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}, optional) – An optional resampling filter. See filters for more information. If omitted, or if the image has mode "1" or "P", it is set to PIL.Image.NEAREST.
expand (bool, optional) – Optional expansion flag. If true, expands the output to make it large enough to hold the entire rotated image. If false or omitted, make the output image the same size as the input image. Note that the expand flag assumes rotation around the center and no translation.
center (2-tuple, optional) – Optional center of rotation. Origin is the upper left corner. Default is the center of the image.
fill (n-tuple or int or float) – Pixel fill value for area outside the rotated image. If int or float, the value is used for all bands respectively. Defaults to 0 for all bands. This option is only available for pillow>=5.2.0.
transforms.Resize
transforms.Resize(size, interpolation=2)
Resize the input PIL Image to the given size.
size (sequence or int) – Desired output size. If size is a sequence like (h, w), output size will be matched to this. If size is an int, the shorter edge of the image is scaled to size and the aspect ratio is preserved.
interpolation (int, optional) – Desired interpolation. Default is PIL.Image.BILINEAR
Transforms on torch.*Tensor
transforms.Normalize
transforms.Normalize(mean, std, inplace=False)
Given mean: (mean[1], ..., mean[n]) and std: (std[1], ..., std[n]) for n channels, this transform will normalize each channel of the input torch.*Tensor, i.e., output[channel] = (input[channel] - mean[channel]) / std[channel]
__call__(tensor)
tensor (Tensor) – Tensor image of size (C, H, W) to be normalized.
Conversion Transforms
transforms.ToPILImage
transforms.ToPILImage(mode=None)
Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape H x W x C to a PIL Image while preserving the value range.
mode (PIL.Image mode) – color space and pixel depth of input data (optional). If mode is None (default) there are some assumptions made about the input data:
- If the input has 4 channels, the mode is assumed to be RGBA.
- If the input has 3 channels, the mode is assumed to be RGB.
- If the input has 2 channels, the mode is assumed to be LA.
- If the input has 1 channel, the mode is determined by the data type (i.e. int, float, short).
transforms.ToTensor
transforms.ToTensor()
Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8.
In the other cases, tensors are returned without scaling.
Functional Transforms
Functional transforms give you fine-grained control of the transformation pipeline. As opposed to the transformations above, functional transforms don’t contain a random number generator for their parameters. That means you have to specify/generate all parameters, but you can reuse the functional transform.
Common image augmentation methods
First, load an image to experiment with:
vis = visdom.Visdom(env='image')  # needs a running Visdom server: python -m visdom.server
image = Image.open('./cat.jpg')
vis.image(transforms.ToTensor()(image), win='original image')
Then define a function that applies an augmentation several times and displays the results:
def apply(original_img, transforms=None, title=None, nrow=4, num=8):
    """
    Apply an augmentation to an image several times and show all the results.
    """
    # Each call draws new random parameters, so the num outputs differ
    imgs = torch.stack([transforms(original_img) for _ in range(num)])  # (num, C, H, W)
    vis.images(imgs, nrow=nrow, opts=dict(title=title))
Flipping
- Horizontal (left-right) flip
transform = transforms.Compose([
transforms.Resize(300),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
])
apply(image, transforms=transform, title='random horizontal flip')
- Vertical (top-bottom) flip
transform = transforms.Compose([
transforms.Resize(300),
transforms.RandomVerticalFlip(),
transforms.ToTensor(),
])
apply(image, transforms=transform, title='random vertical flip')
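Cropping
Random cropping makes objects appear at different scales and positions in the image, which reduces the model's sensitivity to where the target is located.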
transform = transforms.Compose([
transforms.RandomResizedCrop(size=300, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333)),
transforms.ToTensor(),
])
apply(image, transforms=transform, title='random resized crop')
Changing colors
Brightness
transform = transforms.Compose([
transforms.Resize(300),
transforms.ColorJitter(brightness=0.5, contrast=0, saturation=0, hue=0),  # brightness factor drawn from [0.5, 1.5]
transforms.ToTensor(),
])
apply(image, transforms=transform, title='brightness')
Contrast
transform = transforms.Compose([
transforms.Resize(300),
transforms.ColorJitter(brightness=0, contrast=0.5, saturation=0, hue=0),  # contrast factor drawn from [0.5, 1.5]
transforms.ToTensor(),
])
apply(image, transforms=transform, title='contrast')
Saturation
transform = transforms.Compose([
transforms.Resize(300),
transforms.ColorJitter(brightness=0, contrast=0, saturation=0.5, hue=0),  # saturation factor drawn from [0.5, 1.5]
transforms.ToTensor(),
])
apply(image, transforms=transform, title='saturation')
Hue
transform = transforms.Compose([
transforms.Resize(300),
transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0.5),  # hue shift drawn from [-0.5, 0.5]
transforms.ToTensor(),
])
apply(image, transforms=transform, title='hue')
Rotation
transform = transforms.Compose([
transforms.Resize(300),
transforms.RandomRotation(degrees=5, expand=False, fill=None),
transforms.ToTensor(),
])
apply(image, transforms=transform, title='rotate')