This post records my own summary of and thoughts on the paper after reading it; it is not a translation of the paper or a walkthrough of the model, and is best suited for exchanging ideas once you have read the paper yourself. For a translation, see the references. Paper: https://arxiv.org/abs/1809.04206

I. Summary

The paper proposes using a set of filters (a CNN) to extract time-invariant temporal patterns, which is analogous to transforming the time series into its "frequency domain". It then proposes a novel attention mechanism that selects the relevant time series and uses their frequency-domain information for multivariate forecasting. The proposed model is applied to several real-world tasks and achieves state-of-the-art performance in almost all cases.
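As a rough illustration of this idea, here is a minimal sketch with made-up sizes (not the paper's full model): k CNN filters slide along the time axis of the hidden-state matrix, so each row of the output summarizes one variable's temporal patterns, loosely playing the role of a frequency-domain representation.

import torch
from torch import nn

n, w, k = 8, 24, 4                           # variables, window length, filters
H = torch.randn(1, 1, n, w)                  # hidden-state matrix as a 1-channel image
conv = nn.Conv2d(1, k, kernel_size=(1, w))   # each filter spans the whole window
HC = conv(H).squeeze(-1).squeeze(0).T        # (n, k): row i = pattern scores of variable i
print(HC.shape)                              # torch.Size([8, 4])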

II. Research Methods

  1. It proposes a new attention mechanism, Temporal Pattern Attention (TPA), where "temporal pattern" refers to any time-invariant pattern spanning multiple time steps.
  2. In TPA, instead of selecting relevant time steps as typical attention mechanisms do, the model learns to select relevant time series; a convolutional neural network (CNN) extracts temporal pattern information from each individual variable (see the sketch after this list).
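A hedged sketch of the row-wise attention this implies, with assumed shapes (HC, ht, Wa here are random placeholders, not trained weights): each row of the pattern matrix H^C gets its own sigmoid weight, so whole series are selected rather than time steps.

import torch

n, k, m = 8, 4, 24             # series, filters, hidden size
HC = torch.randn(n, k)         # temporal-pattern matrix H^C (one sample)
ht = torch.randn(m)            # current RNN hidden state
Wa = torch.randn(k, m)         # scoring matrix
scores = HC @ (Wa @ ht)        # one score per row (per series)
alpha = torch.sigmoid(scores)  # independent weights, not a softmax over time
vt = (alpha.unsqueeze(1) * HC).sum(dim=0)  # context vector, shape (k,)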

III. Conclusion

The paper studies MTS forecasting and proposes a new temporal pattern attention mechanism that removes the limitations of typical attention mechanisms on such tasks. Attention is applied over the feature dimension, so the model learns interdependencies among multiple variables not only within the same time step but across all previous times and series. Experiments on both a toy example and real-world datasets strongly support this idea and show that the proposed model achieves state-of-the-art results.

IV. Contributions

A typical attention mechanism attends to only a few time steps and therefore struggles to identify periodic patterns that span many time steps. The paper introduces a new notion of attention in which relevant variables, rather than relevant time steps, are selected. The method is simple, general, and applicable to any RNN.
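For reference, the scoring and blending steps as given in the paper can be written as:

f(H^C_i, h_t) = (H^C_i)^\top W_a h_t, \qquad
\alpha_i = \operatorname{sigmoid}\!\big(f(H^C_i, h_t)\big), \qquad
v_t = \sum_{i=1}^{n} \alpha_i H^C_i, \qquad
h'_t = W_h h_t + W_v v_t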

V. Thoughts

In my own simulations, the model does deliver a clearly noticeable improvement.

VI. References

  1. TPA attention mechanism (TPA-LSTM)

VII. PyTorch Implementation

The code below is adapted from https://github.com/jingw2/demand_forecast; it fixes several errors in the original code and adds comments to make it easier to follow.
import torch 
from torch import nn
import torch.nn.functional as F 
from torch.optim import Adam

import numpy as np
import math
import os
import random
import matplotlib.pyplot as plt
import pickle
from tqdm import tqdm
import pandas as pd
from datetime import date
import argparse
from progressbar import ProgressBar

util (utility functions)

def get_data_path():
    folder = os.path.dirname(__file__)
    return os.path.join(folder, "data")

def RSE(ypred, ytrue):
    rse = np.sqrt(np.square(ypred - ytrue).sum()) / \
            np.sqrt(np.square(ytrue - ytrue.mean()).sum())
    return rse

def quantile_loss(ytrue, ypred, qs):
    '''
    Quantile loss version 2
    Args:
    ytrue (batch_size, output_horizon)
    ypred (batch_size, output_horizon, num_quantiles)
    '''
    L = np.zeros_like(ytrue)
    for i, q in enumerate(qs):
        yq = ypred[:, :, i]
        diff = yq - ytrue
        L += np.maximum(q * diff, (q - 1) * diff)
    return L.mean()
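A quick sanity check with toy numbers of my own (not from the repo): with a single quantile q = 0.5, the quantile loss reduces to half the MAE.

ytrue = np.array([[1.0, 2.0, 3.0]])
ypred = np.array([[[1.5], [2.0], [2.5]]])             # (batch, horizon, 1 quantile)
print(quantile_loss(ytrue, ypred, qs=[0.5]))          # 0.1667
print(0.5 * np.mean(np.abs(ypred[:, :, 0] - ytrue)))  # same value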

def SMAPE(ytrue, ypred):
    ytrue = np.array(ytrue).ravel()
    ypred = np.array(ypred).ravel() + 1e-4
    mean_y = (ytrue + ypred) / 2.
    return np.mean(np.abs((ytrue - ypred) \
        / mean_y))

def MAPE(ytrue, ypred):
    ytrue = np.array(ytrue).ravel() + 1e-4
    ypred = np.array(ypred).ravel()
    return np.mean(np.abs((ytrue - ypred) \
        / ytrue))

def train_test_split(X, y, train_ratio=0.7):
    '''
    - X (array like): shape (num_samples, num_periods, num_features)
    - y (array like): shape (num_samples, num_periods)
    '''
    num_ts, num_periods, num_features = X.shape
    train_periods = int(num_periods * train_ratio)
    random.seed(2)
    Xtr = X[:, :train_periods, :]
    ytr = y[:, :train_periods]
    Xte = X[:, train_periods:, :]
    yte = y[:, train_periods:]
    return Xtr, ytr, Xte, yte

class StandardScaler:

    def fit_transform(self, y):
        self.mean = np.mean(y)
        self.std = np.std(y) + 1e-4
        return (y - self.mean) / self.std

    def inverse_transform(self, y):
        return y * self.std + self.mean

    def transform(self, y):
        return (y - self.mean) / self.std

class MaxScaler:

    def fit_transform(self, y):
        self.max = np.max(y)
        return y / self.max

    def inverse_transform(self, y):
        return y * self.max

    def transform(self, y):
        return y / self.max


class MeanScaler:

    def fit_transform(self, y):
        self.mean = np.mean(y)
        return y / self.mean

    def inverse_transform(self, y):
        return y * self.mean

    def transform(self, y):
        return y / self.mean

class LogScaler:

    def fit_transform(self, y):
        return np.log1p(y)

    def inverse_transform(self, y):
        return np.expm1(y)

    def transform(self, y):
        return np.log1p(y)
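All four scalers should round-trip exactly; a quick self-check on random toy data of my own:

y_demo = np.random.rand(100) * 50 + 10
for scaler in [StandardScaler(), MaxScaler(), MeanScaler(), LogScaler()]:
    y_scaled = scaler.fit_transform(y_demo)
    assert np.allclose(scaler.inverse_transform(y_scaled), y_demo)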


def gaussian_likelihood_loss(z, mu, sigma):
    '''
    Gaussian negative log-likelihood loss
    Args:
    z (tensor): true observations, shape (num_ts, num_periods)
    mu (tensor): mean, shape (num_ts, num_periods)
    sigma (tensor): standard deviation, shape (num_ts, num_periods)
    likelihood:
    (2 pi sigma^2)^(-1/2) exp(-(z - mu)^2 / (2 sigma^2))
    log likelihood:
    -1/2 * (log (2 pi) + 2 * log (sigma)) - (z - mu)^2 / (2 sigma^2)
    '''
    # negative log-likelihood, matching the docstring above
    negative_log_likelihood = 0.5 * math.log(2 * math.pi) + torch.log(sigma) \
        + (z - mu) ** 2 / (2 * sigma ** 2)
    return negative_log_likelihood.mean()
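A toy check against the closed form in the docstring (values chosen by me): for z = 0.5, mu = 0, sigma = 1 the negative log-likelihood is 0.5*log(2*pi) + 0.125 ≈ 1.0439.

z = torch.tensor([[0.5]])
mu = torch.tensor([[0.0]])
sigma = torch.tensor([[1.0]])
print(gaussian_likelihood_loss(z, mu, sigma))  # tensor(1.0439)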

def negative_binomial_loss(ytrue, mu, alpha):
    '''
    Negative Binomial Sample
    Args:
    ytrue (array like)
    mu (array like)
    alpha (array like)
    maximize log l_{nb} = log Gamma(z + 1/alpha) - log Gamma(z + 1) - log Gamma(1 / alpha)
                - 1 / alpha * log (1 + alpha * mu) + z * log (alpha * mu / (1 + alpha * mu))
    minimize loss = - log l_{nb}
    Note: torch.lgamma: log Gamma function
    '''
    batch_size, seq_len = ytrue.size()
    likelihood = torch.lgamma(ytrue + 1. / alpha) - torch.lgamma(ytrue + 1) - torch.lgamma(1. / alpha) \
        - 1. / alpha * torch.log(1 + alpha * mu) \
        + ytrue * torch.log(alpha * mu / (1 + alpha * mu))
    return - likelihood.mean()
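If scipy is available, this parametrization can be cross-checked against scipy.stats.nbinom with n = 1/alpha and p = 1/(1 + alpha*mu); the toy values below are my own.

from scipy.stats import nbinom

ytrue = torch.tensor([[3.0]])
mu = torch.tensor([[2.0]])
alpha = torch.tensor([[0.5]])
print(negative_binomial_loss(ytrue, mu, alpha))       # tensor(2.0794)
print(-nbinom.logpmf(3, n=1/0.5, p=1/(1 + 0.5*2.0)))  # 2.0794...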

def batch_generator(X, y, num_obs_to_train, seq_len, batch_size):
    '''
    Args:
    X (array like): shape (num_samples, train_periods, num_features)
    y (array like): shape (num_samples, train_periods)
    num_obs_to_train (int): length of the history window used for training
    seq_len (int): sequence/encoder/decoder length
    batch_size (int)
    '''
    num_ts, num_periods, _ = X.shape
    if num_ts < batch_size:
        batch_size = num_ts
    t = random.choice(range(num_obs_to_train, num_periods-seq_len)) # pick a random integer in [num_obs_to_train, num_periods-seq_len-1] as the forecast start point
    batch = random.sample(range(num_ts), batch_size) # randomly sample batch_size series out of num_ts
    X_train_batch = X[batch, t-num_obs_to_train:t, :] # (batch_size, num_obs_to_train, num_features)
    y_train_batch = y[batch, t-num_obs_to_train:t] # (batch_size, num_obs_to_train)
    Xf = X[batch, t:t+seq_len, :] # (batch_size, seq_len, num_features)
    yf = y[batch, t:t+seq_len] # (batch_size, seq_len)
    return X_train_batch, y_train_batch, Xf, yf
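An example call on random data (shapes of my own choosing) to show what the generator returns:

X_demo = np.random.randn(5, 300, 3)   # (num_samples, num_periods, num_features)
y_demo = np.random.randn(5, 300)      # (num_samples, num_periods)
Xb, yb, Xf_demo, yf_demo = batch_generator(X_demo, y_demo, num_obs_to_train=168, seq_len=30, batch_size=4)
print(Xb.shape, yb.shape, Xf_demo.shape, yf_demo.shape)
# (4, 168, 3) (4, 168) (4, 30, 3) (4, 30)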

Model

class TemporalPatternAttention(nn.Module):

    def __init__(self, filter_size, filter_num, attn_len, attn_size):
        super(TemporalPatternAttention, self).__init__()
        self.filter_size = filter_size # 1
        self.filter_num = filter_num
        self.feat_size = attn_size - self.filter_size + 1 # hidden_size
        self.conv = nn.Conv2d(1, filter_num, (attn_len, filter_size))
        self.linear1 = nn.Linear(attn_size, filter_num)
        self.linear2 = nn.Linear(attn_size + self.filter_num, attn_size)
        self.relu = nn.ReLU()

    def forward(self, H, ht): # H: (batch_size, 1, obs_len-1, hidden_size), ht: (batch_size, hidden_size)
        conv_vecs = self.conv(H) # (batch_size, filter_num, 1, feat_size)
        # NOTE: permute, not view -- a raw view would scramble the filter and feature axes
        conv_vecs = conv_vecs.squeeze(2).permute(0, 2, 1) # (batch_size, feat_size, filter_num)
        conv_vecs = self.relu(conv_vecs) # (batch_size, feat_size, filter_num)

        # score function: one sigmoid weight per row, not a softmax over time
        htt = self.linear1(ht) # (batch_size, filter_num)
        htt = htt.view(-1, self.filter_num, 1) # (batch_size, filter_num, 1)
        s = torch.bmm(conv_vecs, htt) # (batch_size, feat_size, 1)
        alpha = torch.sigmoid(s) # (batch_size, feat_size, 1)
        v = torch.bmm(conv_vecs.transpose(1, 2), alpha).view(-1, self.filter_num) # (batch_size, filter_num)

        concat = torch.cat([ht, v], dim=1) # (batch_size, hidden_size+filter_num)
        new_ht = self.linear2(concat) # (batch_size, hidden_size)
        return new_ht
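A quick shape check of the module with illustrative sizes (obs_len-1 = 167 and hidden_size = 24 are my own choices here):

attn = TemporalPatternAttention(filter_size=1, filter_num=16, attn_len=167, attn_size=24)
H_demo = torch.randn(8, 1, 167, 24)   # (batch_size, 1, obs_len-1, hidden_size)
ht_demo = torch.randn(8, 24)          # (batch_size, hidden_size)
print(attn(H_demo, ht_demo).shape)    # torch.Size([8, 24])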

class TPALSTM(nn.Module):

    def __init__(self, input_size, output_horizon, hidden_size, obs_len, n_layers):
        super(TPALSTM, self).__init__()
        self.hidden = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.lstm = nn.LSTM(hidden_size, hidden_size, n_layers, \
                    bias=True, batch_first=True) # output (batch_size, obs_len, hidden_size)
        self.hidden_size = hidden_size
        self.filter_num = 16
        self.filter_size = 1
        self.output_horizon = output_horizon
        self.attention = TemporalPatternAttention(self.filter_size, \
            self.filter_num, obs_len-1, hidden_size)
        self.linear = nn.Linear(hidden_size, output_horizon)
        self.n_layers = n_layers

    def forward(self, x):
        batch_size, obs_len, features_size = x.shape #(batch_size, obs_len, features_size)
        xconcat = self.hidden(x) #(batch_size, obs_len, hidden_size)

        # allocate on the input's device so the module does not depend on a global `device`
        H = torch.zeros(batch_size, obs_len-1, self.hidden_size).to(x.device) #(batch_size, obs_len-1, hidden_size)
        ht = torch.zeros(self.n_layers, batch_size, self.hidden_size).to(x.device) # (num_layers, batch_size, hidden_size)
        ct = ht.clone()
        for t in range(obs_len):
            xt = xconcat[:, t, :].view(batch_size, 1, -1) #(batch_size, 1, hidden_size)
            out, (ht, ct) = self.lstm(xt, (ht, ct)) # ht size (num_layers, batch_size, hidden_size)
            htt = ht[-1, :, :] # (batch_size, hidden_size)
            if t != obs_len - 1:
                H[:, t, :] = htt
        H = self.relu(H) #(batch_size, obs_len-1, hidden_size)

        # reshape hidden states H
        H = H.view(batch_size, 1, obs_len-1, self.hidden_size) #(batch_size, 1, obs_len-1, hidden_size)
        new_ht = self.attention(H, htt) # (batch_size, hidden_size)
        ypred = self.linear(new_ht) # (batch_size, output_horizon)
        return ypred
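And a CPU smoke test of the full model with made-up sizes (the real hyperparameters are set in the next section):

model_demo = TPALSTM(input_size=3, output_horizon=30, hidden_size=24, obs_len=168, n_layers=1)
x_demo = torch.randn(4, 168, 3)    # (batch_size, obs_len, input_size)
print(model_demo(x_demo).shape)    # torch.Size([4, 30])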

Load Data

num_epoches = 100
step_per_epoch = 3 # number of batches drawn from the training set per epoch
lr = 1e-3
n_layers = 1
hidden_size = 24
seq_len = 30 # length of the forecast horizon
num_obs_to_train = 168  # length of the history window used for training
num_results_to_sample = 10
show_plot = True
run_test = True
standard_scaler = True
log_scaler = False
mean_scaler = False
max_scaler = False
batch_size = 128

device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# load the data
data = pd.read_csv("LD_MT200_hour.csv", parse_dates=["date"])
data["year"] = data["date"].apply(lambda x: x.year)
data["day_of_week"] = data["date"].apply(lambda x: x.dayofweek)
data = data.loc[(data["date"].dt.date >= date(2014, 1, 1)) & (data["date"].dt.date <= date(2014, 3, 1))]
print(data.shape)
plt.figure(figsize=(16, 4)) 
plt.plot(data['MT_200'])
data.head()

# preprocessing
features = ["hour", "day_of_week"]
# hours = pd.get_dummies(data["hour"])
# dows = pd.get_dummies(data["day_of_week"])
years = data["year"]
hours = data["hour"]
dows = data["day_of_week"]
MT_200 = np.asarray(data["MT_200"]).reshape(-1,1)
yscaler1 = StandardScaler()
MT_200 = yscaler1.fit_transform(MT_200)
X = np.c_[np.asarray(hours),np.asarray(dows),np.asarray(MT_200)] #X:(len,features)
num_features = X.shape[1]
num_periods = len(data)
X = np.asarray(X).reshape((-1, num_periods, num_features))
y = np.asarray(data["MT_200"]).reshape((-1, num_periods))
print("X_shape=",X.shape) # (series_num,len,features_num)
print("y_shape=",y.shape) # (series_num,len)
# X = np.tile(X, (10, 1, 1))
# y = np.tile(y, (10, 1))

Output:
X_shape= (1, 1440, 3)
y_shape= (1, 1440)
def sliding_window(DataSet, width, multi_vector=True): # DataSet must be an array
    if multi_vector: # 3-D input: (num_samples, length, features)
        num_samples, length, features = DataSet.shape
    else: # 2-D input: (num_samples, length)
        DataSet = DataSet[:, :, np.newaxis] # (num_samples, length, 1)
        num_samples, length, features = DataSet.shape

    x = DataSet[:, 0:width, :] # (num_samples, width, features)
    x = x[np.newaxis, :, :, :] # (1, num_samples, width, features)
    for i in range(1, length - width + 1):
        tmp = DataSet[:, i:i + width, :] # (num_samples, width, features)
        tmp = tmp[np.newaxis, :, :, :] # (1, num_samples, width, features)
        x = np.concatenate([x, tmp], 0) # (i+1, num_samples, width, features)
    return x

width = num_obs_to_train + seq_len 
X_data = sliding_window(X, width, multi_vector = True) #(len-width+1,num_samples,width,features)
Y_data = sliding_window(y, width, multi_vector = False) #(len-width+1,num_samples,width,1)
print("x的维度为:",X_data.shape)
print("y的维度为:",Y_data.shape)
# 取其中一类序列
i = 0
X_data = X_data[:,i,:,:]
Y_data = Y_data[:,i,:,0]
print("x的维度为:",X_data.shape)
print("y的维度为:",Y_data.shape)

Output:
x shape: (1243, 1, 198, 3)
y shape: (1243, 1, 198, 1)
x shape: (1243, 198, 3)
y shape: (1243, 198)
###### SPLIT TRAIN TEST
# NOTE: sklearn's train_test_split shadows the helper of the same name defined earlier
from sklearn.model_selection import train_test_split

Xtr, Xte, ytr, yte = train_test_split(X_data, Y_data, 
                                    test_size=0.2, 
                                    random_state=0,
                                    shuffle=False)
print("X_train:{},y_train:{}".format(Xtr.shape,ytr.shape))
print("X_test:{},y_test:{}".format(Xte.shape,yte.shape))

Output:
X_train:(994, 198, 3),y_train:(994, 198)
X_test:(249, 198, 3),y_test:(249, 198)
# scaling
yscaler = None
if standard_scaler:
    yscaler = StandardScaler()
elif log_scaler:
    yscaler = LogScaler()
elif mean_scaler:
    yscaler = MeanScaler()
if yscaler is not None:
    ytr = yscaler.fit_transform(ytr.reshape(-1,1)).reshape(-1,seq_len+num_obs_to_train)
Xtr = torch.from_numpy(Xtr).float()
ytr = torch.from_numpy(ytr).float()
Xte = torch.from_numpy(Xte).float()
yte = torch.from_numpy(yte).float()

print("X_train:{},y_train:{}".format(Xtr.shape,ytr.shape))
print("X_test:{},y_test:{}".format(Xte.shape,yte.shape))

train_dataset = torch.utils.data.TensorDataset(Xtr, ytr) # training dataset
train_Loader=torch.utils.data.DataLoader(train_dataset,batch_size=batch_size)

Output:
X_train:torch.Size([994, 198, 3]),y_train:torch.Size([994, 198])
X_test:torch.Size([249, 198, 3]),y_test:torch.Size([249, 198])

Train

Args (carried over from the source repo; not all of them are used in the script below):

  • X (array like): shape (num_samples, num_periods, num_features)
  • y (array like): shape (num_samples, num_periods)
  • epochs (int): number of epochs to run
  • step_per_epoch (int): steps per epoch to run
  • num_obs_to_train (int): The length of the history window for training
  • seq_len (int): output horizon
  • likelihood (str): what type of likelihood to use, default is gaussian
  • num_skus_to_show (int): how many skus to show in test phase
  • num_results_to_sample (int): how many samples in test phase as prediction
# define the model and optimizer
num_ts, num_periods, num_features = X.shape
model = TPALSTM(input_size=Xtr.shape[2], output_horizon=seq_len, hidden_size=32, obs_len=num_obs_to_train, n_layers=1).to(device)
optimizer = Adam(model.parameters(), lr=lr)
random.seed(2)

losses = []
cnt = 0    

# training
print("开启训练")
progress = ProgressBar()
for epoch in progress(range(num_epoches)):
#     print("Epoch {} starts...".format(epoch))
    for x,y in train_Loader:
        x = x.to(device) # (batch_size, num_obs_to_train+seq_len, num_features) 
        y = y.to(device) # (batch_size, num_obs_to_train+seq_len)
        Xtrain = x[:,:num_obs_to_train,:].float() # (batch_size, num_obs_to_train, num_features)
        ytrain = y[:,:num_obs_to_train].float() # (batch_size, num_obs_to_train)
        Xf = x[:,-seq_len:,:].float() # (batch_size, seq_len, num_features)
        yf = y[:,-seq_len:].float() # (batch_size, seq_len)             

        ypred = model(Xtrain) # ypred:(batch_size, seq_len)

        loss = F.mse_loss(ypred, yf)

        losses.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        cnt += 1

# plot the training loss
if show_plot:
    plt.plot(range(len(losses)), losses, "k-")
    plt.xlabel("Step")
    plt.ylabel("Loss")
    plt.show()

# test 
print("开启测试")
X_test_sample = Xte[:,:,:].reshape(-1,num_obs_to_train+seq_len,num_features).to(device) # (num_samples, num_obs_to_train+seq_len, num_features)
y_test_sample = yte[:,:].reshape(-1,num_obs_to_train+seq_len).to(device) # (num_samples, num_obs_to_train+seq_len)

X_test = X_test_sample[:,:num_obs_to_train,:] # (num_samples, num_obs_to_train, num_features)
Xf_test = X_test_sample[:, -seq_len:, :] # (num_samples, seq_len, num_features)
y_test = y_test_sample[:, :num_obs_to_train] # (num_samples, num_obs_to_train)
yf_test = y_test_sample[:, -seq_len:] # (num_samples, seq_len)

ypred = model(X_test)
ypred = ypred.cpu().detach().numpy()
if yscaler is not None:
    ypred = yscaler.inverse_transform(ypred.reshape(-1,1)).reshape(-1,seq_len)
# ypred = ypred.ravel()
yf_test = yf_test.cpu().detach().numpy()
loss = np.sqrt(np.sum(np.square(yf_test - ypred))) # root of the total squared error over the test set, not RMSE
print("losses: ", loss)

Output:
Start testing
losses:  11473.168
i = -1
if show_plot: # total series length = history window (num_obs_to_train) + forecast horizon (seq_len)
    plt.figure(1, figsize=(20, 5))
    plt.plot(range(num_obs_to_train, num_obs_to_train + seq_len), ypred[i, :], "r-") # forecast
    plt.title('Prediction')
    yplot = y_test_sample[i, :].cpu() # ground truth, (num_obs_to_train + seq_len,)
    plt.plot(range(len(yplot)), yplot, "k-")
    plt.legend(["forecast", "true"], loc="upper left")
    ymin, ymax = plt.ylim()
    plt.vlines(num_obs_to_train, ymin, ymax, color="blue", linestyles="dashed", linewidth=2)
    plt.ylim(ymin, ymax)
    plt.xlabel("Periods")
    plt.ylabel("Y")
    plt.show()