😎 First time meeting a studying 징징알파카?

Building an LSTM prediction model from time-series data to help prevent COVID-19 infections


징징알파카 2022. 10. 26. 14:46

Written 2022-10-26

<This post was written while studying with reference to data-panic's blog>

https://data-panic.tistory.com/33

 

 

๐Ÿ“ ์ฝ”๋กœ๋‚˜ ํ™•์ง„ ์˜ˆ๋ฐฉ

  • ํ•ด์™ธ์œ ์ž…ํ™•์ง„์ž์— ๋Œ€ํ•œ ์‹œ๊ณ„์—ด(Time-Series) ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์˜ˆ์ธก ๋ชจ๋ธ ๋งŒ๋“ค๊ธฐ
  • ๊ฐ€๊นŒ์šด ๋ฏธ๋ž˜์— ๋ฐœ์ƒํ•˜๋Š” ํ•ด์™ธ์œ ์ž… ์‚ฌ๋ก€๋ฅผ ์˜ˆ์ธก
  • 14์ผ์˜ ๋ฏธ๋ž˜๊ฐ’์„ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ์ด ํ”„๋กœ์ ํŠธ์˜ ๋ชฉํ‘œ
  • ๋ชจ๋ธ๋ง์—๋Š” PyTorch ๊ธฐ๋ฐ˜ LSTM ๋ชจ๋ธ

 

 

๐Ÿ“ ์ฝ”๋“œ ๋ฆฌ๋ทฐ

1๏ธโƒฃ Load libraries

import torch
import os
import numpy as np
import pandas as pd
from tqdm import tqdm
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import mean_squared_error
from pandas.plotting import register_matplotlib_converters
from torch import nn, optim

%matplotlib inline
%config InlineBackend.figure_format='retina'

sns.set(style='whitegrid', palette='muted', font_scale=1.2)

HAPPY_COLORS_PALETTE = ["#01BEFE", "#FFDD00", "#FF7D00", "#FF006D", "#93D30C", "#8F00FF"]

sns.set_palette(sns.color_palette(HAPPY_COLORS_PALETTE))

rcParams['figure.figsize'] = 14, 10
register_matplotlib_converters()

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)

import warnings
warnings.filterwarnings('ignore')

from matplotlib import font_manager, rc

 

 

2️⃣ Load data

df = pd.read_csv('final_0507.csv')
df.drop(["Unnamed: 0"], axis = 1, inplace = True)
df
  • date setting
df.Date = pd.to_datetime(df.Date)
df.set_index('Date', inplace=True)
df

  • Variable names
    • Date : date (index)
    • {country code}_conf : daily confirmed cases in that country
    • {country code}_roam : daily number of roaming users who entered Korea from that country
    • KR : daily domestic confirmed cases (community transmission)
    • news : daily count of overseas COVID-19 news articles
    • covid_tr : Google Trends index for the search keyword 'covid'
    • coro_tr : Google Trends index for the search keyword 'corona'
    • target : imported confirmed cases
lag_col= list(df.columns)
lag_col

  • Create lags for every variable
    • Create 3 lag variables for each variable, then drop every row that contains the NaN values produced by the shift
lag_amount = 3

for col in lag_col:
    for i in range(lag_amount):
        df['{0}_lag{1}'.format(col,i+1)] = df['{}'.format(col)].shift(i+1)
    
df.dropna(inplace=True)
df
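
To see what the lag loop above produces, here is a minimal sketch on a toy frame. The column name `cases`, the dates, and the choice of 2 lags are made up for illustration and are not taken from `final_0507.csv`.

```python
import pandas as pd

# Toy frame standing in for the real data; "cases" is a hypothetical column
toy = pd.DataFrame(
    {"cases": [1, 2, 3, 4, 5]},
    index=pd.date_range("2020-01-22", periods=5),
)

lag_amount = 2
for lag in range(1, lag_amount + 1):
    # shift(lag) moves values down by `lag` rows, so each row
    # sees the value observed `lag` days earlier
    toy[f"cases_lag{lag}"] = toy["cases"].shift(lag)

# The first `lag_amount` rows have no history, so they hold NaN and are dropped
toy = toy.dropna()
print(toy)
```

Each lag costs one row at the start of the series, which matters here because the dataset is small.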

 

 

3️⃣ Data exploration

print("total shape: {}".format(df.shape))
print("target feature shape: {}".format(df['target'].shape))

plt.figure(figsize=(25,5))
plt.plot(df['target'])
plt.xticks(rotation=90)
plt.title("Overseas Inflow Confirmed Cases")
plt.grid(axis='x')

So this is what the series looks like

 

4️⃣ LSTM model

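X_cols = list(df.columns)
X_cols.remove('target')

# X, y and test_data_size are used below but never defined in the post.
# Assumption: the last 14 days are held out, matching the 14-day test period (April 22 to May 5).
test_data_size = 14
X = df[X_cols]
y = df['target']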

 

  • Scale X and y
  • Use scikit-learn's MinMaxScaler
    • Create and apply a separate scaler for the X data and for the y data so the scaled values can be inverse-scaled later
    • Then split into train / test sets
    • flatten() the y data to reduce its dimension before building the LSTM sequences
# MinMaxScaler scaling
# Use two separate scaler objects: fitting one shared MinMaxScaler twice leaves
# both names pointing at the same object, fitted on whichever data it saw last
Xscaler = MinMaxScaler()
yscaler = MinMaxScaler()
# Apply scaling
X = Xscaler.fit_transform(X)
y = yscaler.fit_transform(y.values.reshape(-1,1))
# Train, Test set split
X_train, X_test = X[:-test_data_size], X[-test_data_size:]
y_train, y_test = y[:-test_data_size].flatten(), y[-test_data_size:].flatten()
print("train set : ", X_train.shape)
print("test set : ", X_test.shape)
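
The reason for keeping one scaler per array can be checked with a small round trip. This is a sketch on made-up numbers, with `x_scaler` and `y_scaler` as illustrative names. It fits each scaler once, then restores y with the scaler that was fitted on y.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: 3 samples, 2 features, 1 target
X_demo = np.array([[0.0, 100.0], [5.0, 300.0], [10.0, 200.0]])
y_demo = np.array([2.0, 8.0, 4.0])

# One scaler object per array, each fitted exactly once
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
X_scaled = x_scaler.fit_transform(X_demo)
y_scaled = y_scaler.fit_transform(y_demo.reshape(-1, 1))

# The y scaler remembers the min/max of y, so it can undo the scaling
y_restored = y_scaler.inverse_transform(y_scaled).flatten()
print(y_restored)
```

Model predictions live in the scaled space, so the same `inverse_transform` call is what turns them back into case counts later on.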

 

  • Function that builds sequence data for the LSTM
    • A helper that reshapes the data going into the model into sequences
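def create_sequences1(array, seq_length):
    """Slice array into consecutive windows of seq_length rows."""
    res = []
    if seq_length == 1:
        # One window per row: every sample is its own length-1 sequence
        for i in range(len(array)):
            tmp=array[i:(i+seq_length)]
            res.append(tmp)
    else:
        # Sliding windows; note that this range stops two windows short of the end of the array
        for i in range(len(array)-seq_length-1):
            tmp = array[i:(i+seq_length)]
            res.append(tmp)
    return res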

 

✔ Use the function to turn the data into sequences
✔ When 5 rows are grouped into one sequence, X_train shows that each array holds 5 data points
= the X data from January 22 to January 26 is bundled together and fed into the model as one unit
✔ The reason for building sequences is so the model can learn the order of the time-series data
seq_length = 5

X_train = create_sequences1(X_train, seq_length)
y_train = create_sequences1(y_train, seq_length)
X_test = create_sequences1(X_test, seq_length)
y_test = create_sequences1(y_test, seq_length)
X_train[:3]
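
As a rough illustration of the windowing idea, the sketch below slides a window of 5 over a toy array of 10 values. `make_windows` is a hypothetical helper, not the post's `create_sequences1`: it keeps every possible window, whereas `create_sequences1` returns two fewer when `seq_length` is greater than 1.

```python
import numpy as np

def make_windows(array, seq_length):
    """Return every contiguous window of seq_length rows (hypothetical helper)."""
    return [array[i:i + seq_length] for i in range(len(array) - seq_length + 1)]

series = np.arange(10)              # stand-in for 10 consecutive days
windows = make_windows(series, seq_length=5)

print(len(windows))                 # number of windows
print(windows[0])                   # first window: days 0..4
print(windows[-1])                  # last window: days 5..9
```

With n rows and window length L there are n - L + 1 windows, so longer windows leave fewer training samples.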

 

  • To use the data without building sequences, set seq_length to 1 (run this cell instead of the seq_length = 5 cell, not after it)
    • The goal was a 14-day prediction
    • Building sequences also requires more data
    • Because the available data is limited, the data is used without building sequences
seq_length = 1

X_train = create_sequences1(X_train, seq_length)
y_train = create_sequences1(y_train, seq_length)
X_test = create_sequences1(X_test, seq_length)
y_test = create_sequences1(y_test, seq_length)
X_train[:3]

 

  • Convert the data to torch.tensor so it can be fed to the PyTorch model
# Convert numpy to tensor (np.array first, so the list of arrays becomes one array)
X_train = torch.tensor(np.array(X_train)).float()
y_train = torch.tensor(np.array(y_train)).float()
X_test = torch.tensor(np.array(X_test)).float()
y_test = torch.tensor(np.array(y_test)).float()
print("X_train :",(X_train.shape))
print("X_test :",(X_test.shape))
print("y_train :",(y_train.shape))
print("y_test :",(y_test.shape))

 

 

5️⃣ Building the LSTM model

  • Composed of an LSTM and a Linear layer
    • num_layers sets the number of stacked LSTM layers
    • Because the data is small and the deep learning model is neither large nor deep, no dropout is applied
# Create the model class

class CoronaVirusPredictor(nn.Module):
    
    def __init__(self, n_features, n_hidden, seq_len, n_layers=2):
        super(CoronaVirusPredictor, self).__init__()
        self.n_hidden = n_hidden
        self.seq_len = seq_len
        self.n_layers = n_layers

        self.lstm = nn.LSTM(
        input_size = n_features,
        hidden_size = n_hidden,
        num_layers = n_layers,
        #dropout=0.1
        )
        
        self.linear = nn.Linear(in_features=n_hidden, out_features=1)
        
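    def reset_hidden_state(self):
        # (h_0, c_0), each shaped (n_layers, batch, n_hidden);
        # this model uses seq_len as the batch dimension
        self.hidden = (
            torch.zeros(self.n_layers, self.seq_len, self.n_hidden),
            torch.zeros(self.n_layers, self.seq_len, self.n_hidden))

    def forward(self, sequences):
        # nn.LSTM defaults to batch_first=False, so the input is read as (time, batch, features):
        # the N samples act as time steps and seq_len acts as the batch size
        lstm_out, self.hidden = self.lstm(sequences.view(len(sequences), self.seq_len, -1), self.hidden)
        last_time_step = lstm_out.view(self.seq_len, len(sequences), self.n_hidden)[-1]
        y_pred = self.linear(last_time_step)

        return y_pred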
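
The tensor layout that `nn.LSTM` expects is easy to get wrong, so here is a small shape check. The sizes (3 features, 4 hidden units, 2 layers, 5 time steps) are arbitrary and not the post's hyperparameters.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Arbitrary sizes chosen only to make the shapes easy to read
lstm = nn.LSTM(input_size=3, hidden_size=4, num_layers=2)

# With the default batch_first=False the layout is (time steps, batch, features)
x = torch.randn(5, 1, 3)

# Initial hidden and cell state: (num_layers, batch, hidden_size)
h0 = torch.zeros(2, 1, 4)
c0 = torch.zeros(2, 1, 4)

out, (hn, cn) = lstm(x, (h0, c0))
print(out.shape)   # one hidden vector per time step, from the top layer
print(hn.shape)    # final hidden state of every layer
```

Passing `batch_first=True` to `nn.LSTM` changes the input and output layout to (batch, time steps, features), while the hidden state keeps the (num_layers, batch, hidden_size) layout.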

 

6️⃣ Model training

  • The number of epochs and the learning rate can be set as parameters
  • The loss function is MSELoss
  • The optimizer is Adam
    • weight_decay is set on the optimizer
  • The train and test loss are printed every 10 epochs
def train_model(model, train_data, train_labels, test_data=None, test_labels=None, num_epochs=250, lr=1e-3):
    loss_fn = torch.nn.MSELoss()
    
    optimiser = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-4)
    
    train_hist = np.zeros(num_epochs)
    test_hist = np.zeros(num_epochs)
    
    for t in range(num_epochs):
        model.reset_hidden_state()
        # Use the function arguments rather than the global X_train / y_train
        y_pred = model(train_data)
        loss = loss_fn(y_pred.float(), train_labels)
        
        if test_data is not None:
            # Evaluate on the test set without tracking gradients
            with torch.no_grad():
                y_test_pred = model(test_data)
                test_loss = loss_fn(y_test_pred.float(), test_labels)
            test_hist[t] = test_loss.item()
            
            if t % 10 == 0:
                print(f'Epoch {t} train loss: {round(loss.item(),4)} test loss: {round(test_loss.item(),4)}')
        elif t % 10 == 0:
            print(f'Epoch {t} train loss: {loss.item()}')
            
        train_hist[t] = loss.item()
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        
    return model.eval(), train_hist, test_hist
# Hyperparameters

n_features=X_train.shape[-1]
n_hidden=64
n_layers=4
lr=1e-4
num_epochs=200
# Training Model
model = CoronaVirusPredictor(n_features=n_features, n_hidden=n_hidden, seq_len=seq_length, n_layers=n_layers)
model, train_hist, test_hist = train_model(model, X_train, y_train, X_test, y_test, num_epochs=num_epochs, lr=lr)

  • Training results
    • The way the loss converges looks quite abnormal
    • First, the test loss is lower than the train loss
      • 1) The test loss can come out lower than the train loss when the train data is too hard or the test data is too easy
      • 2) A deep learning model was trained on only about 100 data points, so this is not really normal training -> there is far too little data
# plotting Loss
plt.plot(train_hist, label="Training loss")
plt.plot(test_hist, label="Test loss")
plt.title('n_features:{0}, n_hidden:{1}, n_layers:{2}, lr:{3}, seq_length:{4}, num_epochs:{5}'.format(n_features,n_hidden,n_layers,lr,seq_length,num_epochs))
plt.legend()

 

 

7️⃣ Daily case prediction

with torch.no_grad():
    
    preds = []
    for i in range(len(X_test)):
        # One-step prediction for each test row
        test_seq = X_test[i:i+1]
        y_test_pred = model(test_seq)
        pred = torch.flatten(y_test_pred).item()
        preds.append(pred)

preds

 

  • Feed the X_test values into the model to produce the predictions preds
  • The results are decimals because the data was scaled above
# Inverse-transform the predicted values
pred_values = yscaler.inverse_transform(np.array(preds).reshape(-1,1))
pred_values_ceiled  = list(pred_values.flatten())
# Inverse-transform the true values
true_values = yscaler.inverse_transform(y_test)[:, [-1]]
# Build a DataFrame of the true and predicted values
score_table = pd.DataFrame({'True':true_values.flatten(),
                            'Pred':pred_values_ceiled})

 

  • score_table consists of the actual y values 'True' and the model's predictions 'Pred'
  • Actual and predicted values from April 22 to May 5

 

  • Compute MSE and RMSE
  • score is a metric that approaches 100 as the gap between the actual and predicted values shrinks
# validation score
MSE = mean_squared_error(score_table['True'], score_table['Pred'])
RMSE = np.sqrt(MSE)
score = 100*(1-(((score_table['Pred'] -score_table['True'])**2).sum())/((score_table['True']**2).sum()))
print("MSE : {0}, RMSE : {1}, SCORE : {2}".format(MSE, RMSE, score))
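
The score above is 100 * (1 - sum((Pred - True)^2) / sum(True^2)). The sketch below wraps that formula in a hypothetical helper, `inflow_score`, and evaluates it on made-up numbers to show how it behaves at the extremes.

```python
import numpy as np

def inflow_score(y_true, y_pred):
    """100 * (1 - squared error / squared magnitude of the true values)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    squared_error = np.sum((y_pred - y_true) ** 2)
    return 100 * (1 - squared_error / np.sum(y_true ** 2))

# Made-up values, only to show the range of the score
perfect = inflow_score([4, 2, 1], [4, 2, 1])     # no error at all
all_zero = inflow_score([4, 2, 1], [0, 0, 0])    # always predicting zero
far_off = inflow_score([4, 2, 1], [12, 6, 3])    # predicting three times the truth

print(perfect, all_zero, far_off)
```

The score has an upper bound of 100 but no lower bound: predicting zero everywhere gives 0, and anything worse than that goes negative.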

 

  • The yellow line is the actual values and the red line is the predictions
  • The actual imported confirmed cases fluctuate from day to day
plt.figure(figsize=(10,5))
plt.plot(range(len(y_train)), yscaler.inverse_transform(y_train)[:, [-1]])
plt.plot(range(len(y_train), len(y_train)+len(y_test)), true_values, label='Real')
plt.plot(range(len(y_train), len(y_train)+len(y_test)), pred_values_ceiled, label='Pred')
#plt.xlim(70)
plt.legend()

 

 

  • Save the model as .pth, the file extension conventionally used for PyTorch models
  • The file name includes the parameters used and the score, so each saved model can be told apart
# Save the model
PATH = './{6}_n_features_{0}_n_hidden_{1}_n_layers_{2}_lr_{3}_seq_length_{4}_num_epochs_{5}.pth'.format(n_features,n_hidden,n_layers,lr,seq_length,num_epochs, score.round(2))

torch.save(model, PATH)
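
An alternative to pickling the whole model object is to save only its `state_dict` and load it into a freshly built model. The sketch below uses a small `nn.Linear` layer and a temporary file as stand-ins. It is not the post's `CoronaVirusPredictor` or its `PATH`.

```python
import os
import tempfile

import torch
from torch import nn

# Small stand-in model; the real model class would be rebuilt the same way
model_a = nn.Linear(3, 1)

# Save only the learned parameters
path = os.path.join(tempfile.mkdtemp(), "demo_state.pth")
torch.save(model_a.state_dict(), path)

# Rebuild the architecture, then load the saved parameters into it
model_b = nn.Linear(3, 1)
model_b.load_state_dict(torch.load(path))
model_b.eval()

same = all(
    torch.equal(p, q)
    for p, q in zip(model_a.state_dict().values(), model_b.state_dict().values())
)
print(same)
```

Loading a `state_dict` requires constructing the model with the same hyperparameters first, which is one more reason to keep them in the file name.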

 

 

8️⃣ Predicting the future using the full dataset

  • Preprocess the data in the same way
  • Use the full dataset instead of splitting into train and test
X_all = df[X_cols]
y_all = df['target']
# MinMaxScaler scaling: one scaler object per array
# X scaler 
Xscaler = MinMaxScaler()
# Y scaler 
yscaler = MinMaxScaler()

# Apply scaling
X_all = Xscaler.fit_transform(X_all)
y_all = yscaler.fit_transform(y_all.values.reshape(-1,1))
y_all = y_all.flatten()

print("X_all : ", X_all.shape)
print("y_all : ", y_all.shape)

X_all = create_sequences1(X_all, seq_length)
y_all = create_sequences1(y_all, seq_length)
X_all = torch.from_numpy(np.array(X_all)).float()
y_all = torch.from_numpy(np.array(y_all)).float()

 

  • DAYS_TO_PREDICT is the number of days to predict
  • The goal was a 14-day prediction, so it is set to 14
DAYS_TO_PREDICT = 14

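with torch.no_grad():
    # Note: X_all[:1] is the earliest row of the data
    test_seq = X_all[:1]
    preds = []
    for _ in range(DAYS_TO_PREDICT):
        y_test_pred = model(test_seq)
        pred = torch.flatten(y_test_pred).item()
        preds.append(pred)
        # new_seq is built from the prediction but never written back to test_seq,
        # so every step sees the same input; the outputs differ only because
        # the model carries self.hidden from one call to the next
        new_seq = test_seq.numpy().flatten()
        new_seq = np.append(new_seq, [pred])
        new_seq = new_seq[1:]
# Inverse-transform the predictions back to case counts
pred_values = yscaler.inverse_transform(np.array(preds).reshape(-1,1))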

 

  • Predicted values for the next 14 days
  • They fall from 4 cases to 1 case
  • There is no way to tell whether the model actually learned the decline in imported confirmed cases, or whether it predicted a drop simply because it had no real basis for its prediction
import math

pred_values_ceiled = list(pred_values.flatten())
predicted_cases=pred_values_ceiled
predicted_cases
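
For reference, recursive multi-step forecasting usually feeds each prediction back into the input window before predicting the next step. The sketch below shows that pattern for a single-variable series. `recursive_forecast` and `naive_trend` are hypothetical helpers, and `naive_trend` is a stand-in rule rather than the LSTM. With the many-feature input used in this post, the prediction alone would not be enough to rebuild the next input row.

```python
import numpy as np

def recursive_forecast(history, predict_next, steps):
    """Predict `steps` values, feeding each prediction back into the window."""
    window = np.asarray(history, dtype=float)
    preds = []
    for _ in range(steps):
        pred = float(predict_next(window))
        preds.append(pred)
        # Drop the oldest value and append the newest prediction
        window = np.append(window[1:], pred)
    return preds

def naive_trend(window):
    """Stand-in predictor: continue the last observed day-to-day change."""
    return window[-1] + (window[-1] - window[-2])

forecast = recursive_forecast([4.0, 3.0, 2.0], naive_trend, steps=3)
print(forecast)
```

Because every step builds on earlier predictions, errors compound over the horizon, which is worth keeping in mind when reading a 14-day forecast.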

 

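# Dates for the forecast: the DAYS_TO_PREDICT days that follow the last date in df
# (avoids the closed= argument, which pandas 2.0 removed in favor of inclusive=)
predicted_index = pd.date_range(
  start=df.index[-1] + pd.Timedelta(days=1),
  periods=DAYS_TO_PREDICT
)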

predicted_cases = pd.Series(
  data=predicted_cases,
  index=predicted_index
)

plt.plot(predicted_cases, label='Predicted Daily Cases')
plt.legend();
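
The date index for the forecast can be checked in isolation. In this sketch the last observed date and the values are example placeholders, not read from the data.

```python
import pandas as pd

DAYS_TO_PREDICT = 14
last_observed = pd.Timestamp("2020-05-05")   # example value, not read from the data

# Start one day after the last observation and take 14 daily steps
future_index = pd.date_range(
    start=last_observed + pd.Timedelta(days=1),
    periods=DAYS_TO_PREDICT,
    freq="D",
)

# Placeholder values standing in for the model's predictions
forecast = pd.Series(range(DAYS_TO_PREDICT), index=future_index)
print(forecast.index[0], forecast.index[-1])
```

Starting the range at the day after the last observation keeps the historical and predicted series from overlapping on the plot.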

 

preds_ = pd.DataFrame(predicted_cases)
df.index = pd.to_datetime(df.index)
plt.figure(figsize=(25,5))
plt.plot(df['target'].astype(int), label='Historical Daily Cases')
plt.plot(preds_, label='Predicted Daily Cases')
plt.xticks(rotation=90)
plt.title("Overseas Inflow Confirmed Cases")
plt.grid(axis='x')
plt.legend()

 
