Predicting Samsung Electronics (+NAVER) Stock Prices with LSTM (+GRU)

징징알파카 2022. 9. 22. 16:19

Written 220922

<This post was written while studying from coding-yoon's blog :-) >

https://coding-yoon.tistory.com/131

1๏ธโƒฃ library load

  • pandas_datareader
    • can pull market data from Yahoo Finance
    • provides functions that turn data on the web into DataFrame objects
import numpy as np
import pandas as pd
import pandas_datareader.data as pdr
import matplotlib.pyplot as plt

import datetime

import torch
import torch.nn as nn
from torch.autograd import Variable 

import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

2๏ธโƒฃ ์‚ผ์„ฑ ์ „์ž ์ฃผ์‹ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ

  • Samsung Electronics' ticker code is 005930 ('005930.KS' on Yahoo)
start = (2000, 1, 1)  # January 1, 2000
start = datetime.datetime(*start)  
end = datetime.date.today()  # today

# load Samsung Electronics from Yahoo Finance
df = pdr.DataReader('005930.KS', 'yahoo', start, end)
df.Close.plot(grid=True)
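Note: the 'yahoo' reader in pandas_datareader has become unreliable since Yahoo changed its API. If the DataReader call above fails, a commonly used alternative is the yfinance package; a minimal fallback sketch (not part of the original post):

# fallback sketch using yfinance (assumes: pip install yfinance)
import yfinance as yf

df = yf.download('005930.KS', start='2000-01-01', auto_adjust=False)
# caution: yfinance orders the columns Open/High/Low/Close/Adj Close/Volume,
# which differs from pandas_datareader, so select columns by name, not position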

As of 2022.09.22 the board is all blue (falling) these days ㅠㅠㅠㅠ, a downtrend.

3๏ธโƒฃ ๋ชจ๋ธ ์„ฑ๋Šฅ ํ•™์Šตํ•˜๊ธฐ์œ„ํ•ด ๋ฐ์ดํ„ฐ ๋ถ„๋ฅ˜ํ•˜๊ธฐ

  • Open: opening price
  • High: daily high
  • Low: daily low
  • Close: closing price
  • Volume: trading volume (not needed, so it gets DROPped)
  • Adj Close: the closing price adjusted for splits, dividends, distributions, etc. (the TARGET we want to predict)
X = df.drop(columns='Volume')  # features: every column except Volume
y = df.iloc[:, 5:6]            # target: the sixth column of df, 'Adj Close'

print(X)
print(y)
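A small robustness note (an added aside, not from the original): positional indexing like iloc[:, 5:6] depends on the provider's column order, so selecting the target by name is a safer equivalent:

# same target, but independent of the column order
y = df[['Adj Close']]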
4๏ธโƒฃ ํ•™์Šต์ด ์ž˜๋˜๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐ ์ •๊ทœํ™” ๋ฐ split

  • StandardScaler : rescales each feature to mean 0 and variance 1
  • MinMaxScaler : rescales each feature so its minimum/maximum become 0/1
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

mm = MinMaxScaler()
ss = StandardScaler()

X_ss = ss.fit_transform(X)
y_mm = mm.fit_transform(y) 

X_train, X_test, y_train, y_test = train_test_split(X_ss, y_mm, test_size=0.2, shuffle=True, random_state=34)

print("Training Shape", X_train.shape, y_train.shape)
print("Testing Shape", X_test.shape, y_test.shape)

  • Convert to Torch tensors so the data is in a form the model can learn from
    • Torch's Variable
      • data : holds the tensor data itself
      • grad : accumulates the gradients for the layers the data has passed through
      • grad_fn : information about the function that computed the gradient
X_train_tensors = Variable(torch.Tensor(X_train))
X_test_tensors = Variable(torch.Tensor(X_test))

y_train_tensors = Variable(torch.Tensor(y_train))
y_test_tensors = Variable(torch.Tensor(y_test))

# add the length-1 sequence dimension: [N, features] -> [N, 1, features] (batch_first layout)
X_train_tensors_final = torch.reshape(X_train_tensors, (X_train_tensors.shape[0], 1, X_train_tensors.shape[1]))
X_test_tensors_final = torch.reshape(X_test_tensors, (X_test_tensors.shape[0], 1, X_test_tensors.shape[1]))

print("Training Shape", X_train_tensors_final.shape, y_train_tensors.shape)
print("Testing Shape", X_test_tensors_final.shape, y_test_tensors.shape)

5๏ธโƒฃ LSTM ๋„คํŠธ์›Œํฌ

  • RNN (defaults)
    • RNN input : [sequence, batch_size, input_size]
    • nn.RNN : its basic arguments are input_size, hidden_size (the width of the output that comes out of nn.RNN), and num_layers (number of stacked layers)
      • outputs (verified in the shape check below)
        • output[-1] : the last step of the sequence, taken from output of shape [sequence, batch_size, hidden_size]
        • hidden[-1] : the last layer, taken from hidden of shape [num_layers, batch_size, hidden_size]
https://brunch.co.kr/@linecard/324
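The shapes in the list above can be checked directly; a minimal shape check (an added sketch, not from the original post):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=2, num_layers=1)  # batch_first=False by default
x = torch.randn(7, 3, 5)       # [sequence, batch_size, input_size]
output, hidden = rnn(x)
print(output.shape)            # [7, 3, 2] = [sequence, batch_size, hidden_size]
print(hidden.shape)            # [1, 3, 2] = [num_layers, batch_size, hidden_size]
print(torch.allclose(output[-1], hidden[-1]))  # True: the last time step of the top layer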
  • No GPU available, so everything runs on the CPU
device = torch.device('cpu')
class LSTM1(nn.Module):
  def __init__(self, num_classes, input_size, hidden_size, num_layers, seq_length):
    super(LSTM1, self).__init__()
    self.num_classes = num_classes #number of classes
    self.num_layers = num_layers #number of layers
    self.input_size = input_size #input size
    self.hidden_size = hidden_size #hidden state
    self.seq_length = seq_length #sequence length
 
    self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                      num_layers=num_layers, batch_first=True) #lstm
    self.fc_1 =  nn.Linear(hidden_size, 128) #fully connected 1
    self.fc = nn.Linear(128, num_classes) #fully connected last layer

    self.relu = nn.ReLU() 

  def forward(self,x):
    h_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)).to(device) #hidden state
    c_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)).to(device) #internal state   
    # Propagate input through LSTM

    output, (hn, cn) = self.lstm(x, (h_0, c_0)) #lstm with input, hidden, and internal state
   
    hn = hn.view(-1, self.hidden_size) #reshaping the data for Dense layer next
    out = self.relu(hn)
    out = self.fc_1(out) #first Dense
    out = self.relu(out) #relu
    out = self.fc(out) #Final Output
   
    return out
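One caveat (an added note): hn.view(-1, self.hidden_size) flattens the [num_layers, batch_size, hidden_size] state, which only coincides with "the final layer's state" because num_layers is 1 here. With stacked layers, indexing is safer:

# variant that stays correct for any num_layers (hypothetical tweak)
hn_last = hn[-1]  # [batch_size, hidden_size] of the top layer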

  • Network hyperparameter setup
num_epochs = 30000 # number of training epochs
learning_rate = 0.00001 # learning rate

input_size = 5 # number of input features
hidden_size = 2 # number of features in the hidden state
num_layers = 1 # number of stacked LSTM layers

num_classes = 1 # number of outputs
lstm1 = LSTM1(num_classes, input_size, hidden_size, num_layers, X_train_tensors_final.shape[1]).to(device)

loss_function = torch.nn.MSELoss()    # mean-squared error for regression
optimizer = torch.optim.Adam(lstm1.parameters(), lr=learning_rate)  # adam optimizer
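A quick optional sanity check before training: one forward pass should yield one prediction per training row.

with torch.no_grad():
    print(lstm1(X_train_tensors_final.to(device)).shape)  # expected: torch.Size([N_train, 1])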

6๏ธโƒฃ ํ•™์Šต

for epoch in range(num_epochs):
  outputs = lstm1(X_train_tensors_final.to(device)) # forward pass
  optimizer.zero_grad() # reset the gradients to zero for this step
 
  # compute the loss
  loss = loss_function(outputs, y_train_tensors.to(device))

  loss.backward() # backpropagation: compute the gradients of the loss
 
  optimizer.step() # update the weights from the gradients

  if epoch % 100 == 0:
    print("Epoch: %d, loss: %1.5f" % (epoch, loss.item()))

Haha.. 0.00001.

With num_epochs = 10000 instead:

7๏ธโƒฃ ์˜ˆ์ธก

  • Transform the whole dataset the same way as above~
df_X_ss = ss.transform(df.drop(columns='Volume'))
df_y_mm = mm.transform(df.iloc[:, 5:6])

df_X_ss = Variable(torch.Tensor(df_X_ss)) #converting to Tensors
df_y_mm = Variable(torch.Tensor(df_y_mm))

#reshaping the dataset
df_X_ss = torch.reshape(df_X_ss, (df_X_ss.shape[0], 1, df_X_ss.shape[1]))

  • Everything to the right of the red line is the prediction...
train_predict = lstm1(df_X_ss.to(device))#forward pass
data_predict = train_predict.data.detach().cpu().numpy() #numpy conversion
dataY_plot = df_y_mm.data.numpy()

data_predict = mm.inverse_transform(data_predict) #reverse transformation
dataY_plot = mm.inverse_transform(dataY_plot)
plt.figure(figsize=(10,6)) #plotting
plt.axvline(x=4563, c='r', linestyle='--') #size of the training set

plt.plot(dataY_plot, label='Actual Data') #actual plot
plt.plot(data_predict, label='Predicted Data') #predicted plot
plt.title('Time-Series Prediction')
plt.legend()
plt.show()
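A minor optional tweak (not in the original): inference does not need gradient tracking, so wrapping the forward pass in torch.no_grad() skips building the autograd graph and makes detach() unnecessary:

with torch.no_grad():
    train_predict = lstm1(df_X_ss.to(device))
data_predict = train_predict.cpu().numpy()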

+ GRU Model

class GRU(nn.Module) :
    def __init__(self, num_classes, input_size, hidden_size, num_layers, seq_length) :
        super(GRU, self).__init__()
        self.num_classes = num_classes
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.seq_length = seq_length
        
        self.gru = nn.GRU(input_size=input_size,hidden_size=hidden_size,
                         num_layers=num_layers,batch_first=True)
        self.fc_1 = nn.Linear(hidden_size, 128)
        self.fc = nn.Linear(128, num_classes)
        self.relu = nn.ReLU()
        
    def forward(self, x) :
        # unlike the LSTM, a GRU keeps only a hidden state (no cell state)
        h_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)).to(device)
        output, hn = self.gru(x, h_0)
        hn = hn.view(-1, self.hidden_size)
        out = self.relu(hn)
        out = self.fc_1(out)
        out = self.relu(out)
        out = self.fc(out)
        return out

num_epochs = 1000
learning_rate = 0.0001

input_size=5
hidden_size=2
num_layers=1

num_classes=1
model=GRU(num_classes,input_size,hidden_size,num_layers,X_train_tensors_final.shape[1]).to(device)

criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for epoch in range(num_epochs) :
    outputs = model(X_train_tensors_final.to(device)) # forward pass
    optimizer.zero_grad()
    loss = criterion(outputs, y_train_tensors.to(device))
    loss.backward()
    
    optimizer.step()
    if epoch % 100 == 0 :
        print(f'Epoch : {epoch}, loss : {loss.item():1.5f}')

df_X_ss = ss.transform(df.drop(columns='Volume'))
df_y_mm = mm.transform(df.iloc[:, 5:6])

df_X_ss = Variable(torch.Tensor(df_X_ss)) #converting to Tensors
df_y_mm = Variable(torch.Tensor(df_y_mm))

#reshaping the dataset
df_X_ss = torch.reshape(df_X_ss, (df_X_ss.shape[0], 1, df_X_ss.shape[1]))
train_predict = model(df_X_ss.to(device)) #forward pass
data_predict = train_predict.data.detach().cpu().numpy() #numpy conversion
dataY_plot = df_y_mm.data.numpy()

# Undo the scaling of X according to feature_range
data_predict = mm.inverse_transform(data_predict) #reverse transformation
dataY_plot = mm.inverse_transform(dataY_plot)
plt.figure(figsize=(10,6)) #plotting
plt.axvline(x=4563, c='r', linestyle='--') #size of the training set

plt.plot(dataY_plot, label='Actual Data') #actual plot
plt.plot(data_predict, label='Predicted Data') #predicted plot
plt.title('Time-Series Prediction')
plt.legend()
plt.show()
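As an aside: the GRU is the lighter recurrence (three gates versus the LSTM's four), so with identical sizes it carries fewer recurrent parameters, which can be checked directly:

# parameter-count comparison of the two models defined above
n_lstm = sum(p.numel() for p in lstm1.parameters())
n_gru = sum(p.numel() for p in model.parameters())
print(n_lstm, n_gru)  # the GRU total is smaller; the fc_1/fc heads are identical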

Hmm............

The bigger the epoch count, the closer the loss gets to 0,

so the actual and predicted graphs come out looking very similar.

Why does a bigger epoch count make the prediction almost perfectly similar.........???

With a small epoch count I did get a weird-looking prediction graph, but.......

this similar...........

Why is that?

I'm curious.

Someone, an answer please...

(One plausible explanation: the feature matrix X still contains the Adj Close column itself, and the shuffled train/test split mixes rows from the same period into both sets, so the network can essentially read the target off its own input; see the sanity check below.)

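A quick way to test that suspicion, as a sketch (not from the original post):

# the target column is also one of the inputs:
print(X.columns.tolist())  # 'Adj Close' shows up here as well as in y

# a leak-free variant drops the target from the features and splits chronologically
# (note: input_size would then be 4 instead of 5):
X_no_leak = df.drop(columns=['Volume', 'Adj Close'])
X_train, X_test, y_train, y_test = train_test_split(
    ss.fit_transform(X_no_leak), y_mm, test_size=0.2, shuffle=False)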

  • NAVER stock ^^
    • does it really predict this well out of the box?
    • even with such small layers..;;?
start = (2000, 1, 1)  # January 1, 2000
start = datetime.datetime(*start)  
end = datetime.date.today()  # today

# load NAVER from Yahoo Finance
df = pdr.DataReader('035420.KS', 'yahoo', start, end)
df.Close.plot(grid=True)

Left: LSTM, right: GRU.
