
์ž๊ธฐ ์ƒ๊ด€(AutoCorrelation)์ด ๊ฐ•ํ•œ ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ ํ•™์Šตํ•˜๊ธฐ

징징알파카 2022. 9. 28. 14:49

Written 220928

<This post was written while studying with reference to today-1's blog :-) >

https://today-1.tistory.com/56?category=886697 

 

Time-Series Data Preprocessing (Denoising Method) · today-1.tistory.com

 

 

 

1๏ธโƒฃ ์ž๊ธฐ ์ƒ๊ด€(AutoCorrelation)

: ํ˜„์žฌ ๊ด€์ธก๊ฐ’๊ณผ ์ง€์—ฐ(Lag) ๊ฐ’๋“ค๊ณผ์˜ ๊ด€๊ณ„์—์„œ ๋ฐœ์ƒ

: ๊ด€๊ณ„์„ฑ์„ ํŒŒ์•…ํ•˜๊ธฐ ์œ„ํ•ด ACF/PACF ๋“ฑ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ง๊ด€์ ์œผ๋กœ ์•Œ์•„๋ณด๊ฑฐ๋‚˜ Durbin-Watson ๊ฒ€์ •์„ ํ†ตํ•ด ๊ฐ๊ด€์ ์œผ๋กœ ์‚ดํŽด๋ด„
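To make the two checks concrete, here is a NumPy-only sketch of the sample ACF and the Durbin-Watson statistic. The helper names sample_acf and durbin_watson are hypothetical (statsmodels ships tested versions of both). It compares white noise with a random walk, the textbook example of a strongly autocorrelated series. A Durbin-Watson value near 2 means no first-order autocorrelation, values toward 0 mean positive autocorrelation, and values toward 4 mean negative autocorrelation.

```python
import numpy as np

def sample_acf(x, nlags=10):
    """Sample autocorrelation r_k for k = 0..nlags (NumPy-only sketch)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()                      # center the series
    denom = np.sum(xc ** 2)                # lag-0 sum of squares
    return np.array([np.sum(xc[k:] * xc[:len(xc) - k]) / denom
                     for k in range(nlags + 1)])

def durbin_watson(resid):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2; about 2 means no lag-1 autocorrelation."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)
noise = rng.normal(size=1000)   # white noise: no autocorrelation
walk = np.cumsum(noise)         # random walk: strong autocorrelation

print(sample_acf(noise, 3).round(2))
print(sample_acf(walk, 3).round(2))
# Residuals of an intercept-only regression are just the demeaned series
print(round(durbin_watson(noise - noise.mean()), 2))
print(round(durbin_watson(walk - walk.mean()), 2))
```

The random walk's lag-1 ACF stays close to 1 and its Durbin-Watson value falls toward 0, which is the kind of series this post is about.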

 

 

2๏ธโƒฃ ์ฝ”๋“œ ๊ตฌํ˜„

 ๐Ÿ’– 1. library & data load

# Load the stock data used in the examples
# !pip install finance-datareader   (run once in Jupyter/Colab)
import numpy as np   # used by the FFT and wavelet sections below
import pandas as pd
import FinanceDataReader as fdr

start_date = '20210101'
end_date = '20211231'
sample_code = '005930'  # Samsung Electronics
stock = fdr.DataReader(sample_code, start=start_date, end=end_date)
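If the FinanceDataReader download isn't available (offline, or the package can't be installed), a synthetic stand-in with the same 'Close' column lets the denoising code below run. This is a hypothetical random walk on business days, made up for illustration only; it is not Samsung Electronics data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the FinanceDataReader result: a random-walk
# "Close" series on business days, for running the sketches offline.
rng = np.random.default_rng(42)
dates = pd.bdate_range('2021-01-04', '2021-12-30')
close = 80000 + np.cumsum(rng.normal(0, 1000, size=len(dates)))
stock = pd.DataFrame({'Close': close}, index=dates)
print(stock.head())
```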

 

 

๐Ÿ’– ์žก์Œ ์ œ๊ฑฐ (Denoising)

  • (1) Simple Moving Average (SMA)
    • Easy to implement and apply, but the window size has to be chosen carefully, and the smoothed values lag to the right of the original series
# Trailing mean over the last `window` values; min_periods=1 avoids NaNs at the start
def SMA(df, col, window=2):
    return df[col].rolling(window=window, min_periods=1).mean()

stock['MA(5)'] = SMA(stock, 'Close', 5)
stock['MA(5)'].plot(grid=True)

(Left) denoised series · (Right) compared with the original
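The "values lag to the right" property can be checked on a straight line. For x_t = t, a trailing mean over the last w values is t - (w - 1)/2, so a window of 5 sits 2 steps behind once it is full. The SMA function is copied from above; the ramp data is an illustrative assumption.

```python
import numpy as np
import pandas as pd

def SMA(df, col, window=2):
    return df[col].rolling(window=window, min_periods=1).mean()

# A straight line makes the lag easy to see: x_t = t
ramp = pd.DataFrame({'x': np.arange(20, dtype=float)})
ramp['SMA(5)'] = SMA(ramp, 'x', 5)

# Once the window is full, the trailing average trails the input by (5 - 1) / 2 = 2 steps
lag = ramp['x'] - ramp['SMA(5)']
print(lag.iloc[4:].round(6).unique())
```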

 

 

  • (2) Exponential Moving Average (EMA)
    • Computes a moving average that gives more weight to recent values
    • Uses the smoothing factor EP = 2 / (period + 1)
    • EMA(t) = Close(t) × EP + EMA(t-1) × (1 - EP)
# adjust=False makes pandas apply exactly the recursion above, with alpha = 2 / (span + 1)
def EMA(df, col, span=2):
    return df[col].ewm(span=span, adjust=False).mean()

stock['EMA(5)'] = EMA(stock, 'Close', 5)
stock['EMA(5)'].plot(grid=True)

(Left) denoised series · (Right) compared with the original
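To check the formula, here is the recursion written out by hand and compared with pandas. The function name ema_recursive and the price values are hypothetical; seeding with the first observation is what pandas does when adjust=False.

```python
import numpy as np
import pandas as pd

def ema_recursive(values, span):
    """EMA_t = EP * x_t + (1 - EP) * EMA_{t-1}, seeded with the first value."""
    ep = 2 / (span + 1)                 # smoothing factor from the text
    out = np.empty(len(values))
    out[0] = values[0]
    for t in range(1, len(values)):
        out[t] = ep * values[t] + (1 - ep) * out[t - 1]
    return out

prices = pd.Series([100, 102, 101, 105, 107, 106, 110], dtype=float)
manual = ema_recursive(prices.to_numpy(), span=5)
# pandas computes the same recursion when adjust=False
pandas_ema = prices.ewm(span=5, adjust=False).mean().to_numpy()
print(np.allclose(manual, pandas_ema))   # True
```

With the pandas default adjust=True, the first few values differ because pandas normalizes the weights over the available history; the two versions converge after a warm-up of a few spans.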

 

 

  • (3) Fourier Transform
    • Any complex wave can be expressed as a sum of simple waves with different frequencies and amplitudes
    • Transform the time-domain data into the frequency domain, keep only the first topn (lowest-frequency) components, and transform back to the time domain => the noise is removed and the underlying signal is captured
import numpy as np

def FFT(df, col, topn=2):
    # Real FFT: index 0 is the mean (DC term), higher indices are higher frequencies
    spectrum = np.fft.rfft(df[col].values)
    # Low-pass filter: zero every component from index `topn` upward
    spectrum[topn:] = 0
    # irfft returns a real array of the original length (np.fft.ifft would return complex values)
    return np.fft.irfft(spectrum, n=len(df))

stock['FFT(30)'] = FFT(stock, 'Close', 30)
stock['FFT(30)'].plot(grid=True)
  • Representing a complex wave as a sum of simple waves and converting it back to the time domain loses time-axis information: the spectrum shows which frequencies are present, but not when they occur
  • To address this, the STFT (Short-Time Fourier Transform) applies the Fourier transform to successive short time segments
    • However, the segment length (= window size) has to be chosen
      • A long window raises frequency resolution but lowers time resolution
      • A short window lowers frequency resolution but raises time resolution
      • Time and frequency resolution therefore trade off against each other

(Left) denoised series · (Right) compared with the original
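The window-size trade-off is easy to see numerically. Below is a minimal NumPy STFT sketch; the function name stft_magnitude, the Hann window, and the 50% overlap are assumptions (scipy.signal.stft is a ready-made alternative). With N samples per window, the frequency bins are fs/N apart, so a longer window gives finer frequency bins but fewer frames along the time axis.

```python
import numpy as np

def stft_magnitude(x, window, hop):
    """Magnitude STFT: Hann-tapered frames of length `window`, stepped by `hop`."""
    x = np.asarray(x, dtype=float)
    taper = np.hanning(window)
    starts = range(0, len(x) - window + 1, hop)
    frames = np.stack([x[s:s + window] * taper for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))   # shape: (n_frames, window // 2 + 1)

fs = 1.0                                         # one sample per trading day (assumption)
x = np.sin(2 * np.pi * 0.05 * np.arange(256))    # toy signal
for window in (16, 64):
    spec = stft_magnitude(x, window, hop=window // 2)
    # Longer window: finer frequency bins (fs / window) but fewer time frames
    print(window, spec.shape, 'freq bin =', fs / window)
```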

 

 

  • (4) Wavelet Transform (WT)
    • Developed to address the shortcomings of the Fourier transform
    • For high-frequency components it uses short wavelets: high time resolution, low frequency resolution
    • For low-frequency components it uses long wavelets: high frequency resolution, low time resolution
      • The key operations are Scaling (stretching and compressing the wavelet in time) and Shifting (moving it along the time axis)
      • Results on the same data differ depending on the mother wavelet, so choose one that suits the data's characteristics
      • Haar and Daubechies wavelets are common choices for the discrete wavelet transform
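# !pip install PyWavelets   (run once in Jupyter/Colab)
import numpy as np
import pywt

def WT(df, col, wavelet='db5', thresh=0.63):
    signal = df[col].values
    # Threshold expressed as a fraction of the series' maximum value
    thresh = thresh * np.nanmax(signal)
    # Multilevel discrete wavelet decomposition: [cA_n, cD_n, ..., cD_1]
    coeff = pywt.wavedec(signal, wavelet, mode="per")
    # Soft-threshold every detail band; keep the coarsest approximation as-is
    coeff[1:] = [pywt.threshold(c, value=thresh, mode="soft") for c in coeff[1:]]
    reconstructed_signal = pywt.waverec(coeff, wavelet, mode="per")
    # waverec can return one extra sample for odd-length input, so trim to the original length
    return reconstructed_signal[:len(signal)]

stock['db5'] = WT(stock, 'Close')
stock['db5'].plot(grid=True)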

(Left) denoised series · (Right) compared with the original
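To see what one decomposition step and soft thresholding do, here is a single level of the simplest wavelet, Haar, written by hand. The function names are hypothetical; PyWavelets' pywt.wavedec does the same kind of split for many wavelets and levels. The even/odd pairing is the shifting, and repeating the split on the approximation gives coarser scales. It may also explain why WT came out so smooth in this post: thresh=0.63 is multiplied by the series maximum, so the threshold can easily exceed most detail coefficients and shrink them to zero.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT (even-length input): approximation and detail."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_idwt(a, d):
    """Inverse of haar_dwt: rebuild and interleave the even and odd samples."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t; anything with |c| <= t becomes 0."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 9.0, 3.0, 1.0])
a, d = haar_dwt(x)
print(np.allclose(haar_idwt(a, d), x))          # perfect reconstruction
print(haar_idwt(a, soft_threshold(d, 10.0)))    # all details removed: pairwise means remain
```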

 

 

 

  • (5) AutoEncoder
    • If a model that can remove noise (an AutoEncoder, etc.) is connected to the forecasting model and the two are trained end-to-end as one deep learning model, performance can be improved effectively while reducing training cost
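import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data as data_utils

class TimeDistributed(nn.Module):
    """Applies `module` to every time step by folding time into the batch dimension."""
    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        if x.dim() <= 2:
            return self.module(x)
        # (batch, seq, features) -> (batch * seq, features)
        x_reshape = x.contiguous().view(-1, x.size(-1))
        y = self.module(x_reshape)
        if x.dim() == 3:
            # back to (batch, seq, out_features)
            y = y.contiguous().view(x.size(0), -1, y.size(-1))
        return y

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: 2-layer bidirectional LSTM, 1 input feature -> 2 * 16 = 32 features per step
        self.encoder = nn.LSTM(
            input_size=1,
            hidden_size=16,
            dropout=0.25,
            num_layers=2,
            bias=True,
            batch_first=True,
            bidirectional=True,
        )
        # Decoder: reads the 32-dim encoder output and again produces 32 features per step
        self.decoder = nn.LSTM(
            input_size=32,
            hidden_size=16,
            dropout=0.25,
            num_layers=2,
            bias=True,
            batch_first=True,
            bidirectional=True,
        )
        # Project each decoded step back to a single value
        self.fc = TimeDistributed(nn.Linear(32, 1))

    def forward(self, x):
        # x: (batch, seq_len, 1)
        enc_out, (h_n, c_n) = self.encoder(x)
        # Use the last encoder step as a summary and repeat it once per input step,
        # so the reconstruction has the same length as the input
        dec_in = enc_out[:, -1:, :].repeat(1, x.size(1), 1)
        dec_out, (h_n, c_n) = self.decoder(dec_in)
        return self.fc(dec_out)  # (batch, seq_len, 1)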
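The model expects input of shape (batch, sequence length, 1), because batch_first=True and input_size=1. The post doesn't show its data pipeline or training loop, so here is a hypothetical NumPy sketch of how the closing prices could be cut into overlapping windows of that shape (make_windows and minmax_scale are illustrative names, and the prices are made up). torch.from_numpy(batch) then gives a float32 tensor the model can consume; feeding a noisy copy as input and the clean window as target is one common denoising setup, but that is an assumption here.

```python
import numpy as np

def make_windows(series, window=5):
    """Stack overlapping windows into shape (n_windows, window, 1) for a batch_first LSTM."""
    x = np.asarray(series, dtype=np.float32)
    n = len(x) - window + 1
    idx = np.arange(window)[None, :] + np.arange(n)[:, None]  # row i = positions i..i+window-1
    return x[idx][..., None]                                  # trailing feature dim (input_size = 1)

def minmax_scale(x):
    """Scale to [0, 1]; in practice fit the min/max on the training split only."""
    x = np.asarray(x, dtype=np.float32)
    return (x - x.min()) / (x.max() - x.min())

close = minmax_scale([100, 102, 101, 105, 107, 106, 110, 108])
batch = make_windows(close, window=5)
print(batch.shape)   # (4, 5, 1)
```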

 

๐Ÿ’– ์‹œ๊ฐํ™”

  • The simple (MA) and exponential (EMA) moving averages give a smoother representation than the original data
    • but they still show large variance
  • The Fourier transform (FFT) gives a smoother representation than these methods and removes much of the noise
  • The wavelet transform (WT) produces an even smoother waveform than the Fourier transform, so information is lost along with the noise
  • The autoencoder (AE) gives a representation roughly halfway between the original data and the Fourier transform

  • 1. ํ‘ธ๋ฆฌ์— ๋ณ€ํ™˜(FFT)์ด ๊ฐ€์žฅ ์ข‹์€ ์„ฑ๋Šฅ
  • 2. ์˜คํ†  ์ธ์ฝ”๋”(AE) ๋ชจ๋ธ์ด ์žก์Œ ์ œ๊ฑฐ๋กœ ์ธํ•ด ์ข‹์€ ์„ฑ๋Šฅ
  • 3. ์ด๋™ํ‰๊ท (MA, EMA) ๋ฐฉ๋ฒ•๋“ค์€ ๊ณผ๊ฑฐ์˜ ๊ฒฐ๊ณผ ์ž๋ฃŒ๋กœ์จ ๋ฐ์ดํ„ฐ๋ฅผ ์žฌ๊ตฌ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๊ธฐ ๋•Œ๋ฌธ์— ํ›„ํ–‰ ์˜ˆ์ธก์ด ์—ฌ์ „ํžˆ ์กด์žฌ (๊ฒ€์ฆ ๋ฐ์ดํ„ฐ์˜ ํ•™์Šต์ด ์ œ๋Œ€๋กœ ์ง„ํ–‰๋˜์ง€ ์•Š์Œ)
  • 4. ์›จ์ด๋ธ”๋ฆฟ ๋ณ€ํ™˜(WT)์˜ ๊ฒฝ์šฐ ํ›„ํ–‰ ์˜ˆ์ธก ๋ฌธ์ œ๋ฅผ ํ•ด์†Œํ•  ์ˆ˜ ์žˆ์œผ๋‚˜ ๋ณ€ํ™˜ ์ž‘์—…์—์„œ ์‹ ํ˜ธ์˜ ์žก์Œ๊ณผ ์ •๋ณด๋ฅผ ๊ฐ™์ด ์†์‹ค๋˜์–ด ์ •ํ™•๋„๋ฅผ ์žƒ๊ฒŒ๋จ
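One likely factor behind this ranking, offered as an interpretation rather than something the source states: the FFT and wavelet code transform the whole year at once, so each smoothed value also depends on later observations, while MA and EMA use only past values. In a forecasting setup, that look-ahead can make validation results look better than they would be in live use. The sketch below checks causality directly by changing only the last observation; fft_lowpass and ema are hypothetical helpers mirroring the code above.

```python
import numpy as np
import pandas as pd

def fft_lowpass(x, topn):
    """Whole-series FFT low-pass, as in the FFT section (rfft variant)."""
    spectrum = np.fft.rfft(x)
    spectrum[topn:] = 0
    return np.fft.irfft(spectrum, n=len(x))

def ema(x, span):
    return pd.Series(x).ewm(span=span, adjust=False).mean().to_numpy()

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=200))
x_changed = x.copy()
x_changed[-1] += 50.0   # change only the LAST observation

# EMA is causal: every earlier smoothed value is unchanged
ema_shift = np.max(np.abs(ema(x_changed, 5)[:-1] - ema(x, 5)[:-1]))
# Whole-series FFT smoothing is not: earlier smoothed values move as well
fft_shift = np.max(np.abs(fft_lowpass(x_changed, 30)[:-1] - fft_lowpass(x, 30)[:-1]))
print(ema_shift, fft_shift)
```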

 

 
