😎 First time seeing a studying 징징알파카, right?

Multivariate Time Series Data 1


징징알파카 2022. 9. 28. 10:17

Written 2022-09-28

<This post was written while studying, with reference to today-1's blog :-) >

https://today-1.tistory.com/38?category=886697 

 

Multivariate Linear Stochastic Processes (VAR/Granger Causality/Cointegration)


 

 

 

 

1️⃣ Multivariate Time Series Data

: Data that has several values at each time step

: Made up of multiple time-dependent variables

: Multivariate analysis has to consider not only the past values of the variable being predicted but also the dependencies among the variables

 

 

2️⃣ Multivariate Time Series Models

Multivariate linear stochastic processes

 

💕 Vector Autoregression (VAR)

: A stochastic process that models the variable to be predicted as a linear function of its own past values and of the past values of the variables it depends on

: The dependent and independent variables influence each other

: Used when it is unclear which of two variables should be treated as the dependent variable
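
: For reference, a two-variable VAR(1) can be written in standard textbook notation as below. The symbols $c_i$, $a_{ij}$, and $\varepsilon_{i,t}$ are notation assumed here, not taken from the referenced post.

```latex
\begin{aligned}
y_{1,t} &= c_1 + a_{11}\, y_{1,t-1} + a_{12}\, y_{2,t-1} + \varepsilon_{1,t} \\
y_{2,t} &= c_2 + a_{21}\, y_{1,t-1} + a_{22}\, y_{2,t-1} + \varepsilon_{2,t}
\end{aligned}
```

: Each variable is regressed on the lagged values of both itself and the other variable, so neither one has to be chosen in advance as the dependent variable.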

 

💕 Granger Causality

: Takes stationary data as input (differencing is needed if the series is non-stationary)

: Used for 'which came first, the chicken or the egg' problems

  • A question it cannot answer: "Which came first, the chicken or the egg?" (true causality)
  • A question it can answer: "Given the order in which chickens and eggs appear, how much does each help predict the other?" (Granger causality)

: Because establishing true cause and effect is difficult, it is used to find out which of two factors moves first, that is, which one's past values help predict the other

  • Null hypothesis ($H_0$): one variable does not help predict the other
  • Alternative hypothesis ($H_1$): one variable helps predict the other
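
: The hypotheses above can be sketched as an F-test that compares a regression on the target's own lags with one that also includes the other variable's lags. This is a minimal NumPy illustration on simulated data, not the post's code. The helper name `granger_f_stat` and the simulated coefficients are assumptions made for the example.

```python
import numpy as np

def granger_f_stat(target, cause, lag=1):
    """F statistic for H0: lagged `cause` does not help predict `target`."""
    n = len(target)
    y = target[lag:]
    # Column k-1 holds the series shifted back by k steps.
    own_lags = np.column_stack([target[lag - k:n - k] for k in range(1, lag + 1)])
    cause_lags = np.column_stack([cause[lag - k:n - k] for k in range(1, lag + 1)])
    const = np.ones((len(y), 1))

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    rss_restricted = rss(np.hstack([const, own_lags]))   # own lags only
    X_full = np.hstack([const, own_lags, cause_lags])     # own lags + cause lags
    rss_full = rss(X_full)
    dof = len(y) - X_full.shape[1]
    return ((rss_restricted - rss_full) / lag) / (rss_full / dof)

# Simulated data (assumed for illustration): x leads y, but y does not lead x.
rng = np.random.default_rng(0)
n = 500
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

f_x_to_y = granger_f_stat(y, x)
f_y_to_x = granger_f_stat(x, y)
print(f"F (x -> y): {f_x_to_y:.2f}")
print(f"F (y -> x): {f_y_to_x:.2f}")
```

: A large F statistic is evidence against the null hypothesis. In practice, statsmodels offers `grangercausalitytests` in `statsmodels.tsa.stattools`, which also reports p-values.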

 

💕 Cointegration

: Takes non-stationary data as input

Cointegrated = a linear combination of two non-stationary series has a lower order of integration than the originals, or is stationary

: Cointegrated series tend to move together in the long run even when their short-run correlation is weak

: Used in pairs trading strategies
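
: As a rough illustration of the definition above, the sketch below builds two non-stationary series that share one random-walk trend, then checks that their OLS spread behaves like a stationary series. The data is simulated and the helper `lag1_autocorr` is an assumption made for this example. It is not a formal test, which would apply a unit-root test to the spread.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
common_trend = np.cumsum(rng.normal(size=n))       # shared random walk (non-stationary)
series_a = common_trend + rng.normal(size=n)
series_b = 2.0 * common_trend + rng.normal(size=n)

# OLS fit: series_b ~ alpha + beta * series_a
X = np.column_stack([np.ones(n), series_a])
(alpha, beta), *_ = np.linalg.lstsq(X, series_b, rcond=None)
spread = series_b - (alpha + beta * series_a)      # the linear combination

def lag1_autocorr(values):
    """Sample correlation between a series and itself shifted by one step."""
    return float(np.corrcoef(values[:-1], values[1:])[0, 1])

print(f"estimated beta: {beta:.3f}")
print(f"lag-1 autocorrelation of series_b: {lag1_autocorr(series_b):.3f}")
print(f"lag-1 autocorrelation of spread:   {lag1_autocorr(spread):.3f}")
```

: The individual series are highly persistent, while the spread fluctuates around zero, which is the property pairs trading relies on. For a formal check, statsmodels provides `coint` in `statsmodels.tsa.stattools`.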

 

 

 

3️⃣ Code Implementation

🔴 Library & data load

  • statsmodels : a Python library for exploring data, estimating statistical models, and running statistical tests
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

import statsmodels.api as sm
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import adfuller
  • Real GDP sample (the macrodata dataset bundled with statsmodels)
data = sm.datasets.macrodata.load_pandas().data
data.head()

  • Use only three columns: year (as the index), realgdp, and realdpi
mydata = data[["realgdp", 'realdpi']]
mydata.index = data["year"]
mydata.head()

mydata.plot(figsize = (8,5))

Both series show an upward trend

  • Stationary time series
    • A stationary series has a stable mean and variance over time, which makes it easier to analyze
    • Before applying VAR, both time series must be stationary

 

🔴 Using the AIC criterion

: Run a statistical test such as the ADF (Augmented Dickey-Fuller) test to check whether the data is stationary


from statsmodels.tsa.stattools import adfuller
  • p-value
  • test statistic
  • critical values : the thresholds the test statistic is compared against (when the ADF test statistic falls below a critical value, the p-value is low and the null hypothesis of a unit root can be rejected)
  • lag, observation : the number of lags used and the number of observations used
dftest = adfuller(x, maxlag=None, regression="c", autolag="AIC")
  • x : the time series data
  • maxlag : the maximum lag p considered by the ADF test (to use exactly this lag, set autolag to None)
  • regression
    • c : constant only, no trend
    • n : no constant and no trend (spelled nc in older statsmodels versions)
    • ct : constant and linear trend
    • ctt : constant, linear trend, and quadratic trend
  • autolag
    • Chooses the lag p for the ADF test automatically
      • AIC, BIC : pick the lag between 0 and maxlag that minimizes the chosen information criterion
      • None : use maxlag as the lag
      • t-stat : start at maxlag and drop lags one at a time until the t-statistic on the last lag is significant at the 5% level

adfuller_test = adfuller(mydata['realgdp'], autolag= "AIC")

print("ADF test statistic: {}".format(adfuller_test[0]))
print("p-value: {}".format(adfuller_test[1]))

adfuller_test = adfuller(mydata['realdpi'], autolag= "AIC")

print("ADF test statistic: {}".format(adfuller_test[0]))
print("p-value: {}".format(adfuller_test[1]))

  • In both cases the p-value is not small enough to reject the null hypothesis, so both series are treated as non-stationary

 

🔴 Differencing

# First-difference both series; the first row becomes NaN and is dropped
mydata_diff = mydata.diff().dropna()

adfuller_test = adfuller(mydata_diff['realgdp'], autolag= "AIC")

print("ADF test statistic: {}".format(adfuller_test[0]))
print("p-value: {}".format(adfuller_test[1]))

adfuller_test = adfuller(mydata_diff['realdpi'], autolag= "AIC")
print("ADF test statistic: {}".format(adfuller_test[0]))
print("p-value: {}".format(adfuller_test[1]))

  • The p-values for both realgdp and realdpi become small after differencing => stationary

 

🔴 Modeling

  • The last 10 observations are the test set; the rest is the training set
train = mydata_diff.iloc[:-10,:]
test = mydata_diff.iloc[-10:,:]

 

  • Optimal order of the VAR model
    • AIC (Akaike's Information Criterion) is used as the model selection criterion
    • The VAR order (p) is chosen as the one with the best (lowest) AIC score
    • AIC penalizes model complexity, although a more complex model may score slightly better under some other selection criteria
    • When searching over the order p, a turning point is expected: the AIC score falls as p grows up to some order, then starts to rise

 

  • Grid search for the optimal p
    • Train the VAR model with fit
    • Loop over orders 1 through 9 and record the AIC score of each fit
forecasting_model = VAR(train)
results_aic = []

for p in range(1,10):
  results = forecasting_model.fit(p)
  results_aic.append(results.aic)

  • In the resulting plot the AIC score is lowest at p = 2 and trends upward as p grows beyond that
    • So the optimal order of the VAR model is 2
sns.set()
plt.plot(list(np.arange(1,10,1)), results_aic)
plt.xlabel("Order")
plt.ylabel("AIC")
plt.show()

  • Fit the model with order 2 and print the summary
results = forecasting_model.fit(2)
results.summary()

 

 

🔴 Forecasting

  • Feed the last 2 training observations (matching the lag order of 2) to the fitted model to forecast the 10 test observations
lagged_values = train.values[-2:]
forecast = pd.DataFrame(results.forecast(y= lagged_values, steps=10), index = test.index, columns= ['realgdp_1d', 'realdpi_1d'])
forecast

 

  • The forecast above is for the differenced series
  • Add the forecast differences back cumulatively to the last training level to recover the values we actually want to predict
    • The left columns (_1d) are the forecasts of the differences
    • The right columns (_forecasted) are the forecasts of the original series
forecast["realgdp_forecasted"] = mydata["realgdp"].iloc[-10-1] + forecast['realgdp_1d'].cumsum()
forecast["realdpi_forecasted"] = mydata["realdpi"].iloc[-10-1] + forecast['realdpi_1d'].cumsum() 
forecast
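
  • Why the cumulative sum works can be checked on a toy series: differencing and then adding the cumulative differences to the last known level reproduces the original values. The numbers below are made up for illustration, and the actual differences stand in for forecasted ones.

```python
import numpy as np

levels = np.array([100.0, 103.0, 101.0, 106.0, 110.0, 108.0])
diffs = np.diff(levels)                   # first differences: x[t] - x[t-1]

horizon = 3
last_train_level = levels[-horizon - 1]   # last level before the forecast window
future_diffs = diffs[-horizon:]           # stand-in for forecasted differences

# Undo differencing: level[t] = last known level + running sum of differences
rebuilt = last_train_level + np.cumsum(future_diffs)
print(rebuilt)   # → [106. 110. 108.]
```

  • This is the same pattern as `mydata["realgdp"].iloc[-10-1] + forecast['realgdp_1d'].cumsum()`, with a horizon of 10 instead of 3.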

 

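  • Combine with the actual test set and plot
    • realdpi and realdpi_forecasted follow a similar pattern
    • realgdp and realgdp_forecasted look similar for roughly the first half, then diverge
test = mydata.iloc[-10:,:].copy()  # copy so the new columns are not written into a slice of mydata
# .values assigns by position; forecast has the same 10 rows in the same order
test["realgdp_forecasted"] = forecast["realgdp_forecasted"].values
test["realdpi_forecasted"] = forecast["realdpi_forecasted"].values
test.plot()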
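
  • The visual comparison could also be backed by an error metric. Below is a minimal sketch of RMSE; the helper name `rmse` and the sample numbers are assumptions made for illustration, not results from this post.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up values, only to show the call.
example_actual = [10.0, 12.0, 14.0]
example_forecast = [11.0, 12.0, 12.0]
example_rmse = rmse(example_actual, example_forecast)
print(f"RMSE: {example_rmse:.4f}")
```

  • With the DataFrame built above, the call would look like `rmse(test["realgdp"], test["realgdp_forecasted"])`, and likewise for realdpi.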

 

 

 
