Multivariate Time Series Data 2

징징알파카 2022. 9. 30. 09:54

Written 2022-09-30

<This post was written while studying, with ysyblog's blog as a reference>

https://ysyblog.tistory.com/298

 

[Time Series Analysis] Multivariate Linear Stochastic Processes - VAR & IRP (vector autoregressive process, impulse response function)


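1️⃣ Multivariate time series

  • Univariate time series models (simple and multiple regression included) rest on a strong assumption: the dependent variable (Y_t) is influenced only by the independent variables. In practice the dependent and independent variables influence each other, which is what multivariate models account for
  • Example: a two-dimensional vector autoregression (income and expenditure, both treated as dependent variables) that looks back only one time step, i.e. VAR(1)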
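
To make the two-variable, one-lag case concrete, here is a minimal sketch that simulates a VAR(1) process with NumPy. The coefficient matrix `A`, the noise scale, and the function name `simulate_var1` are illustrative assumptions, not values taken from the referenced post.

```python
import numpy as np

def simulate_var1(A, n_steps, noise_scale=1.0, seed=0):
    """Simulate y_t = A @ y_{t-1} + e_t with Gaussian noise, starting from zero."""
    rng = np.random.default_rng(seed)
    k = A.shape[0]
    y = np.zeros((n_steps, k))
    for t in range(1, n_steps):
        y[t] = A @ y[t - 1] + rng.normal(scale=noise_scale, size=k)
    return y

# Hypothetical coefficients: row 0 = income equation, row 1 = expenditure equation.
# The off-diagonal entries let each variable depend on the other's previous value.
A = np.array([[0.5, 0.2],
              [0.3, 0.4]])
series = simulate_var1(A, n_steps=200)
print(series.shape)  # (200, 2)
```

Setting the off-diagonal entries of `A` to zero would reduce this to two independent AR(1) series, which is exactly the mutual influence a univariate model leaves out.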

💗 Vector autoregression (VAR) model

1) VAR algorithm

  • Assumes a stationary time series: the mean vector is constant and the autocovariance matrices depend only on the lag, not on the absolute position in time
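
One common way to check this condition for a VAR(p) is to verify that every eigenvalue of its companion matrix lies strictly inside the unit circle. The sketch below does this with NumPy; `is_stable_var` and the example coefficients are hypothetical. (For a fitted statsmodels VAR, the results object offers a comparable check via `is_stable()`.)

```python
import numpy as np

def is_stable_var(coefs):
    """coefs has shape (p, k, k): one k x k coefficient matrix per lag."""
    p, k, _ = coefs.shape
    companion = np.zeros((k * p, k * p))
    companion[:k, :] = np.hstack(list(coefs))   # top block row: [A_1 ... A_p]
    companion[k:, :-k] = np.eye(k * (p - 1))    # identity blocks shift the lags down
    eigenvalues = np.linalg.eigvals(companion)
    return bool(np.all(np.abs(eigenvalues) < 1))

# Hypothetical VAR(1) whose eigenvalues are 0.7 and 0.2
stable = np.array([[[0.5, 0.2],
                    [0.3, 0.4]]])
# Hypothetical VAR(1) whose first variable is a random walk (eigenvalue 1)
unstable = np.array([[[1.0, 0.0],
                      [0.0, 0.5]]])
print(is_stable_var(stable), is_stable_var(unstable))  # True False
```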

2) Impulse response function

  • Based on the cross-correlations among several time series, the impulse response function shows how each variable affects the other variables
    • Impulse : a series that is 1 at t = 0 and 0 for t < 0 or t > 0
    • It traces, over time, the effect that an impulse-shaped series has on the other series
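
For a VAR(1) without intercept, the response h steps after a unit impulse is given by the h-th power of the coefficient matrix. The sketch below computes these (non-orthogonalized) responses with NumPy; the matrix `A` and the name `impulse_responses` are assumptions made for illustration.

```python
import numpy as np

def impulse_responses(A, horizon):
    """Return an array of shape (horizon + 1, k, k); entry [h, i, j] is the
    response of variable i at step h to a unit impulse in variable j at step 0."""
    k = A.shape[0]
    responses = np.empty((horizon + 1, k, k))
    responses[0] = np.eye(k)                    # the impulse itself: 1 at t = 0
    for h in range(1, horizon + 1):
        responses[h] = A @ responses[h - 1]     # propagate one more step: A^h
    return responses

# Hypothetical stable coefficient matrix
A = np.array([[0.5, 0.2],
              [0.3, 0.4]])
irf = impulse_responses(A, horizon=10)
# Response of variable 1 over time to a shock in variable 0
print(np.round(irf[:, 1, 0], 4))
```

Because the example matrix is stable, every response decays toward 0 as the horizon grows, which is the "converges to 0" pattern to look for in an impulse response plot.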

 

2️⃣ Code implementation

💗 Data loading and visualization

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels
import statsmodels.api as sm

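# Load the US macroeconomic dataset bundled with statsmodels
raw = sm.datasets.macrodata.load_pandas().data
# Build a quarterly DatetimeIndex from the year and quarter columns
dates_info = raw[['year', 'quarter']].astype(int).astype(str)
raw.index = pd.DatetimeIndex(sm.tsa.datetools.dates_from_str(dates_info['year'] + 'Q'+ dates_info['quarter']))
# Keep columns 2 to 4: realgdp, realcons, realinv
raw_use = raw.iloc[:, 2:5]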
  • Variables used: real GDP (realgdp), real consumption (realcons), and real investment (realinv); consumption and investment are components of GDP
raw_use.plot(subplots = True, figsize = (12, 5))
plt.tight_layout()
plt.show()

raw_use.diff(1).dropna().plot(subplots = True, figsize = (12, 5))
plt.tight_layout()
plt.show()

Left: original series. Right: first difference, diff(1).

 

💗 VAR model

  • realgdp : influenced by lag 1 (L1) of realgdp, realcons, and realinv, and by lag 2 (L2) of realcons
  • realcons : influenced by both L1 and L2 of realgdp, realcons, and realinv
  • realinv : influenced only by L1 of realgdp, realcons, and realinv
  • Ranked by how strongly they are influenced by the other variables: realcons, then realgdp, then realinv
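# First-difference the series to remove the trend before fitting
raw_use_return = raw_use.diff(1).dropna()
# Fit a VAR with 2 lags
fit = sm.tsa.VAR(raw_use_return).fit(maxlags = 2)
print(fit.summary())  # inside Jupyter, display(fit.summary()) also works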
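
Conceptually, fitting a VAR means regressing each variable on the lagged values of all variables. The sketch below illustrates that idea with ordinary least squares in NumPy on simulated data. It is an illustration under stated assumptions, not a description of how statsmodels works internally; `fit_var_ols` and `A_true` are hypothetical names.

```python
import numpy as np

def fit_var_ols(y, p):
    """Estimate a VAR(p) with intercept by least squares.
    y has shape (T, k). Returns (intercept of shape (k,), coefs of shape (p, k, k))."""
    T, k = y.shape
    # Each regressor row is [1, y_{t-1}, ..., y_{t-p}] for t = p .. T-1
    lagged = [y[p - lag : T - lag] for lag in range(1, p + 1)]
    X = np.hstack([np.ones((T - p, 1))] + lagged)
    Y = y[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)            # shape (1 + k*p, k)
    intercept = B[0]
    # coefs[lag - 1][i, j]: effect of variable j at that lag on variable i
    coefs = B[1:].reshape(p, k, k).transpose(0, 2, 1)
    return intercept, coefs

# Simulate a VAR(1) with known (hypothetical) coefficients, then recover them
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.2],
                   [0.3, 0.4]])
y = np.zeros((5000, 2))
for t in range(1, len(y)):
    y[t] = A_true @ y[t - 1] + rng.normal(size=2)

intercept, coefs = fit_var_ols(y, p=1)
print(np.round(coefs[0], 2))  # should be close to A_true
```

Row i of each coefficient matrix corresponds to one equation, so reading a fitted summary equation by equation amounts to reading these rows.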

 

💗 Model forecasting and visualization

forecast_num = 20

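# Point forecast (pass the last k_ar observations, here 2, as starting values)
# pre_var = fit.forecast(fit.model.endog[-fit.k_ar:], steps = forecast_num)

# Interval forecast (returns point forecast, lower bound, upper bound)
# pre_var_ci = fit.forecast_interval(fit.model.endog[-fit.k_ar:], steps = forecast_num)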

fit.plot_forecast(forecast_num)
plt.tight_layout()
plt.show()
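
A VAR point forecast is produced by applying the fitted equation repeatedly, feeding each prediction back in as the newest observation. That is why the starting values must contain as many past observations as the model has lags. The NumPy sketch below shows the recursion; `forecast_var` and the coefficients are hypothetical.

```python
import numpy as np

def forecast_var(intercept, coefs, last_obs, steps):
    """Iterate y_t = c + A_1 y_{t-1} + ... + A_p y_{t-p} forward.
    last_obs holds the most recent p observations, oldest first, shape (p, k)."""
    p = coefs.shape[0]
    history = [np.asarray(row, dtype=float) for row in last_obs[-p:]]
    forecasts = []
    for _ in range(steps):
        # coefs[0] multiplies the newest value, coefs[1] the one before it, ...
        nxt = intercept + sum(coefs[lag] @ history[-1 - lag] for lag in range(p))
        forecasts.append(nxt)
        history.append(nxt)
    return np.array(forecasts)

# Hypothetical VAR(1) without intercept, started from the observation (1, 0)
A = np.array([[0.5, 0.2],
              [0.3, 0.4]])
pred = forecast_var(np.zeros(2), A[None, :, :], last_obs=np.array([[1.0, 0.0]]), steps=3)
print(pred)
```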

 

  • Impulse response function
    • realgdp -> realgdp : turns negative in the short run, then converges to 0
    • realgdp -> realcons : when GDP increases, consumption falls at first and then rises
    • realinv -> realcons : when real investment increases, real consumption rises and then converges to 0
    • realcons -> realgdp : when consumption increases, real GDP rises considerably and then converges to 0
    • realinv -> realgdp : when investment increases, real GDP rises slightly and then converges to 0
 
fit.irf(forecast_num).plot()
plt.tight_layout()
plt.show()

 

💗 Residual diagnostics

fit.plot_acorr()
plt.tight_layout()
plt.show()
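
When reading a residual autocorrelation plot, the usual check is whether the autocorrelations at nonzero lags stay within roughly ±1.96/√n, as they would for white noise. The sketch below computes a sample autocorrelation by hand with NumPy, using simulated white noise as a stand-in for residuals; `sample_autocorr` is a hypothetical helper.

```python
import numpy as np

def sample_autocorr(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - lag], x[lag:]) / denom
                     for lag in range(max_lag + 1)])

rng = np.random.default_rng(0)
white_noise = rng.normal(size=500)            # stand-in for well-behaved residuals
acf = sample_autocorr(white_noise, max_lag=10)
bound = 1.96 / np.sqrt(len(white_noise))      # approximate 95% band for white noise
outside = int(np.sum(np.abs(acf[1:]) > bound))
print(outside, "of 10 lags fall outside the band")
```

Many lags outside the band would suggest the model has left structure in the residuals, for example because too few lags were included.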

 

 

 
