😎 공부하는 징징알파카는 처음이지? (First time meeting a studying Jingjing Alpaca?)

Detecting web traffic trends from time-series data with fbprophet

징징알파카 2022. 9. 23. 16:49
Written 2022-09-23

<This post was written for study purposes, with reference to HyeongWookKim's GitHub gist :-) >

https://gist.github.com/HyeongWookKim/c8f31f30b233896bb8947622d7efaf82

 

[Ch 7. Let's work with time-series data] from "파이썬으로 데이터 주무르기" (written by 민형기)


1️⃣ Libraries & data load

import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import pandas_datareader.data as web
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from fbprophet import Prophet  # the package was renamed to "prophet" in v1.0; there, use: from prophet import Prophet
from datetime import datetime
  • Date (index), number of visits (hit)
pinkwink_web = pd.read_csv("08. PinkWink Web Traffic.csv",
                           encoding = 'utf-8', thousands = ',',
                           names = ['date', 'hit'], index_col = 0)
pinkwink_web = pinkwink_web[pinkwink_web['hit'].notnull()]
pinkwink_web.head()
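
To see what the read_csv options above do, here is a minimal sketch on a made-up three-row sample. The sample values and the "yy. m. d." date strings are assumptions for illustration, not rows taken from the actual file.

```python
import io

import pandas as pd

# Hypothetical sample in the same two-column layout (date, hit).
# The values are made up; the last row has no hit value.
sample = io.StringIO(
    '16. 7. 1.,766\n'
    '16. 7. 2.,"1,377"\n'
    '16. 7. 3.,\n'
)

demo = pd.read_csv(sample, thousands=',',
                   names=['date', 'hit'], index_col=0)

# thousands=',' turns the quoted "1,377" into the number 1377;
# notnull() then drops the row whose hit value is missing
demo = demo[demo['hit'].notnull()]
print(demo)
```

Because names= supplies the column labels, the first column becomes the index (named date) and only hit remains as a data column.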

 

  • Traffic from July 1, 2016 through June 16, 2017
pinkwink_web['hit'].plot(figsize = (12, 4), grid = True);

 

🖤 Plotting a 'linear regression line' and 'polynomial regression curves'

time = np.arange(0, len(pinkwink_web)) # build the time axis: 0 .. len(pinkwink_web) - 1
traffic = pinkwink_web['hit'].values   # store the web-traffic values in the traffic variable

fx = np.linspace(0, time[-1], 1000)    # 1000 evenly spaced points from 0 to the last time index
  • Define an error() function that computes the RMSE (Root Mean Square Error)
def error(f, x, y):
    return np.sqrt(np.mean((f(x) - y) ** 2)) # RMSE (Root Mean Square Error)
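
As a quick sanity check of error(), the sketch below evaluates it on three made-up points where the answer can be worked out by hand. The toy data and the candidate model f(x) = x are assumptions for illustration only.

```python
import numpy as np

def error(f, x, y):
    # RMSE: square root of the mean squared difference between f(x) and y
    return np.sqrt(np.mean((f(x) - y) ** 2))

# Toy data (made up for illustration): y = x**2 at x = 0, 1, 2
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 4.0])

# Candidate model f(x) = x, expressed as a poly1d like the fits in this post
f = np.poly1d([1, 0])

# Residuals are 0, 0, -2, so the RMSE is sqrt((0 + 0 + 4) / 3)
rmse = error(f, x, y)
print(rmse)
```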

 

  • Compute the RMSE (Root Mean Square Error) of the polynomial fit for each degree and visualize the results
    • The RMSE values of the 1st-, 2nd-, and 3rd-degree polynomials are nearly the same
    • The RMSE of the 15th-degree polynomial is comparatively low
      • => A more complex model fits the given data more closely
      • => But if the model is too complex, overfitting occurs: the low error on the fitted data does not carry over to new data
# 1st-degree polynomial
f1p = np.polyfit(time, traffic, 1)
f1 = np.poly1d(f1p)

# 2nd-degree polynomial
f2p = np.polyfit(time, traffic, 2)
f2 = np.poly1d(f2p)

# 3rd-degree polynomial
f3p = np.polyfit(time, traffic, 3)
f3 = np.poly1d(f3p)

# 15th-degree polynomial
f15p = np.polyfit(time, traffic, 15)
f15 = np.poly1d(f15p)

# Print the RMSE of each polynomial fit
print(error(f1, time, traffic))
print(error(f2, time, traffic))
print(error(f3, time, traffic))
print(error(f15, time, traffic))

# Plot the data points together with the fitted curves
plt.figure(figsize = (10, 6))
plt.scatter(time, traffic, s = 10)

plt.plot(fx, f1(fx), lw = 4, label = 'f1')
plt.plot(fx, f2(fx), lw = 4, label = 'f2')
plt.plot(fx, f3(fx), lw = 4, label = 'f3')
plt.plot(fx, f15(fx), lw = 4, label = 'f15')

plt.grid(True, linestyle = '-', color = '0.75')

plt.legend(loc = 2)
plt.show()

Polynomial fits of degree 1, 2, 3, and 15
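
The overfitting point can be checked by scoring each fit on data it has not seen. The sketch below is an illustration under assumptions, not the post's method: it uses synthetic data rather than the blog's traffic, holds out the last 60 days, and swaps np.polyfit for numpy's Polynomial.fit, which rescales x internally so the 15th-degree fit stays numerically stable.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Synthetic stand-in for a year of daily traffic (not the blog's data):
# a linear trend, a weekly rhythm, and random noise
t = np.arange(365)
y = 1000 + 2.0 * t + 150 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 80, t.size)

# Fit on the first 305 days, hold out the last 60
t_fit, y_fit = t[:305], y[:305]
t_new, y_new = t[305:], y[305:]

def rmse(f, x, target):
    return np.sqrt(np.mean((f(x) - target) ** 2))

results = {}
for degree in (1, 2, 3, 15):
    f = Polynomial.fit(t_fit, y_fit, degree)
    # (RMSE on the fitted days, RMSE on the held-out days)
    results[degree] = (rmse(f, t_fit, y_fit), rmse(f, t_new, y_new))
    print(degree, results[degree])
```

The first number in each pair cannot get worse as the degree rises, since a higher-degree polynomial can always reproduce a lower-degree one. The second number is the one to watch: a high-degree curve tends to swing away quickly once it leaves the range it was fitted on.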

 

 

2️⃣ Forecasting with the Prophet module

  • From the pinkwink_web variable, keep only the date (index) and the number of visits (hit)
  • Use pandas' to_datetime() function to convert the 'ds' column to datetimes
  • When calling Prophet(), enable yearly (yearly_seasonality) and daily (daily_seasonality) seasonality
df = pd.DataFrame({'ds': pinkwink_web.index, 'y': pinkwink_web['hit']})
df.reset_index(inplace = True)

df['ds'] = pd.to_datetime(df['ds'], format = "%y. %m. %d.") # parse "yy. mm. dd." strings into datetimes (shown as yyyy-mm-dd)

# 'ds' now holds the dates, so drop the 'date' column
del df['date']

m = Prophet(yearly_seasonality = True, daily_seasonality = True)
m.fit(df)
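
Prophet expects a frame with a 'ds' column (dates) and a 'y' column (values). The sketch below shows the date parsing on three hypothetical rows. The rows, the names pinkwink_demo and df_demo, and the use of .values (which sidesteps the reset_index/del step used above) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for pinkwink_web:
# date strings as the index, visits in 'hit'
pinkwink_demo = pd.DataFrame(
    {'hit': [766.0, 377.0, 427.0]},
    index=pd.Index(['16. 7. 1.', '16. 7. 2.', '16. 7. 3.'], name='date'),
)

# Build the two columns Prophet expects: 'ds' (dates) and 'y' (values)
df_demo = pd.DataFrame({'ds': pinkwink_demo.index.values,
                        'y': pinkwink_demo['hit'].values})

# %y = two-digit year, %m = month, %d = day
df_demo['ds'] = pd.to_datetime(df_demo['ds'], format='%y. %m. %d.')
print(df_demo)
```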

 

  • Use make_future_dataframe() to get a data frame that extends the specified number of days into the future
    • Here, forecast 60 days of data
future = m.make_future_dataframe(periods = 60)
future.tail()
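
To make the shape of that frame concrete, the sketch below builds something equivalent by hand with pandas: the history dates followed by new daily dates. It is an approximation of the result, not Prophet's implementation, and the 10-day history, the 5-day horizon, and the name future_demo are made up for illustration.

```python
import pandas as pd

# Hypothetical 10-day history; Prophet keeps the training dates in a 'ds' column
history = pd.DataFrame({'ds': pd.date_range('2017-06-01', periods=10, freq='D')})

periods = 5  # the post uses 60

# Dates that continue one day after the last observed date
last_date = history['ds'].max()
future_dates = pd.date_range(last_date + pd.Timedelta(days=1),
                             periods=periods, freq='D')

# History followed by the new dates, mirroring the shape of the returned frame
future_demo = pd.concat([history, pd.DataFrame({'ds': future_dates})],
                        ignore_index=True)
print(future_demo.tail())
```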

 

  • Use predict() to store the predicted data in the forecast variable
forecast = m.predict(future)
forecast.head()

 

  • Pull out only the columns we want to inspect
    • 'ds', 'yhat', 'yhat_lower', 'yhat_upper' 
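
A minimal sketch of that column selection, run on a stand-in frame so it works without a fitted model. The frame forecast_demo and all of its numbers are made up; only the column names come from the list above.

```python
import pandas as pd

# Stand-in for the frame returned by m.predict(future); the numbers are made up
forecast_demo = pd.DataFrame({
    'ds': pd.date_range('2017-08-25', periods=3, freq='D'),
    'trend': [1100.0, 1102.0, 1104.0],
    'yhat': [900.0, 950.0, 1200.0],
    'yhat_lower': [700.0, 740.0, 990.0],
    'yhat_upper': [1100.0, 1160.0, 1410.0],
})

# Keep the date, the point forecast, and the bounds of its uncertainty interval
selected = forecast_demo[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
print(selected.tail())
```

With the real objects from this post, the same selection would be forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail().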

 

  • Check the forecast by visualizing it
    • The plot shows the data through the end of June 2017, followed by a forecast of about 2 months (60 days)
    • This is more effective than simply capturing the tendency with a polynomial
m.plot(forecast);

 

  • Use plot_components() to decompose the forecast into its trend and seasonality components
    • = visualizes the forecast components (Trend, Holidays where configured, Weekly, Yearly, Daily)
m.plot_components(forecast);

Trend: long-term movement in the data
Cycle: fluctuations that recur, but without a fixed period
Seasonality: fluctuations that repeat with a fixed period
Noise (irregularity): fluctuations caused by unknown or sudden factors
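
The idea behind these components can be sketched as a naive additive decomposition, y = trend + seasonality + noise. This is only an illustration on synthetic data: Prophet itself fits a piecewise trend and Fourier-series seasonality, whereas the straight-line trend and per-weekday averages below are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily series over 52 full weeks (made-up numbers)
t = np.arange(364)
weekly_true = np.array([30.0, 20.0, 10.0, 0.0, -10.0, -20.0, -30.0])  # sums to zero
trend_true = 100 + 0.5 * t
y = trend_true + weekly_true[t % 7] + rng.normal(0, 1.0, t.size)

# Trend: a straight-line fit (a simplification of Prophet's piecewise trend)
trend_fit = np.poly1d(np.polyfit(t, y, 1))

# Seasonality: average what is left after removing the trend, per day of the week
detrended = y - trend_fit(t)
weekly_est = np.array([detrended[t % 7 == d].mean() for d in range(7)])

# Noise: whatever trend and seasonality do not explain
noise = detrended - weekly_est[t % 7]

print(np.round(weekly_est, 1))
```

Each panel drawn by plot_components corresponds to one such piece, estimated by Prophet's own model rather than by these averages.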

 

 

So plot_components makes it possible to analyze a separate graph for each component..

Fascinating.... wow..

 

 
