😎 공부하는 징징알파카는 처음이지?


[Kaggle] Smart Home Dataset with weather Information

징징알파카 2022. 9. 16. 16:21

Written 2022-09-16

<This post was written as a study exercise, with reference to the code and notebook by koheimuramatsu on Kaggle :-) >

https://www.kaggle.com/code/koheimuramatsu/change-detection-forecasting-in-smart-home/notebook

 

Change Detection & Forecasting in Smart Home


😎 energy data from house appliances and weather information

  • Understand how each appliance's energy consumption relates to the time period
  • Detect abnormal appliance usage
  • Examine the relationship between weather information and solar power generation

 

 

 

😎 Code implementation

1️⃣ Load libraries

  • changefinder : a library for online change-point detection
  • HoloViews : designed to make data analysis and visualization seamless and simple
  • shap : a game-theoretic approach to explaining the output of any machine learning model
!pip install changefinder
!conda install -c pyviz holoviews bokeh -y
!pip install lightgbm
!conda install -c conda-forge shap -y
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
from matplotlib import pyplot as plt
import seaborn as sns
import os
import changefinder
from scipy import stats
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import grangercausalitytests
from statsmodels.tsa.stattools import adfuller
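try:
    from prophet import Prophet  # package name since Prophet 1.0
except ImportError:
    from fbprophet import Prophet  # legacy package name used by the original notebook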
from sklearn.metrics import mean_absolute_error
import shap
shap.initjs()
import lightgbm as lgb
from sklearn.preprocessing import LabelEncoder
from tabulate import tabulate
from IPython.display import HTML, display

 

 

2️⃣ Load data

df = pd.read_csv("HomeC.csv/HomeC.csv",low_memory=False)

print(f'HomeC.csv : {df.shape}')
df.head(3)

  • Weather information 
    • temperature
      • A physical quantity expressing hot and cold
    • humidity
      • The concentration of water vapour present in the air
    • visibility
      • The meteorological optical range, defined by the length of atmosphere that a beam of light travels through
    • apparentTemperature
      • The temperature equivalent perceived by humans, resulting from the combined effects of air temperature, relative humidity and wind speed
    • pressure
      • Falling air pressure indicates that bad weather is coming, while rising air pressure indicates good weather
    • windSpeed
      • A fundamental atmospheric quantity caused by air moving from high to low pressure, usually due to changes in temperature
    • cloudCover
      • The fraction of the sky obscured by clouds when observed from a particular location
    • windBearing
      • In meteorology, an azimuth of 000° is used only when no wind is blowing, while 360° means the wind is blowing from the north
      • Any direction measured relative to True North is a "True Bearing"
    • dewPoint
      • The atmospheric temperature (varying with pressure and humidity) below which water droplets begin to condense and dew can form
    • precipProbability
      • A measure of the probability that at least some minimum amount of precipitation will occur within a specified forecast period and location
    • precipIntensity
      • The amount of rain that falls over time
 
 

3️⃣ Preprocessing

df.columns
df.columns = [i.replace(' [kW]', '') for i in df.columns]
  • Sum the related columns and drop the ones that are no longer needed
df['Furnace'] = df[['Furnace 1','Furnace 2']].sum(axis=1)
df['Kitchen'] = df[['Kitchen 12','Kitchen 14','Kitchen 38']].sum(axis=1)
df.drop(['Furnace 1','Furnace 2','Kitchen 12','Kitchen 14','Kitchen 38','icon','summary'], axis=1, inplace=True)

  • Drop NaN values
df[df.isnull().any(axis=1)]

Only the last row contains NaN, so it is dropped.

df = df[0:-1]

  • The cloudCover column has invalid values mixed in (the literal string 'cloudCover')
df['cloudCover'].unique()

df[df['cloudCover']=='cloudCover'].shape

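# Series.replace(method=...) is deprecated in recent pandas, so coerce the
# stray 'cloudCover' strings to NaN and back-fill them with the next valid value
df['cloudCover'] = pd.to_numeric(df['cloudCover'], errors='coerce').bfill()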
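The back-fill step can be tried on a toy Series. This is a minimal sketch, assuming the stray entries are the literal string 'cloudCover' as the unique() call suggests; the sample values are made up. It uses pd.to_numeric with errors='coerce' followed by bfill(), which is one way to get a back-fill without the method argument of replace.

```python
import pandas as pd

# Made-up sample: two rows carry the column name instead of a number
raw = pd.Series(["0.75", "cloudCover", "cloudCover", "0.31", "0.0"])

# Non-numeric strings become NaN, then each NaN takes the next valid value
clean = pd.to_numeric(raw, errors="coerce").bfill()

print(clean.tolist())
```

Both invalid rows take the next valid reading, and the column ends up as floats without a separate astype call.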

 

 

4️⃣ datetime information

  • The data was collected at 1-minute intervals, but the raw time stamps increase in steps of 1 second
pd.to_datetime(df['time'], unit='s').head(3)

  • So a new date range with 1-minute steps is created instead
df['time'] = pd.DatetimeIndex(pd.date_range('2016-01-01 05:00', periods=len(df),  freq='min'))
df.head(3)
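The mismatch between the raw time stamps and the stated sampling interval is easy to see on three values. This is a small sketch; the three Unix times are assumed for illustration (they correspond to the 2016-01-01 05:00 start used above).

```python
import pandas as pd

# Assumed first three raw values of the 'time' column (Unix seconds)
raw_time = pd.Series([1451624400, 1451624401, 1451624402])

# Interpreted literally, consecutive rows are one second apart
as_seconds = pd.to_datetime(raw_time, unit="s")
print(as_seconds.diff().iloc[1])

# Rebuilding the axis with freq='min' spaces the rows one minute apart
rebuilt = pd.date_range("2016-01-01 05:00", periods=len(raw_time), freq="min")
print(rebuilt[1] - rebuilt[0])
```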

 

  • Extract year, month, day and other date-time fields from the time column so they can be used in the EDA and modeling steps
df['year'] = df['time'].apply(lambda x : x.year)
df['month'] = df['time'].apply(lambda x : x.month)
df['day'] = df['time'].apply(lambda x : x.day)
df['weekday'] = df['time'].apply(lambda x : x.day_name())
df['weekofyear'] = df['time'].apply(lambda x : x.weekofyear)
df['hour'] = df['time'].apply(lambda x : x.hour)
df['minute'] = df['time'].apply(lambda x : x.minute)
df.head(3)
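The same fields can also be pulled out with the vectorized .dt accessor instead of apply with a lambda, which is usually faster on a frame of this size. This is a sketch on three made-up dates, not a change to the notebook's code.

```python
import pandas as pd

# Three consecutive days starting at the dataset's assumed first timestamp
times = pd.Series(pd.date_range("2016-01-01 05:00", periods=3, freq="D"))

features = pd.DataFrame({
    "year": times.dt.year,
    "month": times.dt.month,
    "day": times.dt.day,
    "weekday": times.dt.day_name(),
    # ISO week number; the first days of January can belong to week 52 or 53
    "weekofyear": times.dt.isocalendar().week.astype(int),
    "hour": times.dt.hour,
    "minute": times.dt.minute,
})
print(features)
```

Note that 2016-01-01 to 2016-01-03 fall in ISO week 53 of the previous year, so weekofyear is not a monotone function of the date around New Year.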

 

 

 

5️⃣ Timing information

  • Night : 22:00 - 23:59 / 00:00 - 03:59
  • Morning : 04:00 - 11:59
  • Afternoon : 12:00 - 16:59
  • Evening : 17:00 - 21:59
def hours2timing(x):
    if x in [22,23,0,1,2,3]:
        timing = 'Night'
    elif x in range(4, 12):
        timing = 'Morning'
    elif x in range(12, 17):
        timing = 'Afternoon'
    elif x in range(17, 22):
        timing = 'Evening'
    else:
        timing = 'X'
    return timing
df['timing'] = df['hour'].apply(hours2timing)
df.head(3)
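A quick way to confirm that the four periods cover all 24 hours without gaps is to count how many hours land in each one. The function below is a hypothetical, comparison-based rewrite of the same mapping (hours_to_timing is not a name from the notebook).

```python
from collections import Counter

def hours_to_timing(hour):
    """Map an hour of day (0-23) to the four periods listed above."""
    if hour >= 22 or hour < 4:
        return "Night"
    if hour < 12:
        return "Morning"
    if hour < 17:
        return "Afternoon"
    return "Evening"

# Count how many of the 24 hours fall into each period
counts = Counter(hours_to_timing(h) for h in range(24))
print(counts)
```

Every hour receives a label, so the fallback 'X' branch in hours2timing should never trigger for valid hour values.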

 

 

6️⃣ Removing Duplicate Columns

fig = plt.subplots(figsize=(10, 8)) 
corr = df.corr(numeric_only=True)  # skip non-numeric columns such as 'time', 'weekday' and 'timing'
sns.heatmap(corr[corr>0.9], vmax=1, vmin=-1, center=0)
plt.show()

  • The 'use' / 'House overall' pair and the 'gen' / 'Solar' pair each have a correlation coefficient above roughly 0.95, so each pair is collapsed into a single new column
df['use_HO'] = df['use']
df['gen_Sol'] = df['gen']
df.drop(['use','House overall','gen','Solar'], axis=1, inplace=True)
df.head(3)
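Instead of reading the pairs off the heatmap, near-duplicate columns can be listed programmatically. This is a minimal sketch on synthetic data; high_corr_pairs and the 0.95 threshold are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(frame, threshold=0.95):
    """Return (col_a, col_b, corr) for column pairs correlated above threshold."""
    corr = frame.corr(numeric_only=True)
    cols = corr.columns
    return [
        (cols[i], cols[j], float(corr.iloc[i, j]))
        for i in range(len(cols))
        for j in range(i + 1, len(cols))  # upper triangle only, no self-pairs
        if corr.iloc[i, j] > threshold
    ]

# Synthetic frame: 'House overall' duplicates 'use', 'Fridge' is unrelated
rng = np.random.default_rng(0)
use = rng.random(200)
demo = pd.DataFrame({"use": use, "House overall": use, "Fridge": rng.random(200)})

pairs = high_corr_pairs(demo)
print(pairs)
```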

 

 

7️⃣ EDA

  • House Appliances
use = hv.Distribution(df['use_HO']).opts(title="Total Energy Consumption Distribution", color="red")
gen = hv.Distribution(df['gen_Sol']).opts(title="Total Energy Generation Distribution", color="blue")
(use + gen).opts(opts.Distribution(xlabel="Energy Consumption", ylabel="Density", xformatter='%.1fkw', width=400, height=300,tools=['hover'],show_grid=True))
dw = hv.Distribution(df[df['Dishwasher']<1.5]['Dishwasher'],label="Dishwasher").opts(color="red")
ho = hv.Distribution(df[df['Home office']<1.5]['Home office'],label="Home office").opts(color="blue")
fr = hv.Distribution(df[df['Fridge']<1.5]['Fridge'],label="Fridge Distribution").opts(color="orange")
wc = hv.Distribution(df[df['Wine cellar']<1.5]['Wine cellar'],label="Wine cellar").opts(color="green")
gd = hv.Distribution(df[df['Garage door']<1.5]['Garage door'],label="Garage door").opts(color="purple")
ba = hv.Distribution(df[df['Barn']<1.5]['Barn'],label="Barn").opts(color="grey")
we = hv.Distribution(df[df['Well']<1.5]['Well'],label="Well").opts(color="pink")
mcr = hv.Distribution(df[df['Microwave']<1.5]['Microwave'],label="Microwave").opts(color="yellow")
lr = hv.Distribution(df[df['Living room']<1.5]['Living room'],label="Living room").opts(color="brown")
fu = hv.Distribution(df[df['Furnace']<1.5]['Furnace'],label="Furnace").opts(color="skyblue")
ki = hv.Distribution(df[df['Kitchen']<1.5]['Kitchen'],label="Kitchen").opts(color="lightgreen")

(dw * ho * fr * wc * gd * ba * we * mcr * lr * fu * ki).opts(opts.Distribution(xlabel="Energy Consumption", ylabel="Density", xformatter='%.1fkw',title='Energy Consumption of Appliances Distribution', 
                    width=800, height=350,tools=['hover'],show_grid=True))

 

  • Weather Information
temp = hv.Distribution(df['temperature'],label="temperature").opts(color="red")
apTemp = hv.Distribution(df['apparentTemperature'],label="apparentTemperature").opts(color="orange")
temps = (temp * apTemp).opts(opts.Distribution(title='Temperature Distribution')).opts(legend_position='top',legend_cols=2)
hmd = hv.Distribution(df['humidity']).opts(color="yellow", title='Humidity Distribution')
vis = hv.Distribution(df['visibility']).opts(color="blue", title='Visibility Distribution')
prs = hv.Distribution(df['pressure']).opts(color="green", title='Pressure Distribution')
wnd = hv.Distribution(df['windSpeed']).opts(color="purple", title='WindSpeed Distribution')
cld = hv.Distribution(df['cloudCover']).opts(color="grey", title='CloudCover Distribution')
prc = hv.Distribution(df['precipIntensity']).opts(color="skyblue", title='PrecipIntensity Distribution')
dew = hv.Distribution(df['dewPoint']).opts(color="lightgreen", title='DewPoint Distribution')

(temps + hmd + vis + prs + wnd + cld + prc + dew).opts(opts.Distribution(xlabel="Values", ylabel="Density", width=400, height=300,tools=['hover'],show_grid=True)).cols(4)

 

 

8️⃣ Time Series Analysis

  • Energy consumption peaks from July to September
  • Energy generation has no sharp peak, but rises gradually from January to July and then slowly declines
def groupByMonth(col):
    return df[[col,'month']].groupby('month').agg({col:['mean']})[col]
def groupByWeekday(col):
    weekdayDf = df.groupby('weekday').agg({col:['mean']})
    weekdayDf.columns = [f"{i[0]}_{i[1]}" for i in weekdayDf.columns]
    weekdayDf['week_num'] = [['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'].index(i) for i in weekdayDf.index]
    weekdayDf.sort_values('week_num', inplace=True)
    weekdayDf.drop('week_num', axis=1, inplace=True)
    return weekdayDf
def groupByTiming(col):
    timingDf = df.groupby('timing').agg({col:['mean']})
    timingDf.columns = [f"{i[0]}_{i[1]}" for i in timingDf.columns]
    timingDf['timing_num'] = [['Morning','Afternoon','Evening','Night'].index(i) for i in timingDf.index]
    timingDf.sort_values('timing_num', inplace=True)
    timingDf.drop('timing_num', axis=1, inplace=True)
    return timingDf
df = df.set_index(df['time'])
use = hv.Curve(df['use_HO'].resample('D').mean()).opts(title="Total Energy Consumption Time-Series by Day", color="red", ylabel="Energy Consumption")
gen = hv.Curve(df['gen_Sol'].resample('D').mean()).opts(title="Total Energy Generation Time-Series by Day", color="blue", ylabel="Energy Generation")
(use + gen).opts(opts.Curve(xlabel="Day", yformatter='%.1fkw', width=400, height=300,tools=['hover'],show_grid=True,fontsize={'title':11}))
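The weekday and timing helpers above impose a calendar order by building a temporary index column, sorting on it and dropping it again. A more compact alternative is to reindex the grouped result with the desired order. This is a sketch on a made-up frame; WEEKDAYS and demo are illustrative names.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up readings; only three weekdays are present
demo = pd.DataFrame({
    "weekday": ["Sunday", "Monday", "Monday", "Wednesday"],
    "use_HO": [1.0, 2.0, 4.0, 5.0],
})

# groupby sorts labels alphabetically; reindex restores calendar order,
# and dropna removes weekdays that never occur in the data
by_weekday = demo.groupby("weekday")["use_HO"].mean().reindex(WEEKDAYS).dropna()
print(by_weekday)
```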

 

 

use = hv.Curve(groupByMonth('use_HO')).opts(title="Total Energy Consumption Time-Series by Month", color="red", ylabel="Energy Consumption")
gen = hv.Curve(groupByMonth('gen_Sol')).opts(title="Total Energy Generation Time-Series by Month", color="blue", ylabel="Energy Generation")
(use + gen).opts(opts.Curve(xlabel="Month", yformatter='%.1fkw', width=400, height=300,tools=['hover'],show_grid=True,fontsize={'title':10})).opts(shared_axes=False)

 

  • Intuitively, there is no weekly trend in energy consumption or generation
  • The plots do show a slight pattern, but the change in values is small enough to ignore
use = hv.Curve(groupByWeekday('use_HO')).opts(title="Total Energy Consumption Time-Series by Weekday", color="red", ylabel="Energy Consumption")
gen = hv.Curve(groupByWeekday('gen_Sol')).opts(title="Total Energy Generation Time-Series by Weekday", color="blue", ylabel="Energy Generation")
(use + gen).opts(opts.Curve(xlabel="Weekday", yformatter='%.2fkw', width=400, height=300,tools=['hover'],show_grid=True, xrotation=20,fontsize={'title':10})).opts(shared_axes=False)

Consumption and generation

  • Energy consumption is low during the day and high at night
  • Energy generation is high during the day and low at night
    • During the day the residents are away from home, which favors energy generation
    • At night the residents return home, so consumption increases
use = hv.Curve(groupByTiming('use_HO')).opts(title="Total Energy Consumption Time-Series by Timing", color="red", ylabel="Energy Consumption")
gen = hv.Curve(groupByTiming('gen_Sol')).opts(title="Total Energy Generation Time-Series by Timing", color="blue", ylabel="Energy Generation")
(use + gen).opts(opts.Curve(xlabel="Timing", yformatter='%.1fkw', width=400, height=300,tools=['hover'],show_grid=True,fontsize={'title':10})).opts(shared_axes=False)

  • Home office, Fridge, Wine cellar, Living room and Furnace clearly show time-series trends
    • These appliances have to keep the indoor temperature constant, or adjust it to a comfortable level, depending on the season
dw = hv.Curve(df['Dishwasher'].resample('D').mean(),label="Dishwasher Time-Series by Day").opts(color="red")
ho = hv.Curve(df['Home office'].resample('D').mean(),label="Home office Time-Series by Day").opts(color="blue")
fr = hv.Curve(df['Fridge'].resample('D').mean(),label="Fridge Time-Series by Day").opts(color="orange")
wc = hv.Curve(df['Wine cellar'].resample('D').mean(),label="Wine cellar Time-Series by Day").opts(color="green")
gd = hv.Curve(df['Garage door'].resample('D').mean(),label="Garage door Time-Series by Day").opts(color="purple")
ba = hv.Curve(df['Barn'].resample('D').mean(),label="Barn Time-Series by Day").opts(color="grey")
we = hv.Curve(df['Well'].resample('D').mean(),label="Well Time-Series by Day").opts(color="pink")
mcr = hv.Curve(df['Microwave'].resample('D').mean(),label="Microwave Time-Series by Day").opts(color="yellow")
lr = hv.Curve(df['Living room'].resample('D').mean(),label="Living room Time-Series by Day").opts(color="brown")
fu = hv.Curve(df['Furnace'].resample('D').mean(),label="Furnace Time-Series by Day").opts(color="skyblue")
ki = hv.Curve(df['Kitchen'].resample('D').mean(),label="Kitchen Time-Series by Day").opts(color="lightgreen")

(dw + ho + fr + wc + gd + ba + we + mcr + lr + fu + ki).opts(opts.Curve(xlabel="Day", ylabel="Energy Consumption", yformatter='%.2fkw' , \
                                                                               width=400, height=300,tools=['hover'],show_grid=True)).cols(6)

Appliances

  • Monthly energy consumption of each appliance
dw = hv.Curve(groupByMonth('Dishwasher'),label="Dishwasher Time-Series by Month").opts(color="red")
ho = hv.Curve(groupByMonth('Home office'),label="Home office Time-Series by Month").opts(color="blue")
fr = hv.Curve(groupByMonth('Fridge'),label="Fridge Time-Series by Month").opts(color="orange")
wc = hv.Curve(groupByMonth('Wine cellar'),label="Wine cellar Time-Series by Month").opts(color="green")
gd = hv.Curve(groupByMonth('Garage door'),label="Garage door Time-Series by Month").opts(color="purple")
ba = hv.Curve(groupByMonth('Barn'),label="Barn Time-Series by Month").opts(color="grey")
we = hv.Curve(groupByMonth('Well'),label="Well Time-Series by Month").opts(color="pink")
mcr = hv.Curve(groupByMonth('Microwave'),label="Microwave Time-Series by Month").opts(color="yellow")
lr = hv.Curve(groupByMonth('Living room'),label="Living room Time-Series by Month").opts(color="brown")
fu = hv.Curve(groupByMonth('Furnace'),label="Furnace Time-Series by Month").opts(color="skyblue")
ki = hv.Curve(groupByMonth('Kitchen'),label="Kitchen Time-Series by Month").opts(color="lightgreen")

(dw + ho + fr + wc + gd + ba + we + mcr + lr + fu + ki).opts(opts.Curve(xlabel="Month", ylabel="Energy Consumption", yformatter='%.2fkw', \
                                                                               width=400, height=300,tools=['hover'],show_grid=True)).opts(shared_axes=False).cols(6)

  • There is no weekly trend in the appliances' energy consumption.
dw = hv.Curve(groupByWeekday('Dishwasher'),label="Dishwasher Time-Series by Weekday").opts(color="red")
ho = hv.Curve(groupByWeekday('Home office'),label="Home office Time-Series by Weekday").opts(color="blue")
fr = hv.Curve(groupByWeekday('Fridge'),label="Fridge Time-Series by Weekday").opts(color="orange")
wc = hv.Curve(groupByWeekday('Wine cellar'),label="Wine cellar Time-Series by Weekday").opts(color="green")
gd = hv.Curve(groupByWeekday('Garage door'),label="Garage door Time-Series by Weekday").opts(color="purple")
ba = hv.Curve(groupByWeekday('Barn'),label="Barn Time-Series by Weekday").opts(color="grey")
we = hv.Curve(groupByWeekday('Well'),label="Well Time-Series by Weekday").opts(color="pink")
mcr = hv.Curve(groupByWeekday('Microwave'),label="Microwave Time-Series by Weekday").opts(color="yellow")
lr = hv.Curve(groupByWeekday('Living room'),label="Living room Time-Series by Weekday").opts(color="brown")
fu = hv.Curve(groupByWeekday('Furnace'),label="Furnace Time-Series by Weekday").opts(color="skyblue")
ki = hv.Curve(groupByWeekday('Kitchen'),label="Kitchen Time-Series by Weekday").opts(color="lightgreen")

(dw + ho + fr + wc + gd + ba + we + mcr + lr + fu + ki).opts(opts.Curve(xlabel="Weekday", ylabel="Energy Consumption", yformatter='%.2fkw', \
                                                                               width=400, height=300,tools=['hover'],show_grid=True, xrotation=20)).opts(shared_axes=False).cols(6)

  • Overall, energy consumption rises slightly from the evening through the night
    • This is when residents come home from work and start their activities at home
dw = hv.Curve(groupByTiming('Dishwasher'),label="Dishwasher Time-Series by Timing").opts(color="red")
ho = hv.Curve(groupByTiming('Home office'),label="Home office Time-Series by Timing").opts(color="blue")
fr = hv.Curve(groupByTiming('Fridge'),label="Fridge Time-Series by Timing").opts(color="orange")
wc = hv.Curve(groupByTiming('Wine cellar'),label="Wine cellar Time-Series by Timing").opts(color="green")
gd = hv.Curve(groupByTiming('Garage door'),label="Garage door Time-Series by Timing").opts(color="purple")
ba = hv.Curve(groupByTiming('Barn'),label="Barn Time-Series by Timing").opts(color="grey")
we = hv.Curve(groupByTiming('Well'),label="Well Time-Series by Timing").opts(color="pink")
mcr = hv.Curve(groupByTiming('Microwave'),label="Microwave Time-Series by Timing").opts(color="yellow")
lr = hv.Curve(groupByTiming('Living room'),label="Living room Time-Series by Timing").opts(color="brown")
fu = hv.Curve(groupByTiming('Furnace'),label="Furnace Time-Series by Timing").opts(color="skyblue")
ki = hv.Curve(groupByTiming('Kitchen'),label="Kitchen Time-Series by Timing").opts(color="lightgreen")

(dw + ho + fr + wc + gd + ba + we + mcr + lr + fu + ki).opts(opts.Curve(xlabel="Timing", ylabel="Energy Consumption", yformatter='%.2fkw', \
                                                                               width=400, height=300,tools=['hover'],show_grid=True)).opts(shared_axes=False).cols(6)

 

 

9️⃣ Weather Time-Series

temp = hv.Curve(df['temperature'].resample('D').mean(),label="temperature").opts(color="red")
apTemp = hv.Curve(df['apparentTemperature'].resample('D').mean(),label="apparentTemperature").opts(color="orange")
temps = (temp * apTemp).opts(opts.Curve(title='Temperature Time-Series by Day')).opts(legend_position='top',legend_cols=2)
hmd = hv.Curve(df['humidity'].resample('D').mean()).opts(color="yellow", title='Humidity Time-Series by Day')
vis = hv.Curve(df['visibility'].resample('D').mean()).opts(color="blue", title='Visibility Time-Series by Day')
prs = hv.Curve(df['pressure'].resample('D').mean()).opts(color="green", title='Pressure Time-Series by Day')
wnd = hv.Curve(df['windSpeed'].resample('D').mean()).opts(color="purple", title='WindSpeed Time-Series by Day')
cld = hv.Curve(df['cloudCover'].resample('D').mean()).opts(color="grey", title='CloudCover Time-Series by Day')
prc = hv.Curve(df['precipIntensity'].resample('D').mean()).opts(color="skyblue", title='PrecipIntensity Time-Series by Day')
dew = hv.Curve(df['dewPoint'].resample('D').mean()).opts(color="lightgreen", title='DewPoint Time-Series by Day')

(temps + hmd + vis + prs + wnd + cld + prc + dew).opts(opts.Curve(xlabel="Day", ylabel="Values", width=400, height=300,tools=['hover'],show_grid=True)).cols(4)

 

 

 

🔟 Correlation Analysis

  • The appliances show no notable correlation with one another
fig,ax = plt.subplots(figsize=(10, 8)) 
corr = df[['Dishwasher','Home office','Fridge','Wine cellar','Garage door','Barn','Well','Microwave','Living room','Furnace','Kitchen']].corr()
sns.heatmap(corr, annot=True, vmin=-1.0, vmax=1.0, center=0)
ax.set_title('Correlation of Appliances',size=20)
plt.show()

 

  • Correlations among the weather variables
    • temperature is correlated with apparentTemperature and dewPoint
    • humidity is correlated with visibility, windSpeed, cloudCover and dewPoint
    • visibility is correlated with humidity, windSpeed, cloudCover and precipIntensity
    • cloudCover is correlated with humidity, visibility and precipIntensity
    • precipIntensity is correlated with visibility and cloudCover
    • dewPoint is correlated with temperature, apparentTemperature and humidity
fig,ax = plt.subplots(figsize=(10, 8)) 
corr = df[['temperature','apparentTemperature','humidity','visibility','pressure','windSpeed','cloudCover','precipIntensity','dewPoint']].corr()
sns.heatmap(corr, annot=True, vmin=-1.0, vmax=1.0, center=0)
ax.set_title('Correlation of Weather Information',size=20)
plt.show()

  • Some appliances are affected by the weather
    • Fridge is correlated with temperature, apparentTemperature and dewPoint
    • Wine cellar is correlated with temperature, apparentTemperature and dewPoint
    • Furnace is correlated with temperature, apparentTemperature, windSpeed and dewPoint
fig,ax = plt.subplots(figsize=(20, 12)) 
corr = df[['use_HO','gen_Sol','Dishwasher','Home office','Fridge','Wine cellar','Garage door','Barn','Well','Microwave','Living room','Furnace','Kitchen',\
           'temperature','apparentTemperature','humidity','visibility','pressure','windSpeed','cloudCover','precipIntensity','dewPoint']].corr()
sns.heatmap(corr, annot=True, vmin=-1.0, vmax=1.0, center=0)
ax.set_title('Correlation of Appliances & Weather Information',size=20)
plt.show()

 

 

 

1️⃣1️⃣ Model

🟣 Change detection : detect excessive energy consumption early to prevent higher bills
🟣 Future consumption forecasting : forecast future energy consumption and generation from weather information, so that energy supply can be optimized

 

▶ 🟣 Case1. Detect Changes in Energy Consumption

  • It should be possible to capture change points in the usage trend from the energy consumption data
    • By capturing changes in the consumption trend, energy supply can be increased in months when consumption is likely to rise
    • and reduced in months when consumption is likely to fall

 

  • change point
    • A change point is a point where the trend of a time series changes over time
    • An outlier is a momentary abnormal state (a sharp drop or spike)
    • At a change point, the abnormal state persists instead of returning to the original state

 

  • ChangeFinder
    • An algorithm used to detect change points
    • It computes the change score from the log-likelihood under the SDAR (Sequentially Discounting AR) algorithm
    • SDAR introduces a discounting parameter into the AR algorithm to reduce the influence of past data, so it can learn robustly even from non-stationary time series
      • Training STEP1
        • Train a time-series model at each data point using the SDAR algorithm
        • Based on the trained model, compute the likelihood of the data point at the next time step
        • Compute the log loss and use it as the outlier score
        • $Score(x_t) = -\log P_{t-1}(x_t \mid x_1, x_2, \dots, x_{t-1})$
      • Smoothing Step
        • Smooth the outlier scores within a smoothing window $W$
        • Smoothing attenuates the scores caused by isolated outliers and shows whether an abnormal state has persisted for a long time
        • $Score\_smoothed(x_t) = \frac{1}{W} \sum_{i=t-W+1}^{t} Score(x_i)$
      • Training STEP2
        • Train a model with the SDAR algorithm on the smoothed scores
        • Based on the trained model, compute the likelihood of the data point at the next time step
        • Compute the log loss and use it as the change score
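The three steps can be sketched in a few lines of plain Python. This is a simplified illustration, not the changefinder package's implementation: it replaces SDAR with a discounted online Gaussian (in effect AR order 0), and the names DiscountedGaussian and change_scores are hypothetical.

```python
import math
from collections import deque

class DiscountedGaussian:
    """Online Gaussian with exponential forgetting (order-0 stand-in for SDAR)."""
    def __init__(self, r):
        self.r = r        # discounting parameter, 0 < r < 1
        self.mu = 0.0
        self.var = 1.0

    def score_then_update(self, x):
        # Log loss of x under the model learned from earlier points only
        score = 0.5 * math.log(2 * math.pi * self.var) \
            + (x - self.mu) ** 2 / (2 * self.var)
        # Discounted update of mean and variance
        self.mu = (1 - self.r) * self.mu + self.r * x
        self.var = (1 - self.r) * self.var + self.r * (x - self.mu) ** 2
        self.var = max(self.var, 1e-6)  # keep the variance strictly positive
        return score

def change_scores(series, r=0.05, window=5):
    stage1, stage2 = DiscountedGaussian(r), DiscountedGaussian(r)
    recent = deque(maxlen=window)
    scores = []
    for x in series:
        recent.append(stage1.score_then_update(x))          # STEP1: outlier score
        smoothed = sum(recent) / len(recent)                # smoothing over W points
        scores.append(stage2.score_then_update(smoothed))   # STEP2: change score
    return scores

# Made-up series whose level shifts from 0 to 5 at index 50
series = [0.0] * 50 + [5.0] * 50
scores = change_scores(series)
print(scores.index(max(scores)))
```

On this toy series the largest change score falls at the index where the level shifts.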

 

 

  • Hyperparameter Tuning
    • Discounting parameter $r$ ($0 < r < 1$) : the smaller this value, the greater the influence of past data points and the larger the variation of the change score
    • Order parameter for AR, order : how many past data points are included in the model
    • Smoothing window, smooth : the larger this parameter, the easier it is to capture genuine changes rather than outliers, but if it is too large the change itself becomes hard to capture

 

def chng_detection(col, _r=0.01, _order=1, _smooth=10):
    cf = changefinder.ChangeFinder(r=_r, order=_order, smooth=_smooth)
    ch_df = pd.DataFrame()
    ch_df[col] = df[col].resample('D').mean()
    # calculate the change score
    ch_df['change_score'] = [cf.update(i) for i in ch_df[col]]
    ch_score_q1 = stats.scoreatpercentile(ch_df['change_score'], 25) 
    ch_score_q3 = stats.scoreatpercentile(ch_df['change_score'], 75) 
    thr_upper = ch_score_q3 + (ch_score_q3 - ch_score_q1) * 3
    
    anom_score = hv.Curve(ch_df['change_score'])
    anom_score_th = hv.HLine(thr_upper).opts(color='red', line_dash="dotdash")
    
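    # .iloc makes the positional lookup explicit (plain [i] on a datetime index is deprecated in recent pandas)
    anom_points = [[ch_df.index[i], ch_df[col].iloc[i]] for i, score in enumerate(ch_df["change_score"]) if score > thr_upper]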
    org = hv.Curve(ch_df[col],label=col).opts(yformatter='%.1fkw')
    detected = hv.Points(anom_points, label=f"{col} detected").opts(color='red', legend_position='bottom', size=5)

    return ((anom_score * anom_score_th).opts(title=f"{col} Change Score & Threshold") + \
            (org * detected).opts(title=f"{col} Detected Points")).opts(opts.Curve(width=800, height=300, show_grid=True, tools=['hover'])).cols(1)
  • Build a model that can detect change points in energy consumption
    • It captures the change points in July (sharp rise) and September (sharp fall), where the trend of the data shifts
chng_detection('use_HO', _r=0.001, _order=1, _smooth=3)
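The threshold inside chng_detection is the third quartile plus three times the interquartile range of the change scores. The arithmetic can be checked on a handful of made-up scores. The sketch uses np.percentile, which with its default linear interpolation gives the same quartiles as stats.scoreatpercentile for this input.

```python
import numpy as np

# Made-up change scores with one obvious spike
scores = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

q1, q3 = np.percentile(scores, [25, 75])
threshold = q3 + 3 * (q3 - q1)          # Q3 + 3 * IQR, as in chng_detection

# Positions whose score exceeds the threshold are reported as change points
flagged = np.flatnonzero(scores > threshold).tolist()
print(threshold, flagged)
```

A multiplier of 3 is stricter than the common 1.5 used for box plots, so only pronounced jumps in the score are flagged.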

 

 

▶ 🟣 Case2. Predict Future Energy Consumption

  • Each appliance's energy consumption can be predicted from weather information
    • Predicting consumption makes it possible to estimate the required energy supply from the weather forecast, which enables energy optimization
    • Three models are used: VAR, Prophet and LightGBM

 

✔ 1) MODEL VAR

  • The vector autoregression (VAR) model is a multivariate extension of the autoregressive (AR) model
  • It can forecast from several variables at once, which is expected to improve accuracy compared with forecasting from a single variable
def grangerTestPlot(weather_info, applicances, _maxlag):
    grangerTest_df = pd.DataFrame()
    for weather in weather_info:
        for appliance in applicances:
            test_result = grangercausalitytests(df[[appliance, weather]], maxlag=_maxlag, verbose=False)
            p_values = [round(test_result[i][0]['ssr_chi2test'][1],4) for i in range(1, _maxlag+1)]
            min_p_value = np.min(p_values)
            grangerTest_df.loc[appliance, weather] = min_p_value

    fig,ax = plt.subplots(figsize=(10, 8)) 
    sns.heatmap(grangerTest_df, vmax=1, vmin=0, center=1, annot=True)
    ax.set_title('Granger Causality Test Result',size=20)
    plt.xlabel("Weather Information",size=15)
    plt.ylabel("Energy Consumption",size=15)
    plt.show()
  • Total household consumption and one typical appliance are selected, and the Granger causality test is run against the weather variables
    • For both selected consumption variables, the p-values for pressure and visibility exceed 5%
    • So no Granger causality was observed for these pairs
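The idea behind the test can be illustrated with NumPy alone: fit the appliance series on its own lag, fit it again with a lagged weather variable added, and check whether the residual sum of squares drops by more than chance would explain. This is a lag-1 sketch on synthetic data, not what grangercausalitytests computes internally; lag1_ssr and the simulated series are assumptions for illustration.

```python
import numpy as np

def lag1_ssr(y, predictors):
    """Residual sum of squares of an OLS fit of y on an intercept plus predictors."""
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Synthetic data in which yesterday's weather drives today's consumption
rng = np.random.default_rng(0)
n = 500
weather = rng.normal(size=n)
appliance = np.zeros(n)
for t in range(1, n):
    appliance[t] = 0.5 * appliance[t - 1] + 0.8 * weather[t - 1] + 0.1 * rng.normal()

y = appliance[1:]
ssr_restricted = lag1_ssr(y, [appliance[:-1]])            # own lag only
ssr_full = lag1_ssr(y, [appliance[:-1], weather[:-1]])    # own lag + weather lag

# F statistic for one added regressor; 3 parameters in the full model
f_stat = (ssr_restricted - ssr_full) / (ssr_full / (len(y) - 3))
print(f_stat)
```

A large F statistic corresponds to a small p-value, meaning the lagged weather variable adds predictive information beyond the appliance's own history.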
grangerTestPlot(
    weather_info=['temperature', 'humidity', 'visibility', 'pressure', 'windSpeed', 'cloudCover', 'windBearing', 'precipIntensity','dewPoint'], \
    applicances=['use_HO','Wine cellar'], \
    _maxlag=12)

 

  • Stationarity matters because many time-series modeling methods require the data to be stationary
    • The mean of the data is constant
    • The variance of the data is constant
    • The autocovariance depends only on the lag, not on time
    • The Augmented Dickey-Fuller (ADF) test is used to check stationarity
      • In the ADF test, the p-values of all variables are below 5%
for i in ['temperature', 'humidity','windSpeed', 'cloudCover', 'windBearing', 'precipIntensity','dewPoint','use_HO','Wine cellar']:
    print(f"p-value {i} : {adfuller(df[i].resample('H').mean(), autolag='AIC', regression = 'ct')[1]}")

 

 

  • Forecasts of total energy consumption and Wine cellar consumption, with the weather variables added as explanatory variables
    • Both are predicted reasonably well over a very short horizon
var_df = df.resample('H').mean(numeric_only=True)  # skip the string columns ('weekday', 'timing')
def var_train(cols=['temperature', 'humidity', 'visibility',  'windSpeed', 'windBearing', 'dewPoint','Furnace', 'use_HO'], max_order=10, train_ratio=0.9,test_ratio=0.1):
    #make dataframe for training
    tr,te = [int(len(var_df) * i) for i in [train_ratio, test_ratio]]
    train, test = var_df[0:tr], var_df[tr:]
    #model training
    var_func = VAR(train[cols], freq='H')
    var_func.select_order(max_order)
    model = var_func.fit(maxlags=max_order, ic='aic', trend='ct')
    model_result = model.summary()
    #make predict dataframe
    varForecast_df = pd.DataFrame(model.forecast(model.endog, steps=len(test)),columns=cols)
    varForecast_df.index = test.index
    
    return varForecast_df, model_result
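The chronological split used in var_train and the MAE used for evaluation can be checked in isolation. This is a sketch on a made-up hourly series, with a made-up forecast that is off by a constant 0.5 so the expected error is obvious.

```python
import pandas as pd

# Made-up hourly series (values cycle through 0..23)
hourly = pd.Series(
    [float(i % 24) for i in range(200)],
    index=pd.date_range("2016-01-01", periods=200, freq="60min"),
)

# Chronological split: the last 1% of rows is held out, with no shuffling
split = int(len(hourly) * 0.99)
train, test = hourly.iloc[:split], hourly.iloc[split:]

# Made-up forecast that overshoots every test point by 0.5
forecast = test + 0.5

# Mean absolute error: average of |actual - predicted|
mae = (test - forecast).abs().mean()
print(len(train), len(test), mae)
```

Because time series are ordered, the test block must come strictly after the training block, which is why the split is by position rather than by random sampling.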

 

  • Model evaluation
varForecast_df, model_result = var_train(cols=['temperature', 'humidity','windSpeed', 'cloudCover', 'windBearing', 'precipIntensity','dewPoint','use_HO','Wine cellar'], \
                                         max_order=48, train_ratio=0.99,test_ratio=0.01)
#evaluation with MAE
var_use_mae = mean_absolute_error(var_df['use_HO'][-len(varForecast_df):], varForecast_df['use_HO'])
((hv.Curve(var_df['use_HO'], label='use_HO').opts(color='blue')\
  * hv.Curve(varForecast_df['use_HO'], label='use_HO predicted').opts(color='red', title='VAR Result - Total Energy Consumption')).opts(legend_position='bottom') + \
 (hv.Curve(var_df['use_HO'][-int(len(var_df)*0.05):], label='use_HO').opts(color='blue') \
  * hv.Curve(varForecast_df['use_HO'], label='use_HO predicted').opts(color='red', title='VAR Result Enlarged - Total Energy Consumption')).opts(legend_position='bottom'))\
    .opts(opts.Curve(xlabel="Time", yformatter='%.2fkw', width=800, height=300, show_grid=True, tools=['hover'])).opts(shared_axes=False).cols(1)

Total Energy

 

var_wine_mae = mean_absolute_error(var_df['Wine cellar'][-len(varForecast_df):], varForecast_df['Wine cellar'])
((hv.Curve(var_df['Wine cellar'], label='Wine cellar').opts(color='blue')\
  * hv.Curve(varForecast_df['Wine cellar'], label='Wine cellar predicted').opts(color='red', title='VAR Result - Wine Cellar Energy Consumption')).opts(legend_position='bottom') + \
 (hv.Curve(var_df['Wine cellar'][-int(len(var_df)*0.05):], label='Wine Cellar').opts(color='blue') \
  * hv.Curve(varForecast_df['Wine cellar'], label='Wine cellar predicted').opts(color='red', title='VAR Result Enlarged - Wine Cellar Energy Consumption')).opts(legend_position='bottom'))\
    .opts(opts.Curve(xlabel="Time", yformatter='%.2fkw', width=800, height=300, show_grid=True, tools=['hover'])).opts(shared_axes=False).cols(1)

Wine cellar

 

print(model_result)

 

 

✔ 2) MODEL Prophet

prf_df = df.resample('H').mean(numeric_only=True)  # skip the string columns ('weekday', 'timing')
def prophet_train(train_ratio=0.99, test_ratio=0.01, trg='use_HO', regressors=['temperature', 'humidity']):
    #make dataframe for training
    tr,te = [int(len(prf_df) * i) for i in [train_ratio, test_ratio]]
    train, test = prf_df[0:tr], prf_df[tr:]
    prophet_df = pd.DataFrame()
    prophet_df["ds"] = train.index
    prophet_df['y'] = train[trg].values
    #add regressors
    for i in regressors:
        prophet_df[i] = train[i].values

    #train model by Prophet
    m = Prophet()
    #include additional regressors into the model
    for i in regressors:
        m.add_regressor(i)
    m.fit(prophet_df)

    #make dataframe for prediction
    future = pd.DataFrame()
    future['ds'] = test.index
    #add regressors
    for i in regressors:
        future[i] = test[i].values

    #predict the future
    prophet_result = m.predict(future)
    prfForecast_df = pd.DataFrame()
    prfForecast_df[trg] = prophet_result.yhat
    prfForecast_df.index = prophet_result.ds
    
    return prfForecast_df

#NOTE: the total-energy call is not shown in this post; by analogy with the wine cellar call below,
#prf_use_mae (used in the final MAE table) would be obtained along these lines
prfForecast_df = prophet_train()
#evaluation with MAE
prf_use_mae = mean_absolute_error(prf_df['use_HO'][-len(prfForecast_df):], prfForecast_df['use_HO'])
  • Total energy consumption looks roughly predictable, but the wine cellar's energy consumption is captured only slightly
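
For context on what `m.add_regressor(i)` contributes in the function above: Prophet fits a decomposable time-series model, and under the library's default additive mode each extra regressor enters as a linear term. A rough sketch of that structure follows; the coefficient symbols $\beta_j$ are notation introduced here, not something taken from the post or the notebook.

```latex
y(t) = g(t) + s(t) + h(t) + \sum_{j} \beta_j \, x_j(t) + \varepsilon_t
```

Here $g(t)$ is the trend, $s(t)$ the seasonal component, $h(t)$ the holiday effect (none is configured in this post), $x_j(t)$ the weather regressors such as `temperature` and `humidity`, and $\varepsilon_t$ the residual. This is also why `future` must carry the regressor columns: the forecast cannot be evaluated without $x_j(t)$ for the test period.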

Total Energy

 

 

prfForecast_df = prophet_train(trg='Wine cellar',regressors=['temperature', 'humidity','windSpeed', 'cloudCover', 'windBearing', 'precipIntensity','dewPoint'])
#evaluation with MAE
prf_wine_mae = mean_absolute_error(prf_df['Wine cellar'][-len(prfForecast_df):], prfForecast_df['Wine cellar'])
((hv.Curve(prf_df['Wine cellar'], label='Wine cellar').opts(color='blue')\
  * hv.Curve(prfForecast_df['Wine cellar'], label='Wine cellar predicted').opts(color='red', title='Prophet Result - Wine cellar Energy Consumption')).opts(legend_position='bottom') + \
 (hv.Curve(prf_df['Wine cellar'][-int(len(prf_df)*0.05):], label='Wine cellar').opts(color='blue') \
  * hv.Curve(prfForecast_df['Wine cellar'], label='Wine cellar predicted').opts(color='red', title='Prophet Result Enlarged - Wine cellar Energy Consumption')).opts(legend_position='bottom'))\
    .opts(opts.Curve(xlabel="Time", yformatter='%.2fkw', width=800, height=300, show_grid=True, tools=['hover'])).opts(shared_axes=False).cols(1)

Wine cellar

 

 

✔ 3) MODEL LightGBM Regressor

  • Building a time-series regression model lets us forecast future energy consumption and understand how energy consumption relates to weather information
_lgbm_df = df.resample('H').mean()
_lgbm_df['weekday'] =   LabelEncoder().fit_transform(pd.Series(_lgbm_df.index).apply(lambda x : x.day_name())).astype(np.int8)
_lgbm_df['timing'] = LabelEncoder().fit_transform(_lgbm_df['hour'].apply(hours2timing)).astype(np.int8)
def lgbm_train(cols=['temperature','dewPoint','use_HO'],trg='use_HO',train_ratio=0.8,valid_ratio=0.1,test_ratio=0.1):
    #make dataframe for training
    lgbm_df = _lgbm_df[cols]
    tr,vd,te = [int(len(lgbm_df) * i) for i in [train_ratio, valid_ratio, test_ratio]]
    X_train, Y_train = lgbm_df[0:tr].drop([trg], axis=1), lgbm_df[0:tr][trg]
    X_valid, Y_valid = lgbm_df[tr:tr+vd].drop([trg], axis=1), lgbm_df[tr:tr+vd][trg]
    #the +2 absorbs rows lost to int() truncation above, so the test block reaches the last row
    X_test = lgbm_df[tr+vd:tr+vd+te+2].drop([trg], axis=1)
    lgb_train = lgb.Dataset(X_train, Y_train)
    lgb_valid = lgb.Dataset(X_valid, Y_valid, reference=lgb_train)
    #model training
    params = {
        'task' : 'train',
        'boosting':'gbdt',
        'objective' : 'regression',
        'metric' : {'mse'},
        'num_leaves':200,
        'drop_rate':0.05,
        'learning_rate':0.1,
        'seed':0,
        'feature_fraction':1.0,
        'bagging_fraction':1.0,
        'bagging_freq':0,
        'min_child_samples':5
    }
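    #early stopping is passed as a callback; the early_stopping_rounds argument of lgb.train was removed in LightGBM 4.0
    gbm = lgb.train(params, lgb_train, num_boost_round=100, valid_sets=[lgb_train, lgb_valid], callbacks=[lgb.early_stopping(stopping_rounds=100)])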
    #make predict dataframe
    pre_df = pd.DataFrame()
    pre_df[trg] = gbm.predict(X_test, num_iteration=gbm.best_iteration)
    pre_df.index = lgbm_df.index[tr+vd:tr+vd+te+2]
    return pre_df, gbm, X_train
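
The split inside `lgbm_train` is chronological (no shuffling), and each block size comes from `int(len * ratio)`, which truncates. A minimal sketch of that arithmetic, using a hypothetical helper `chrono_split` and a plain list as a stand-in for the hourly index (neither is part of the original notebook):

```python
def chrono_split(rows, train_ratio=0.8, valid_ratio=0.1):
    """Split time-ordered rows into train/valid/test without shuffling.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    if train_ratio <= 0 or valid_ratio < 0 or train_ratio + valid_ratio >= 1:
        raise ValueError("ratios must leave room for a non-empty test block")
    n = len(rows)
    tr = int(n * train_ratio)  # int() truncates toward zero
    vd = int(n * valid_ratio)
    # Everything after the validation block is test, so no row is dropped
    # even when truncation makes the three block sizes sum to less than n.
    return rows[:tr], rows[tr:tr + vd], rows[tr + vd:]


hours = list(range(1003))  # stand-in for 1003 hourly timestamps
train, valid, test = chrono_split(hours, train_ratio=0.9, valid_ratio=0.09)
print(len(train), len(valid), len(test))  # 902 90 11
```

With these ratios `int(1003 * 0.01)` is 10, so a test slice of exactly `te` rows would stop one row short of the end. The `+2` in `lgbm_train` covers that gap; slicing to the end achieves the same thing without a magic number.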

 

 

🔵 Predicting total energy consumption

lgbmForecast_df, model, x_train = lgbm_train(\
                cols=['temperature', 'humidity', 'visibility', 'apparentTemperature',\
                       'pressure', 'windSpeed', 'cloudCover', 'windBearing', 'precipIntensity',\
                       'dewPoint', 'precipProbability','year', 'month','day', 'weekday', 'weekofyear', \
                        'hour', 'timing','use_HO'],\
                trg='use_HO',train_ratio=0.9,valid_ratio=0.09,test_ratio=0.01)
#calculate SHAP value for model interpretation
explainer = shap.TreeExplainer(model=model,feature_perturbation='tree_path_dependent')
shap_values = explainer.shap_values(X=x_train)
  • The total energy consumption forecast is more accurate than the Prophet model's result
    • Time features that were not included in the Prophet model, such as 'weekday' and 'timing', may be what helps
    • According to the SHAP analysis, positive changes in features such as 'weekofyear', 'timing', 'hour', 'dewPoint', and 'temperature', together with negative changes in 'weekday' and 'cloudCover', appear to drive increases in total energy consumption
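
To make the "positive change" and "negative change" reading of SHAP values concrete, here is a brute-force Shapley computation on a toy two-feature model. Everything in it is an assumption for illustration: the helper `shapley_values`, the linear `toy_model`, and its coefficients are invented, and absent features are filled from a fixed baseline, which is simpler than what `shap.TreeExplainer` does with `tree_path_dependent`.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition take their baseline value (a simplifying
    assumption made for this sketch).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Invented coefficients: warmer raises consumption, cloudier lowers it
    temperature, cloud_cover = z
    return 1.0 + 0.05 * temperature - 0.4 * cloud_cover


x = [30.0, 0.2]         # the hour being explained
baseline = [20.0, 0.5]  # reference conditions
phi = shapley_values(toy_model, x, baseline)
print(phi)  # both contributions are positive, roughly 0.5 and 0.12
```

Temperature above the baseline and cloud cover below it both push the toy prediction up, which is the same pattern the bullets describe. The contributions also sum to `toy_model(x) - toy_model(baseline)`, the additivity property that `force_plot` visualizes.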

 

 

  • Model evaluation (MAE)
#evaluation with MAE
lgbm_use_mae = mean_absolute_error(_lgbm_df['use_HO'][-len(lgbmForecast_df):], lgbmForecast_df['use_HO'])
((hv.Curve(_lgbm_df['use_HO'], label='use_HO').opts(color='blue')\
  * hv.Curve(lgbmForecast_df['use_HO'], label='use_HO predicted').opts(color='red', title='LightGBM Result - Total Energy Consumption')).opts(legend_position='bottom') + \
 (hv.Curve(_lgbm_df['use_HO'][-int(len(_lgbm_df)*0.05):], label='use_HO').opts(color='blue') \
  * hv.Curve(lgbmForecast_df['use_HO'], label='use_HO predicted').opts(color='red', title='LightGBM Result Enlarged - Total Energy Consumption')).opts(legend_position='bottom'))\
    .opts(opts.Curve(xlabel="Time", yformatter='%.2fkw', width=800, height=300, show_grid=True, tools=['hover'])).opts(shared_axes=False).cols(1)

  • force_plot
shap.force_plot(base_value=explainer.expected_value, shap_values=shap_values, features=x_train, feature_names=x_train.columns)

  • summary_plot
shap.summary_plot(shap_values=shap_values, features=x_train, feature_names=x_train.columns, plot_type="violin")

 

 

🔵 Predicting wine cellar energy consumption

lgbmForecast_df, model, x_train = lgbm_train(\
                cols=['temperature', 'humidity', 'visibility', 'apparentTemperature',\
                       'pressure', 'windSpeed', 'cloudCover', 'windBearing', 'precipIntensity',\
                       'dewPoint', 'precipProbability','year', 'month','day', 'weekday', 'weekofyear', \
                        'hour', 'timing','Wine cellar'],\
                trg='Wine cellar',train_ratio=0.9,valid_ratio=0.09,test_ratio=0.01)
#calculate SHAP value for model interpretation
explainer = shap.TreeExplainer(model=model,feature_perturbation='tree_path_dependent')
shap_values = explainer.shap_values(X=x_train)
  • The wine cellar energy consumption forecast is more accurate than the Prophet model's result
    • Time features that were not included in the Prophet model, such as 'weekofyear' and 'hour', may be what helps
    • According to the SHAP analysis, positive changes in features such as 'weekofyear', 'hour', 'dewPoint', and 'windSpeed', together with negative changes in 'humidity' and 'cloudCover', appear to drive increases in the wine cellar's energy consumption

 

 

  • Model evaluation (MAE)
#evaluation with MAE
lgbm_wine_mae = mean_absolute_error(_lgbm_df['Wine cellar'][-len(lgbmForecast_df):], lgbmForecast_df['Wine cellar'])
((hv.Curve(_lgbm_df['Wine cellar'], label='Wine cellar').opts(color='blue')\
  * hv.Curve(lgbmForecast_df['Wine cellar'], label='Wine cellar predicted').opts(color='red', title='LightGBM Result - Wine cellar Energy Consumption')).opts(legend_position='bottom') + \
 (hv.Curve(_lgbm_df['Wine cellar'][-int(len(_lgbm_df)*0.05):], label='Wine cellar').opts(color='blue') \
  * hv.Curve(lgbmForecast_df['Wine cellar'], label='Wine cellar predicted').opts(color='red', title='LightGBM Result Enlarged - Wine cellar Energy Consumption')).opts(legend_position='bottom'))\
    .opts(opts.Curve(xlabel="Time", yformatter='%.2fkw', width=800, height=300, show_grid=True, tools=['hover'])).opts(shared_axes=False).cols(1)

shap.force_plot(base_value=explainer.expected_value, shap_values=shap_values, features=x_train, feature_names=x_train.columns)

shap.summary_plot(shap_values=shap_values, features=x_train, feature_names=x_train.columns, plot_type="violin")

 

 

1️⃣2️⃣ Evaluation - MAE

display(HTML('<h3>Evaluation - MAE</h3>'+tabulate([['Total Energy Consumption',var_use_mae,prf_use_mae,lgbm_use_mae],['Wine cellar Energy Consumption',var_wine_mae,prf_wine_mae,lgbm_wine_mae]],\
                      ["Target", "VAR", "Prophet","LightGBM Regressor"], tablefmt="html")))
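
Every entry in this table is a mean absolute error, the average of |actual - forecast| over the forecast horizon, computed against the last `len(forecast)` rows of the actual series. A dependency-free sketch of that calculation, assuming a hypothetical helper `mae_on_tail` and made-up numbers (the post itself uses `sklearn.metrics.mean_absolute_error`):

```python
def mae_on_tail(actual, forecast):
    """MAE between forecast and the last len(forecast) values of actual.

    Hypothetical helper mirroring the `series[-len(forecast):]` alignment
    used in the post.
    """
    if not forecast or len(forecast) > len(actual):
        raise ValueError("forecast must be non-empty and no longer than actual")
    tail = actual[-len(forecast):]
    return sum(abs(a - f) for a, f in zip(tail, forecast)) / len(forecast)


actual = [1.0, 1.25, 0.75, 1.5, 0.5]  # made-up hourly readings in kW
forecast = [1.0, 1.0]                 # made-up forecast for the last two hours
mae = mae_on_tail(actual, forecast)
print(mae)  # 0.5
```

The tail alignment only gives a fair score when the forecast really covers the final rows of the series, which is why the test block in each model is taken from the end of the data.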

 

1️⃣3️⃣ Conclusions

  • Each appliance's energy consumption shows a consistent pattern
  • The ChangeFinder model built here catches trend changes in energy consumption early
  • The models built with Prophet and LightGBM appear able to forecast future energy consumption
  • Weather and time information turned out to be very useful for forecasting

 
