[Kaggle] Time-series data analysis using LSTM
μ§μ§μνμΉ΄ · 2022. 9. 20. 11:30 (written 220920)
<This post was written while studying, with reference to AMIR REZAEIAN's code and notebook on Kaggle :-) >
https://www.kaggle.com/code/amirrezaeian/time-series-data-analysis-using-lstm-tutorial/notebook
http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption#
Project overview
- Individual household electric power consumption data set
- How to build the simplest possible LSTM (long short-term memory) recurrent neural network for this data
- Measurements of one household's electric power consumption at a one-minute sampling rate over almost 4 years
- Contains 2075259 measurements gathered between December 2006 and November 2010 (47 months) in a house located in Sceaux (7 km from Paris, France)
Data set information
- Notes
  - (global_active_power*1000/60 - sub_metering_1 - sub_metering_2 - sub_metering_3) is the active energy consumed every minute (in watt-hour) in the household by electrical equipment not measured by sub-meterings 1, 2 and 3
  - The data set contains some missing values in the measurements (nearly 1.25% of the rows)
  - All calendar timestamps are present in the data set, but for some timestamps the measurement values are missing
  - A missing value is represented by the absence of a value between two consecutive semicolon attribute separators
- Attribute information
  - date : date in dd/mm/yyyy format
  - time : time in hh:mm:ss format
  - global_active_power : household global minute-averaged active power (in kilowatt)
  - global_reactive_power : household global minute-averaged reactive power (in kilowatt)
  - voltage : minute-averaged voltage (in volt)
  - global_intensity : household global minute-averaged current intensity (in ampere)
  - sub_metering_1 : energy sub-metering No. 1 (in watt-hour of active energy). Corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (the hot plates are not electric but gas powered)
  - sub_metering_2 : energy sub-metering No. 2 (in watt-hour of active energy). Corresponds to the laundry room, containing a washing machine, a tumble dryer, a refrigerator and a light
  - sub_metering_3 : energy sub-metering No. 3 (in watt-hour of active energy). Corresponds to an electric water heater and an air conditioner
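The unmetered-energy formula in the notes can be checked on a couple of rows. This is only a sketch: the readings below are made up for illustration and are not taken from the data set, and the column names follow the capitalised headers used later in the code.

```python
import pandas as pd

# Hypothetical one-minute readings (made-up values, not from the data set).
sample = pd.DataFrame({
    "Global_active_power": [4.2, 1.2],   # kilowatt, minute-averaged
    "Sub_metering_1": [0.0, 1.0],        # watt-hour
    "Sub_metering_2": [1.0, 2.0],        # watt-hour
    "Sub_metering_3": [17.0, 5.0],       # watt-hour
})

# kW -> W (x1000); a power held for one minute -> watt-hours (/60).
total_wh = sample["Global_active_power"] * 1000 / 60

# Energy used by equipment that none of the three sub-meters covers.
unmetered_wh = (total_wh
                - sample["Sub_metering_1"]
                - sample["Sub_metering_2"]
                - sample["Sub_metering_3"])
print(unmetered_wh.round(2).tolist())
```

The division by 60 is what turns a minute-averaged power into an energy, so the result is in the same unit as the sub-metering columns.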
Code implementation
1️⃣ Package load
import sys
import itertools
import numpy as np # linear algebra
from scipy.stats import randint
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv), data manipulation as in SQL
import matplotlib.pyplot as plt # used to plot the graphs
import seaborn as sns # statistical plots built on top of matplotlib
from sklearn.model_selection import train_test_split # to split the data into two parts
from sklearn.model_selection import KFold # used for cross validation
from sklearn.preprocessing import StandardScaler # for normalization
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline # pipeline making
from sklearn.model_selection import cross_val_score
from sklearn.feature_selection import SelectFromModel
from sklearn import metrics # to check the error and accuracy of the model
from sklearn.metrics import mean_squared_error, r2_score
## for deep learning (everything imported through tensorflow.keras, since
## keras.utils.np_utils and keras.layers.convolutional are gone in recent Keras releases):
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout, LSTM, Conv1D, MaxPooling1D
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping
2️⃣ Data load
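df = pd.read_csv('household_power_consumption/household_power_consumption.txt', sep=';',
                 low_memory=False, na_values=['nan','?'])
# merge 'Date' and 'Time' into one datetime index 'dt' (dates are dd/mm/yyyy);
# an explicit format avoids the deprecated parse_dates dict / infer_datetime_format options
df['dt'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%d/%m/%Y %H:%M:%S')
df = df.drop(columns=['Date', 'Time']).set_index('dt')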
- 1) The data contains 'nan' and '?' as strings -> convert both to numpy nan so they are handled the same way
- 2) Merge the two columns 'Date' and 'Time' into 'dt'
- 3) Make the time the index so that the data becomes a time series
- Handling nan values
droping_list_all=[]
for j in range(0,7):
    if not df.iloc[:, j].notnull().all():
        droping_list_all.append(j)
        #print(df.iloc[:,j].unique())
droping_list_all
- Fill with the mean value
for j in range(0,7):
    df.iloc[:,j]=df.iloc[:,j].fillna(df.iloc[:,j].mean())
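The two loops above can also be written without explicit column indices. The following is a minimal sketch on a made-up frame (`toy` is a hypothetical stand-in for `df`), not the notebook's own code:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df (made-up values).
toy = pd.DataFrame({
    "Global_active_power": [1.0, np.nan, 3.0],
    "Voltage": [240.0, 241.0, 242.0],
    "Global_intensity": [np.nan, 4.0, 8.0],
})

# Names of the columns that contain at least one missing value.
cols_with_nan = toy.columns[toy.isna().any()].tolist()

# Replace every NaN with the mean of its own column.
filled = toy.fillna(toy.mean())

print(cols_with_nan)
print(filled)
```

Mean filling keeps every timestamp in place, which matters here because the later resampling and lag steps assume a regular time index.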
3️⃣ Data visualization
- Resample over a day and plot the mean and sum of Global_active_power
- The mean and sum of the resampled data set appear to have a similar structure
df.Global_active_power.resample('D').sum().plot(title='Global_active_power resampled over day for sum')
plt.tight_layout()
plt.show()
df.Global_active_power.resample('D').mean().plot(title='Global_active_power resampled over day for mean', color='red')
plt.tight_layout()
plt.show()
# the following also works
# t = df.Global_active_power.resample('D').agg(['sum', 'mean'])
# t.plot(subplots = True, title='Global_active_power resampled over day')
# plt.show()
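Why the daily sum and mean look alike can be seen on a small synthetic series: when every day has the same number of samples, the two differ only by a constant factor. A sketch with made-up values (one day at 1.0 kW, the next at 2.0 kW):

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level series covering two full days.
idx = pd.date_range("2007-01-01", periods=2 * 24 * 60, freq="min")
power = pd.Series(np.repeat([1.0, 2.0], 24 * 60), index=idx,
                  name="Global_active_power")

# One row per day with both aggregates side by side.
daily = power.resample("D").agg(["sum", "mean"])
print(daily)

# With 1440 samples per day, sum = 1440 * mean for every day.
print((daily["sum"] / daily["mean"]).tolist())
```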
- Mean and std of 'Global_intensity' resampled over a day
r = df.Global_intensity.resample('D').agg(['mean', 'std'])
r.plot(subplots = True, title='Global_intensity resampled over day')
plt.show()
- Mean and std of 'Global_reactive_power' resampled over a day
r2 = df.Global_reactive_power.resample('D').agg(['mean', 'std'])
r2.plot(subplots = True, title='Global_reactive_power resampled over day', color='purple')
plt.show()
- Mean of 'Global_active_power' resampled over a month
df['Global_active_power'].resample('M').mean().plot(kind='bar', label = "mean", color = "pink")
plt.xticks(rotation=60)
plt.ylabel('Global_active_power')
plt.title('Global_active_power per month (averaged over month)')
plt.legend()
plt.show()
- Mean of 'Global_active_power' resampled per quarter
df['Global_active_power'].resample('Q').mean().plot(kind='bar', label = "mean", color = "royalblue")
plt.xticks(rotation=60)
plt.ylabel('Global_active_power')
plt.title('Global_active_power per quarter (averaged over quarter)')
plt.legend()
plt.show()
- Mean of 'Voltage' resampled over a month
df['Voltage'].resample('M').mean().plot(kind='bar', label = "mean", color = "olive")
plt.xticks(rotation=60)
plt.ylabel('Voltage')
plt.title('Voltage per month (averaged over month)')
plt.legend()
plt.show()
- Mean of 'Sub_metering_1' resampled over a month
df['Sub_metering_1'].resample('M').mean().plot(kind='bar', label = "mean", color = "brown")
plt.xticks(rotation=60)
plt.ylabel('Sub_metering_1')
plt.title('Sub_metering_1 per month (averaged over month)')
plt.legend()
plt.show()
🔼 The monthly mean of 'Voltage' is nearly constant compared with the other features
- Mean of several features resampled over a day
cols = [0, 1, 2, 3, 5, 6]
i = 1
groups=cols
values = df.resample('D').mean().values
# plot each column
plt.figure(figsize=(15, 10))
for group in groups:
    plt.subplot(len(cols), 1, i)
    plt.plot(values[:, group])
    plt.title(df.columns[group], y=0.75, loc='right')
    i += 1
plt.show()
- Resample over a week and plot the mean
df.Global_reactive_power.resample('W').mean().plot(color='y', legend=True)
df.Global_active_power.resample('W').mean().plot(color='r', legend=True)
df.Sub_metering_1.resample('W').mean().plot(color='b', legend=True)
df.Global_intensity.resample('W').mean().plot(color='g', legend=True)
plt.show()
- Histogram of the mean of different features resampled over a month
df.Global_active_power.resample('M').mean().plot(kind='hist', color='r', legend=True )
df.Global_reactive_power.resample('M').mean().plot(kind='hist',color='b', legend=True)
df.Global_intensity.resample('M').mean().plot(kind='hist', color='g', legend=True)
df.Sub_metering_1.resample('M').mean().plot(kind='hist', color='y', legend=True)
plt.show()
- Correlation between Global_intensity and Global_active_power
- pct_change: difference [percentage]
- A method that returns the difference between rows of an object as a percentage change (as a fraction, e.g. 0.2 for 20%)
- (current row - previous row) ÷ previous row, comparable to (selling price - buying price) ÷ buying price
- To get the return over N days, call pct_change(periods=N)
- df.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
- periods : the interval to compare across (default 1, i.e. compare with the immediately preceding value)
- fill_method : how missing values are filled before the calculation {ffill : fill with the previous value / bfill : fill with the next value}
- limit : the maximum number of consecutive missing values to fill
- freq : the increment to use from the time-series API
- Applying pct_change
data_returns = df.pct_change()
# jointplot : draws a scatter plot and histograms (distributions) together; only numeric data can be shown
sns.jointplot(x='Global_intensity', y='Global_active_power', data=data_returns)
plt.show()
- Correlation between Voltage and Global_active_power
sns.jointplot(x='Voltage', y='Global_active_power', data=data_returns)
plt.show()
From the two plots above, 'Global_intensity' and 'Global_active_power' are clearly correlated
'Voltage' and 'Global_active_power' are less correlated
What is a scatter plot?
: a way of displaying data that shows the relationship between two variables
: each measurement is a point (x, y) made of the two variables
- If y increases as x increases, the two variables are positively correlated
- If y decreases as x increases, the two variables are negatively correlated
- If there is no particular relationship between them, the two variables are uncorrelated
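To make pct_change and the idea of positive correlation concrete, here is a small sketch on made-up readings (the numbers are hypothetical, and the power column is deliberately chosen as 0.2 times the intensity column):

```python
import pandas as pd

# Hypothetical readings (made-up values).
toy = pd.DataFrame({
    "Global_intensity": [10.0, 12.0, 9.0, 18.0],
    "Global_active_power": [2.0, 2.4, 1.8, 3.6],
})

# Row-to-row relative change: (current - previous) / previous.
# The first row has no previous value, so it becomes NaN.
returns = toy.pct_change()
print(returns)

# Pearson correlation between the two columns of changes (NaN rows are skipped).
corr = returns["Global_intensity"].corr(returns["Global_active_power"])
print(round(corr, 6))
```

Because the two columns move in exact proportion here, their changes line up perfectly; real data would of course give a value below 1.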
4️⃣ Correlations among features
- Correlation between the columns
plt.matshow(df.corr(method='spearman'),vmax=1,vmin=-1,cmap='PRGn')
plt.title('without resampling', size=15)
plt.colorbar()
plt.show()

- Correlation of the feature means after resampling over months and years
plt.matshow(df.resample('M').mean().corr(method='spearman'),vmax=1,vmin=-1,cmap='PRGn')
plt.title('resampled over month', size=15)
plt.colorbar()
plt.margins(0.02)
plt.matshow(df.resample('A').mean().corr(method='spearman'),vmax=1,vmin=-1,cmap='PRGn')
plt.title('resampled over year', size=15)
plt.colorbar()
plt.show()
As seen above, resampling can change the correlations among the features
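The effect can be reproduced on synthetic data. In this sketch (all values made up), two hypothetical series `a` and `b` share a slow day-to-day trend but have opposite hourly wiggles, so they are negatively correlated hour by hour and perfectly correlated once averaged per day:

```python
import numpy as np
import pandas as pd

# Four days of hypothetical hourly data.
idx = pd.date_range("2007-01-01", periods=4 * 24, freq="h")
trend = np.repeat(np.arange(4.0), 24)    # shared slow trend: 0, 1, 2, 3 per day
wiggle = np.tile([1.0, -1.0], 48)        # fast component, opposite sign in a and b

toy = pd.DataFrame({"a": trend + 5 * wiggle, "b": trend - 5 * wiggle}, index=idx)

# Spearman correlation on the raw hourly values vs. on the daily means.
raw_corr = toy.corr(method="spearman").loc["a", "b"]
daily_corr = toy.resample("D").mean().corr(method="spearman").loc["a", "b"]
print(raw_corr, daily_corr)
```

Averaging removes the fast component, so what remains after resampling is only the correlation of the slow structure.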
5️⃣ Machine-Learning: LSTM
- Apply a recurrent neural network (LSTM), which is well suited to time-series and sequential problems
: this approach works best when a large amount of data is available
- Frame the supervised learning problem as predicting Global_active_power at the current time (t), given the Global_active_power measurement and the other features at the previous time step
- To reduce computation time and get quick results for testing the model, resample the data by hour (the original data is given in minutes)
- This reduces the size of the data from 2075259 to 34589 rows while keeping the overall structure of the data
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    dff = pd.DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(dff.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(dff.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg
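What the function produces for n_in=1, n_out=1 can be seen on a tiny array. The sketch below rebuilds that single-lag case directly with shift so that it runs on its own; the data and the names `frame`, `lagged` and `current` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical series with two variables and four time steps.
data = np.array([[1, 10], [2, 20], [3, 30], [4, 40]], dtype=float)
frame = pd.DataFrame(data, columns=["var1", "var2"])

lagged = frame.shift(1).add_suffix("(t-1)")   # inputs: values one step earlier
current = frame.add_suffix("(t)")             # targets: values at the current step

# Side by side; the first row has no predecessor and is dropped.
supervised = pd.concat([lagged, current], axis=1).dropna()
print(supervised)
```

Each remaining row pairs the previous step's values with the current step's values, which is the input/output layout a supervised model needs.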
## resampling of data over hour
df_resample = df.resample('h').mean()
df_resample.shape
- Scale all features to the range [0, 1]
- Train on the resampled data (resampled over hour)
values = df_resample.values
## full data without resampling
#values = df.values
# integer encode direction
# ensure all data is float
#values = values.astype('float32')
# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
# frame as supervised learning
reframed = series_to_supervised(scaled, 1, 1)
# drop columns we don't want to predict
reframed.drop(reframed.columns[[8,9,10,11,12,13]], axis=1, inplace=True)
print(reframed.head())
This leaves 7 input variables (the input series) and 1 output variable, 'Global_active_power' at the current time (in hours, because of the resampling)
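Both steps, the [0, 1] scaling and the column drop that leaves 7 inputs plus 1 output, can be checked in isolation. A sketch under stated assumptions: the numbers are made up, and the 14-column frame is a zero-filled placeholder that only mimics the column names produced for 7 features and one lag.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical two-feature data (made-up values).
values = np.array([[0.0, 230.0], [2.0, 240.0], [4.0, 250.0]])
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(values)
print(scaled)   # every column now spans 0..1

# With 7 features and one lag, the supervised frame has 14 columns.
n_features = 7
names = ([f"var{j + 1}(t-1)" for j in range(n_features)]
         + [f"var{j + 1}(t)" for j in range(n_features)])
frame = pd.DataFrame(np.zeros((2, 2 * n_features)), columns=names)

# Dropping positions 8..13 removes var2(t)..var7(t) and keeps var1(t) as the target.
kept = frame.drop(frame.columns[[8, 9, 10, 11, 12, 13]], axis=1)
print(list(kept.columns))
```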
Splitting the rest of data to train and validation sets
- Split the prepared data set into train and test sets
- To speed up training, train the model on only the first year of data and then evaluate it on the remaining 3 years
# split into train and test sets
values = reframed.values
n_train_time = 365*24
train = values[:n_train_time, :]
test = values[n_train_time:, :]
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
- Reshape the input into the 3D format the LSTM expects, i.e. [samples, time steps, features]
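The split and the reshape are easiest to follow through the array shapes. A minimal sketch on a made-up table, where `n_train = 6` is a hypothetical stand-in for the 365*24 rows used above:

```python
import numpy as np

# Hypothetical supervised table: 10 rows, 7 input columns + 1 target column.
values = np.arange(80, dtype=float).reshape(10, 8)
n_train = 6   # stands in for 365*24 in the post

# Chronological split: earlier rows for training, later rows for testing.
train, test = values[:n_train], values[n_train:]
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]

# [samples, features] -> [samples, timesteps=1, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
```

The split is by position rather than random, so the model never sees rows that come after the ones it is tested on.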
Model architecture
- 1) LSTM with 100 neurons in the first visible layer
- 2) Dropout of 20%
- 3) 1 neuron in the output layer for predicting Global_active_power
- 4) The input shape is 1 time step with 7 features
- 5) Use the mean squared error (MSE) loss function, as in the compile call, and the efficient Adam version of stochastic gradient descent
model = Sequential()
model.add(LSTM(100, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
- 6) The model is fit for 20 training epochs with a batch size of 70
# fit network
history = model.fit(train_X, train_y, epochs=20, batch_size=70, validation_data=(test_X, test_y), verbose=2, shuffle=False)
- 7) Visualizing the loss
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper right')
plt.show()
- 8) Prediction + RMSE
# make a prediction
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], 7))
# invert scaling for forecast
inv_yhat = np.concatenate((yhat, test_X[:, -6:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = np.concatenate((test_y, test_X[:, -6:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]
# calculate RMSE
rmse = np.sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)
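The concatenate step above exists because the scaler was fit on 7 columns, so it can only invert arrays with 7 columns; the prediction is padded with 6 other columns and only column 0 is kept afterwards. The sketch below uses random stand-in data and made-up "predictions" (a constant offset, not a model output) and compares that route with inverting column 0 directly from the scaler's fitted `data_min_` and `data_range_`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
values = rng.uniform(0.0, 10.0, size=(50, 7))    # hypothetical 7-feature data
scaler = MinMaxScaler(feature_range=(0, 1)).fit(values)
scaled = scaler.transform(values)

y_scaled = scaled[:, 0]          # scaled target column
yhat_scaled = y_scaled + 0.05    # made-up "predictions", off by 0.05 in scaled units

# Route 1 (as in the post): pad to 7 columns, invert, keep column 0.
padded = np.concatenate((yhat_scaled.reshape(-1, 1), scaled[:, 1:]), axis=1)
inv_yhat = scaler.inverse_transform(padded)[:, 0]

# Route 2: invert column 0 directly from its fitted minimum and range.
inv_yhat_direct = yhat_scaled * scaler.data_range_[0] + scaler.data_min_[0]

# RMSE in the original units of the target.
inv_y = values[:, 0]
rmse = np.sqrt(mean_squared_error(inv_y, inv_yhat))
print(np.allclose(inv_yhat, inv_yhat_direct), rmse)
```

Whatever is placed in the 6 padding columns does not affect column 0, because MinMaxScaler treats each column independently.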
To improve the model, tune epochs and batch_size
- Time steps: every step is 1 hour (the time steps can easily be converted to the actual time index)
- For demo purposes, only the first 200 hours of predictions are compared
aa=[x for x in range(200)]
plt.plot(aa, inv_y[:200], marker='.', label="actual")
plt.plot(aa, inv_yhat[:200], 'r', label="prediction")
plt.ylabel('Global_active_power', size=15)
plt.xlabel('Time step', size=15)
plt.legend(fontsize=15)
plt.show()
6️⃣ Final
- Used an LSTM neural network, regarded as a state-of-the-art technique for sequential problems
- To reduce computation time and get results quickly, the model was trained on the first year of data (resampled over hour) and tested on the rest of the data
- Built a very simple LSTM network to show that reasonable predictions can be obtained
- BUT the number of rows is very large, so the computation takes a long time
- The best option would be to write the last part of the code with Spark (MLlib) running on GPU
- A CNN is also useful here because the data is correlated (a CNN layer is a good way to probe the local structure of the data)