[DACON] HAICon2020 Industrial Control System Security Threat Detection AI & Unsupervised Autoencoder
징징알파카 · 2022. 9. 8. 13:14 · written 220908
<This post was written as study notes, with reference to the code shared on dacon by Team Zoo (Daycrew 2nd cohort) and to dacon's HAI 2.0 Baseline post :-) >
https://dacon.io/codeshare/5141?dtype=recent
[Team Zoo] Special Edition 4. Applying unsupervised-learning-based anomaly detection (feat. time series)
https://dacon.io/competitions/official/235624/codeshare/1570?page=1&dtype=recent
HAI 2.0 Baseline
HAICon2020 Industrial Control System Security Threat Detection AI Competition
1️⃣ Libraries & Data Load
!pip install /path/to/eTaPR-1.12-py3-none-any.whl
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
from tqdm.notebook import trange
from TaPR_pkg import etapr
from pathlib import Path
import time
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import load_model
from tensorflow.keras.callbacks import EarlyStopping
2️⃣ Data Preprocessing
1) Data load
TRAIN_DATASET = sorted([x for x in Path("/home/ubuntu/coding/220906/HAI 2.0/training").glob("*.csv")])
TRAIN_DATASET
TEST_DATASET = sorted([x for x in Path("/home/ubuntu/coding/220906/HAI 2.0/testing").glob("*.csv")])
TEST_DATASET
VALIDATION_DATASET = sorted([x for x in Path("/home/ubuntu/coding/220906/HAI 2.0/validation").glob("*.csv")])
VALIDATION_DATASET
def dataframe_from_csv(target):
    # strip whitespace around the column names
    return pd.read_csv(target, engine='python').rename(columns=lambda x: x.strip())
def dataframe_from_csvs(targets):
    return pd.concat([dataframe_from_csv(x) for x in targets])
TRAIN_DF_RAW = dataframe_from_csvs(TRAIN_DATASET)
TRAIN_DF_RAW = TRAIN_DF_RAW[:30720]
TRAIN_DF_RAW
2) Variables setting
- Anomalies are detected over the whole dataset, so only the "attack" field is used
- VALID_COLUMNS_IN_TRAIN_DATASET holds every sensor/actuator field in the training dataset
- Fields that were not seen during training cannot be tested, so the field names are taken from the training dataset
TIMESTAMP_FIELD = "time"
IDSTAMP_FIELD = 'id'
ATTACK_FIELD = "attack"
VALID_COLUMNS_IN_TRAIN_DATASET = TRAIN_DF_RAW.columns.drop([TIMESTAMP_FIELD])
VALID_COLUMNS_IN_TRAIN_DATASET
3) Data Normalization
- Min-Max Normalization
  - (X - MIN) / (MAX - MIN)
  - For every feature, the minimum becomes 0, the maximum becomes 1, and all other values are mapped to between 0 and 1
TAG_MIN = TRAIN_DF_RAW[VALID_COLUMNS_IN_TRAIN_DATASET].min()
TAG_MAX = TRAIN_DF_RAW[VALID_COLUMNS_IN_TRAIN_DATASET].max()
def normalize(df):
    ndf = df.copy()
    for c in df.columns:
        if TAG_MIN[c] == TAG_MAX[c]:
            ndf[c] = df[c] - TAG_MIN[c]  # constant column: avoid division by zero
        else:
            ndf[c] = (df[c] - TAG_MIN[c]) / (TAG_MAX[c] - TAG_MIN[c])
    return ndf
TRAIN_DF = normalize(TRAIN_DF_RAW[VALID_COLUMNS_IN_TRAIN_DATASET])
- Check whether the Pandas DataFrame contains any value above 1, any value below 0, or any NaN
- np.any( ): True if any element of the array satisfies the condition, False if none does
def boundary_check(df):
    x = np.array(df, dtype=np.float32)
    print(x)
    return np.any(x > 1.0), np.any(x < 0), np.any(np.isnan(x))
boundary_check(TRAIN_DF)
3️⃣ Model
- Autoencoder
  - Mostly used for generating or restoring images
  - After the model is trained on normal images, an abnormal image is fed in and decoded, and the reconstruction error (the difference between the normal-image characteristics and the decoded image) is computed
  - Regions with low reconstruction error are judged normal; regions with high reconstruction error are judged abnormal
  - Building the autoencoder from LSTM layers makes sequence learning possible
  - A 1D convolution layer slides finely over the timestamp and feature information while learning
- Encoder-Decoder LSTM (= seq2seq)
  - Both the input and the output are sequential data
  - (Problem) the input and output sequences can differ in length
  - (Solution) Encoding: inputs of varying length are converted into a fixed-length vector
  - An Encoder-Decoder LSTM model takes time-series input of varying length and can produce time-series output of varying length
  - An LSTM Autoencoder compresses time-series input of varying length into a fixed-length vector and passes it to the decoder as input
  - In other words, the input data is first converted into an encoded feature vector
def temporalize(X, y, timesteps):
    output_X = []
    output_y = []
    for i in range(len(X) - timesteps - 1):
        t = []
        for j in range(1, timesteps + 1):
            # collect the rows of one look-back window
            t.append(X[[(i + j + 1)], :])
        output_X.append(t)
        output_y.append(y[i + timesteps + 1])
    return np.squeeze(np.array(output_X)), np.array(output_y)
train = np.array(TRAIN_DF)
x_train = train.reshape(train.shape[0], 1, train.shape[1])
x_train.shape
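The reshape above gives every sample a single timestep, so the LSTM and Conv1D layers see one row at a time. If longer windows are wanted, the sketch below shows one way to build them with NumPy. `make_windows` is a hypothetical helper (not part of the original code), and the toy array only stands in for `TRAIN_DF`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(data, timesteps):
    """Stack overlapping windows: (rows, features) -> (samples, timesteps, features)."""
    # sliding_window_view appends the window axis last: (samples, features, timesteps)
    windows = sliding_window_view(data, window_shape=timesteps, axis=0)
    # Move the time axis to the middle, the layout Conv1D/LSTM layers expect
    return windows.transpose(0, 2, 1)


toy = np.arange(12, dtype=np.float32).reshape(6, 2)  # 6 rows, 2 features
x_windows = make_windows(toy, timesteps=3)
print(x_windows.shape)  # (4, 3, 2)
print(x_windows[0])     # the first three rows of toy
```

With `timesteps=1` this reduces to the same shape as `x_train` above.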
▶ Model details
- Conv1D
  - filters: number of output channels of the convolution
  - kernel_size: how many timestamps the kernel sees at once (= window_size)
  - padding: how much padding to add in one direction
  - dilation: the spacing between kernel elements when the kernel is applied
  - stride: default = 1, the step size of the convolution
- LSTM
  - units: sets only the output dimensionality
- Model structure
  - Conv1D - Dense - LSTM - Dense, designed so that the encoder and decoder are symmetric
  - The parameters mainly tuned are filters, kernel_size, and the units of the Dense and LSTM layers
  - More Conv1D layers can be added, and max pooling and similar techniques can be applied just as in an ordinary CNN
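To see how kernel_size, padding, dilation and stride interact, here is a small sketch of the output-length arithmetic for a 1D convolution. `conv1d_output_length` is a hypothetical helper written for this note; it follows the usual 'same'/'valid' padding conventions and is not taken from the original code.

```python
import math


def conv1d_output_length(n_steps, kernel_size, stride=1, dilation=1, padding="valid"):
    """Length of the time axis after a 1D convolution."""
    if padding == "same":
        # 'same' pads the input so that only the stride shortens the sequence
        return math.ceil(n_steps / stride)
    # 'valid': no padding, the dilated kernel must fit entirely inside the input
    effective_kernel = dilation * (kernel_size - 1) + 1
    return max(0, (n_steps - effective_kernel) // stride + 1)


print(conv1d_output_length(1, 64, padding="same"))      # 1
print(conv1d_output_length(128, 64, padding="valid"))   # 65
print(conv1d_output_length(128, 64, stride=2))          # 33
print(conv1d_output_length(128, 64, dilation=2))        # 2
```

With a single timestep and padding='same', as in this post, the sequence length stays 1, so a kernel of 64 mostly covers zero padding; a wider input window would be needed for the kernel size to matter.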
def conv_auto_model(x):
    n_steps = x.shape[1]
    n_features = x.shape[2]
    keras.backend.clear_session()
    model = keras.Sequential(
        [
            layers.Input(shape=(n_steps, n_features)),
            # encoder
            layers.Conv1D(filters=512, kernel_size=64, padding='same', data_format='channels_last',
                          dilation_rate=1, activation="linear"),
            layers.Dense(128),
            layers.LSTM(
                units=64, activation="relu", name="lstm_1", return_sequences=False
            ),
            layers.Dense(64),
            # repeat the encoded vector once per timestep for the decoder
            layers.RepeatVector(n_steps),
            # decoder (mirror of the encoder)
            layers.Dense(64),
            layers.LSTM(
                units=64, activation="relu", name="lstm_2", return_sequences=True
            ),
            layers.Dense(128),
            layers.Conv1D(filters=512, kernel_size=64, padding='same', data_format='channels_last',
                          dilation_rate=1, activation="linear"),
            layers.TimeDistributed(layers.Dense(x.shape[2], activation='linear'))
        ]
    )
    return model
model1 = conv_auto_model(x_train)
model1.compile(optimizer='adam', loss='mse')
model1.summary()
4️⃣ Model fit
- epochs is set to 3 and EarlyStopping is used (with patience=5, it cannot trigger within 3 epochs)
early_stopping = EarlyStopping(monitor='val_loss', patience=5)
epochs = 3
batch = 64
# fit
history = model1.fit(x_train, x_train,
epochs=epochs, batch_size=batch,
validation_split=0.2, callbacks=[early_stopping]).history
model1.save('model1.h5')
plt.plot(history['loss'], label='train loss')
plt.plot(history['val_loss'], label='valid loss')
plt.legend()
plt.xlabel('Epoch'); plt.ylabel('loss')
plt.show()
- model load
model1.save('best_AutoEncoder_model1.h5') #keras h5
model = load_model('best_AutoEncoder_model1.h5')
5️⃣ Anomaly Detection
- Apply the trained model to the validation dataset
VALIDATION_DF_RAW = dataframe_from_csvs(VALIDATION_DATASET)
VALIDATION_DF_RAW.to_csv('VALIDATION_DF_RAW.csv')
VALIDATION_DF_RAW
- The validation set is also normalized with the min/max of the normal (training) data
VALIDATION_DF = normalize(VALIDATION_DF_RAW[VALID_COLUMNS_IN_TRAIN_DATASET])
boundary_check(VALIDATION_DF)
- The plot confirms that values leave the 0 to 1 range in certain intervals
VALIDATION_DF.plot()
VALIDATION_DF['C75'].plot()
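Why can normalized validation values leave the 0 to 1 range at all? Because MIN and MAX come from the training data only. A toy sketch with made-up numbers (the arrays and `normalize_with_train_stats` are illustrative, not from the dataset):

```python
import numpy as np

train = np.array([10.0, 12.0, 15.0, 20.0])      # toy "normal operation" values
validation = np.array([11.0, 18.0, 26.0, 8.0])  # toy values with unseen extremes

tag_min, tag_max = train.min(), train.max()


def normalize_with_train_stats(values):
    # Same formula as normalize(): (X - MIN) / (MAX - MIN), with the training MIN/MAX
    return (values - tag_min) / (tag_max - tag_min)


print(normalize_with_train_stats(train))       # stays within [0, 1]
print(normalize_with_train_stats(validation))  # 26 maps to 1.6, 8 maps to -0.2
```

Values above the training maximum land above 1 and values below the training minimum land below 0, which is exactly what boundary_check reports.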
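6️⃣ Data cleaning
- To tune the threshold more precisely on the validation set and inspect the result, the values of the affected variable are manually adjusted into the normal range
# In the validation plot, one variable spikes in the early segment even though it is normal, so adjust it
VALIDATION_DF.iloc[:2110, VALIDATION_DF.columns.get_loc('C75')] = 0.95  # positional, avoids chained assignment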
val = np.array(VALIDATION_DF)
x_val = val.reshape(val.shape[0], 1, val.shape[1])
x_val.shape
- The model output is 3-dimensional, so it is converted back to 2 dimensions to compare it with the input
def flatten(X):
    flattened_X = np.empty((X.shape[0], X.shape[2]))  # sample x features array.
    for i in range(X.shape[0]):
        flattened_X[i] = X[i, (X.shape[1]-1), :]  # keep the last timestep of each sample
    return flattened_X
def scale(X, scaler):
    for i in range(X.shape[0]):
        X[i, :, :] = scaler.transform(X[i, :, :])
    return X
- Take the difference between the values reconstructed by the model and the actual values to get the reconstruction error
- For normal data the model is well trained and reconstructs well, so the reconstruction error should be small
- For attacks the normalized values fall outside 0 to 1, so the reconstruction error should be large
start = time.time()
valid_x_predictions = model.predict(x_val)
print(valid_x_predictions.shape)
error = flatten(x_val) - flatten(valid_x_predictions)
print((flatten(x_val) - flatten(valid_x_predictions)).shape)
valid_mse = np.mean(np.power(flatten(x_val) - flatten(valid_x_predictions), 2), axis=1)
print(valid_mse.shape)
print(time.time()-start)
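The error computation above can be illustrated on toy data without a trained model. In this sketch `x_hat` is only a stand-in for `model.predict`: it assumes the model cannot reproduce values outside the training range and simply clips them. The 0.05 threshold is likewise arbitrary.

```python
import numpy as np

# Toy data: 4 samples, 1 timestep, 3 features (same layout as x_val)
x = np.array([[[0.2, 0.5, 0.9]],
              [[0.1, 0.4, 0.8]],
              [[0.3, 1.6, 0.7]],   # one feature far outside [0, 1]
              [[0.2, 0.5, 0.9]]])

# Stand-in for model.predict(x): assumed to reproduce in-range values only
x_hat = np.clip(x, 0.0, 1.0)

# Vectorized equivalent of flatten(): take the last timestep of every sample
mse = np.mean((x[:, -1, :] - x_hat[:, -1, :]) ** 2, axis=1)
flags = mse > 0.05  # arbitrary threshold for this toy example

print(mse)    # only the third sample has a non-zero error
print(flags)
```

Only the row that leaves the normal range gets a large per-row MSE, which is the behaviour the bullets above rely on.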
7️⃣ Precision Recall Curve
- The threshold is taken at the point where the Recall and Precision curves cross
error_df = pd.DataFrame({'Reconstruction_error': valid_mse, 'True_class':list(VALIDATION_DF_RAW['attack'])})
precision_rt, recall_rt, threshold_rt = metrics.precision_recall_curve(error_df['True_class'], error_df['Reconstruction_error'])
plt.figure(figsize=(8,5))
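# precision_rt/recall_rt have one more element than threshold_rt; drop the last one to align them
plt.plot(threshold_rt, precision_rt[:-1], label='Precision')
plt.plot(threshold_rt, recall_rt[:-1], label='Recall')
plt.xlabel('Threshold'); plt.ylabel('Precision/Recall')
plt.legend()
#plt.show()
# index where precision and recall are closest (an exact p == r match may not exist)
index_cnt = int(np.argmin(np.abs(precision_rt[:-1] - recall_rt[:-1])))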
print('precision: ',precision_rt[index_cnt],', recall: ',recall_rt[index_cnt])
# fixed Threshold
threshold_fixed = threshold_rt[index_cnt]
print('threshold: ',threshold_fixed)
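To make the crossing-point idea concrete, here is a NumPy-only sketch that evaluates precision and recall at every candidate threshold (predicting an attack when score >= threshold, the same convention precision_recall_curve uses) and keeps the threshold where the two are closest. `crossing_threshold` and the toy scores are hypothetical, written just for this illustration.

```python
import numpy as np


def crossing_threshold(scores, labels):
    """Return (threshold, precision, recall) where precision and recall are closest."""
    best = None
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        precision = tp / pred.sum()  # pred.sum() >= 1 because t is one of the scores
        recall = tp / np.sum(labels == 1)
        gap = abs(precision - recall)
        if best is None or gap < best[0]:
            best = (gap, t, precision, recall)
    return best[1:]


scores = np.array([0.1, 0.2, 0.5, 0.3, 0.6, 0.7])  # toy reconstruction errors
labels = np.array([0, 0, 0, 1, 1, 1])              # toy attack labels
threshold, precision, recall = crossing_threshold(scores, labels)
print(threshold, precision, recall)  # threshold 0.5, precision = recall = 2/3
```

At the crossing point false positives and false negatives are balanced; a different trade-off would simply mean picking another point on the same curves.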
8️⃣ Predict Validation Data set
- Confirms that the reconstruction error is small for normal data and clearly high in the abnormal segments
error_df = pd.DataFrame({'Reconstruction_error': valid_mse ,
'True_class': list(VALIDATION_DF_RAW['attack'])})
groups = error_df.groupby('True_class')
fig, ax = plt.subplots(figsize=(20,20))
for name, group in groups:
    ax.plot(group.index, group.Reconstruction_error, marker='o', ms=3.5, linestyle='',
            label= "Break" if name == 1 else "Normal")
ax.hlines(threshold_fixed, ax.get_xlim()[0], ax.get_xlim()[1], colors="r", zorder=100, label='Threshold')
ax.legend()
plt.title("Reconstruction error for different classes")
plt.ylabel("Reconstruction error")
plt.xlabel("Data point index")
9️⃣ Moving average
- A moving average pushes normal segments lower on average and abnormal segments higher on average
error_df
# moving average
mean_window = error_df['Reconstruction_error'].rolling(50).mean()
window_error = mean_window.fillna(0)
window_error
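A toy sketch of what the moving average does to the predictions: an isolated spike is averaged away, while a sustained run of high errors stays above the threshold (with some lag). The numbers, the window of 3 and the threshold of 0.5 are made up for illustration.

```python
import pandas as pd

errors = pd.Series([0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.8, 0.9, 0.1])
threshold = 0.5

# Trailing mean over the last 3 points; positions without a full window become 0
smoothed = errors.rolling(3).mean().fillna(0)

raw_flags = (errors > threshold).astype(int).tolist()
smooth_flags = (smoothed > threshold).astype(int).tolist()
print(raw_flags)     # [0, 0, 1, 0, 0, 1, 1, 1, 1, 0]
print(smooth_flags)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

Note the side effects: detection starts and ends about one window late, and the first window-1 rows can never be flagged because fillna(0) sets them to zero.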
- Capture the segments that are clearly attacks
window_error_df = pd.DataFrame({'Reconstruction_error': window_error ,
'True_class': list(VALIDATION_DF_RAW['attack'])})
groups = window_error_df.groupby('True_class')
fig, ax = plt.subplots(figsize=(20,20))
for name, group in groups:
    ax.plot(group.index, group.Reconstruction_error, marker='o', ms=3.5, linestyle='',
            label= "Break" if name == 1 else "Normal")
ax.hlines(threshold_fixed, ax.get_xlim()[0], ax.get_xlim()[1], colors="r", zorder=100, label='Threshold')
ax.legend()
plt.title("Reconstruction error for different classes")
plt.ylabel("Reconstruction error")
plt.xlabel("Data point index")
- Check the validation-set result using the threshold
pred_y = [1 if e > threshold_fixed else 0 for e in window_error_df['Reconstruction_error'].values]
pred_y = np.array(pred_y)
pred_y.shape
🔟 Evaluation
ATTACK_LABELS = np.array(VALIDATION_DF_RAW[ATTACK_FIELD])
FINAL_LABELS = np.array(pred_y)
ATTACK_LABELS.shape[0] == FINAL_LABELS.shape[0]
TaPR = etapr.evaluate(anomalies=ATTACK_LABELS, predictions=FINAL_LABELS)
print(f"F1: {TaPR['f1']:.3f} (TaP: {TaPR['TaP']:.3f}, TaR: {TaPR['TaR']:.3f})")
print(f"# of detected anomalies: {len(TaPR['Detected_Anomalies'])}")
print(f"Detected anomalies: {TaPR['Detected_Anomalies']}")
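- After adjusting the values of the variable that behaved unusually and applying the moving average, the TaPR score on the validation set came out high, at 99.8
1️⃣1️⃣ Predict Test Data set
- Same preprocessing, normalization, and model as above (the code additionally applies exponential smoothing, ewm(alpha=0.9), to the normalized test data)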
TEST_DF_RAW = dataframe_from_csvs(TEST_DATASET)
TEST_DF_RAW
TEST_DF = normalize(TEST_DF_RAW[VALID_COLUMNS_IN_TRAIN_DATASET]).ewm(alpha=0.9).mean()
TEST_DF
TEST_DF.plot()
boundary_check(TEST_DF)
test = np.array(TEST_DF)
x_test = test.reshape(test.shape[0], 1, test.shape[1])
x_test.shape
start = time.time()
test_x_predictions = model.predict(x_test)
print(test_x_predictions.shape)
test_mse = np.mean(np.power(flatten(x_test) - flatten(test_x_predictions), 2), axis=1)
print(test_mse.shape)
print(time.time()-start)
test_error = pd.DataFrame({'Reconstruction_error': test_mse})
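- Labels are not available for the test dataset, so the moving-average window and the threshold were changed little by little and adjusted based on the score after each submission
# NOTE: test_d is not defined anywhere in this post; it is presumably the moving average of
# test_error['Reconstruction_error'] (the window size used is not given)
movemean_test = pd.DataFrame({'Reconstruction_error': test_d})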
pred_y_test = [1 if e > 0.000425 else 0 for e in movemean_test['Reconstruction_error'].values]
pred_y_test = np.array(pred_y_test)
pred_y_test.shape
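Since the window and threshold were tuned by trial and error, it can help to wrap the two steps in one function and compare settings side by side. The sketch below is only an illustration: `predict_with_moving_average` is a hypothetical helper, and `toy_errors` is synthetic data standing in for `test_mse`, not the real test errors.

```python
import numpy as np
import pandas as pd


def predict_with_moving_average(errors, window, threshold):
    """Smooth per-row reconstruction errors, then flag rows above the threshold."""
    smoothed = pd.Series(errors).rolling(window).mean().fillna(0)
    return (smoothed > threshold).astype(int).to_numpy()


rng = np.random.default_rng(0)
toy_errors = rng.uniform(0.0, 0.0004, size=1000)  # synthetic stand-in for test_mse
toy_errors[400:500] += 0.002                      # one sustained high-error segment

# Compare how many rows each (window, threshold) pair flags as attack
for window in (10, 50):
    for threshold in (0.000425, 0.001):
        flags = predict_with_moving_average(toy_errors, window, threshold)
        print(window, threshold, int(flags.sum()))
```

Without labels, the flagged-row count is only a rough sanity check; the actual choice in the post was driven by the submission score.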
submission = pd.read_csv('HAI 2.0/sample_submission.csv')
submission.index = submission['time']
submission['attack'] = pred_y_test
submission['attack'].value_counts()
test_error_df = pd.DataFrame({'Reconstruction_error': test_d,
'True_class': list(submission['attack'])})
groups = test_error_df.groupby('True_class')
fig, ax = plt.subplots(figsize=(20,20))
for name, group in groups:
    ax.plot(group.index, group.Reconstruction_error, marker='o', ms=3.5, linestyle='',
            label= "Break" if name == 1 else "Normal")
ax.hlines(0.000425, ax.get_xlim()[0], ax.get_xlim()[1], colors="r", zorder=100, label='Threshold')
ax.legend()
plt.title("Reconstruction error for different classes")
plt.ylabel("Reconstruction error")
plt.xlabel("Data point index")