😎 First time seeing a studying 징징알파카 (Whiny Alpaca)?

[Keras] Timeseries anomaly detection using an Autoencoder



징징알파카 2022. 11. 23. 16:37

<This post was written as study notes, following the Keras timeseries example :-)>

 https://keras.io/examples/timeseries/timeseries_anomaly_detection/#introduction

 

Keras documentation: Timeseries anomaly detection using an Autoencoder. Author: pavithrasv. Date created: 2020/05/31. Last modified: 2020/05/31. Description: Detect anomalies in a timeseries using an Autoencoder. (keras.io)

 

 

 

⚡ 1. Setup

import numpy as np
import pandas as pd
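from tensorflow import keras
# Import layers from the same tf.keras package to avoid mixing it with standalone keras.
from tensorflow.keras import layers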
from matplotlib import pyplot as plt

 

⚡ 2. Load the data

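# Local CSV file; "insert_date_time" is the timestamp column.
df_daily_jumpsup_url = "./data/test.csv"
df_daily_jumpsup = pd.read_csv(
    df_daily_jumpsup_url, parse_dates=["insert_date_time"]
)

print(df_daily_jumpsup.head())

# Use the parsed timestamps as the index so plots get a time axis.
df_daily_jumpsup = df_daily_jumpsup.set_index("insert_date_time")
print(df_daily_jumpsup.shape)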

 


⚡ 3. Visualize the data

  • Timeseries data with anomalies
fig, ax = plt.subplots()
df_daily_jumpsup.plot(legend=False, ax=ax)
plt.show()

 

⚡ 4. Prepare training data

training_mean = df_daily_jumpsup.mean(numeric_only=True)
training_std = df_daily_jumpsup.std(numeric_only=True)
df_training_value = (df_daily_jumpsup - training_mean) / training_std
print("Number of training samples:", len(df_training_value))
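A minimal sketch of the normalization step on made-up numbers, assuming a single feature. The array names and values are hypothetical; the point is that the mean and standard deviation come from the training data and are then reused for any later data.

```python
import numpy as np

# Hypothetical toy series; the values are made up for illustration.
train_values = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
new_values = np.array([14.0, 30.0])

# Statistics come from the training data only.
# ddof=1 matches the default of pandas' DataFrame.std().
mean = train_values.mean()
std = train_values.std(ddof=1)

train_scaled = (train_values - mean) / std
new_scaled = (new_values - mean) / std  # reuse the training statistics

print(train_scaled)
print(new_scaled)
```

After scaling, the training values have mean 0 and sample standard deviation 1, while new values are expressed in units of training standard deviations from the training mean.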

 

  • Create sequences
TIME_STEPS = 288

# Generate training sequences for use in the model.
def create_sequences(values, time_steps=TIME_STEPS):
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : (i + time_steps)])
    return np.stack(output)


x_train = create_sequences(df_training_value.values)
print("Training input shape: ", x_train.shape)
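To see what the sliding window produces, here is a small sketch that runs the same function on a toy array. The toy input and the window length of 3 are assumptions chosen only to keep the output readable.

```python
import numpy as np

def create_sequences(values, time_steps):
    # One window per valid start index, each of length `time_steps`.
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : i + time_steps])
    return np.stack(output)

# Toy input: 6 time points, 1 feature (shape (6, 1)), like df.values.
toy = np.arange(6, dtype=float).reshape(-1, 1)
windows = create_sequences(toy, time_steps=3)

print(windows.shape)      # (4, 3, 1): 6 - 3 + 1 = 4 windows
print(windows[:, :, 0])   # each row is one window, shifted by one step
```

Consecutive windows overlap in all but one time point, so a series of N points gives N - TIME_STEPS + 1 training samples.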

 

 

⚡ 5. Build a model

  • The model takes input of shape (batch_size, sequence_length, num_features); here sequence_length is TIME_STEPS = 288
model = keras.Sequential(
    [
        layers.Input(shape=(x_train.shape[1], x_train.shape[2])),
        layers.Conv1D(
            filters=32, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        layers.Dropout(rate=0.2),
        layers.Conv1D(
            filters=16, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        layers.Conv1DTranspose(
            filters=16, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        layers.Dropout(rate=0.2),
        layers.Conv1DTranspose(
            filters=32, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        layers.Conv1DTranspose(filters=1, kernel_size=7, padding="same"),
    ]
)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
model.summary()
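The sequence length shrinks in the encoder and grows back in the decoder. The sketch below traces it with plain arithmetic, assuming the usual padding="same" rules (a strided Conv1D gives ceil(length / strides), a strided Conv1DTranspose gives length * strides). The helper names are made up for this illustration.

```python
import math

def conv1d_same_length(length, strides):
    # Conv1D with padding="same": output length is ceil(length / strides).
    return math.ceil(length / strides)

def conv1d_transpose_same_length(length, strides):
    # Conv1DTranspose with padding="same": output length is length * strides.
    return length * strides

length = 288  # TIME_STEPS
trace = [length]
for _ in range(2):  # two stride-2 Conv1D layers (encoder)
    length = conv1d_same_length(length, 2)
    trace.append(length)
for _ in range(2):  # two stride-2 Conv1DTranspose layers (decoder)
    length = conv1d_transpose_same_length(length, 2)
    trace.append(length)
# The final Conv1DTranspose uses the default strides=1, so the length is unchanged.

print(trace)  # [288, 144, 72, 144, 288]
```

Under these rules the output length matches the input only when TIME_STEPS is divisible by 4. For example, 290 would go 290, 145, 73, 146, 292, and the reconstruction would no longer line up with the input.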

 

 

 

⚡ 6. Train the model

history = model.fit(
    x_train,
    x_train,
    epochs=50,
    batch_size=128,
    validation_split=0.1,
    callbacks=[
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, mode="min")
    ],
)

plt.plot(history.history["loss"], label="Training Loss")
plt.plot(history.history["val_loss"], label="Validation Loss")
plt.legend()
plt.show()
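The patience=5 setting can be pictured with a simplified sketch of the early-stopping rule. This is not the Keras implementation, only the basic idea with min_delta assumed to be 0; the function name and the loss values are made up.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Strict improvement: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement this epoch.
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: the best value (0.35) is reached at epoch 2.
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.35, 0.38, 0.39]
print(early_stop_epoch(losses, patience=5))  # 7
```

Training stops after five epochs in a row without a new best val_loss. Since restore_best_weights is not set in the callback above, the model keeps the weights from the last epoch rather than from the best one.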

 

 

 

⚡ 7. Detecting anomalies

  • Compute the MAE loss on the training samples
  • Find the maximum MAE loss value
    • This is the worst the model performed when trying to reconstruct a sample
    • Use it as the threshold for anomaly detection
  • If a sample's reconstruction loss exceeds this threshold, we can infer that the model is seeing a pattern it is not familiar with
    • Label that sample as an anomaly
# Get train MAE loss.
x_train_pred = model.predict(x_train)
train_mae_loss = np.mean(np.abs(x_train_pred - x_train), axis=1)

plt.hist(train_mae_loss, bins=50)
plt.xlabel("Train MAE loss")
plt.ylabel("Number of samples")
plt.show()

# Get reconstruction loss threshold.
threshold = np.max(train_mae_loss)
print("Reconstruction error threshold: ", threshold)
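Here is the same thresholding idea on tiny made-up arrays, so the numbers can be checked by hand. The "reconstructions" below are hypothetical stand-ins for model.predict output, not real model results.

```python
import numpy as np

# Made-up windows of shape (num_windows, time_steps, num_features).
x = np.array([
    [[0.0], [1.0], [2.0]],
    [[1.0], [2.0], [3.0]],
])
# Hypothetical reconstructions standing in for model.predict(x).
x_pred = np.array([
    [[0.5], [1.0], [2.5]],
    [[1.0], [2.0], [3.3]],
])

# Average the absolute error over the time axis: one value per window.
mae = np.mean(np.abs(x_pred - x), axis=1)  # shape (2, 1)
threshold = np.max(mae)                    # worst training reconstruction

# A new window that is reconstructed badly (also made up).
new_x = np.array([[[0.0], [1.0], [2.0]]])
new_pred = np.array([[[1.0], [2.0], [3.0]]])
new_mae = np.mean(np.abs(new_pred - new_x), axis=1).reshape(-1)

print(mae.ravel())
print(threshold)
print(new_mae > threshold)  # flagged when the error exceeds the threshold
```

Because the threshold is the maximum training error and the comparison is strict, no training window can ever be flagged. Scoring exactly the same data that the model was trained on therefore gives zero anomalies.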

 

  • Compare reconstruction
# Checking how the first sequence is learnt
plt.plot(x_train[0])
plt.plot(x_train_pred[0])
plt.show()

 

 

 

⚡ 8. Prepare test data

df_test_value = (df_daily_jumpsup - training_mean) / training_std
fig, ax = plt.subplots()
df_test_value.plot(legend=False, ax=ax)
plt.show()

 

# Create sequences from test values.
x_test = create_sequences(df_test_value.values)
print("Test input shape: ", x_test.shape)

 

# Get test MAE loss.
x_test_pred = model.predict(x_test)
test_mae_loss = np.mean(np.abs(x_test_pred - x_test), axis=1)
test_mae_loss = test_mae_loss.reshape((-1))
test_mae_loss

 

 

plt.hist(test_mae_loss, bins=50)
plt.xlabel("test MAE loss")
plt.ylabel("Number of samples")
plt.show()

 

# Detect all the samples which are anomalies.
anomalies = test_mae_loss > threshold
print("Number of anomaly samples: ", np.sum(anomalies))
print("Indices of anomaly samples: ", np.where(anomalies))

 

 

⚡ 9. Plot anomalies

  • I only have anomalous data, so it is hard to compare normal samples against anomalous ones ㅠㅠ
# Data point i is an anomaly if every window that contains it,
# i.e. samples (i - TIME_STEPS + 1) through i inclusive, is anomalous.
anomalous_data_indices = []
for data_idx in range(TIME_STEPS - 1, len(df_test_value) - TIME_STEPS + 1):
    # The slice end is exclusive, so use data_idx + 1 to include sample data_idx.
    if np.all(anomalies[data_idx - TIME_STEPS + 1 : data_idx + 1]):
        anomalous_data_indices.append(data_idx)
df_subset = df_daily_jumpsup.iloc[anomalous_data_indices]
fig, ax = plt.subplots()
df_daily_jumpsup.plot(legend=False, ax=ax)
df_subset.plot(legend=False, ax=ax, color="r")
plt.show()
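The step from per-window flags back to individual data points is easy to get wrong, so here is a toy sketch with made-up flags. It uses a window length of 3 and includes window idx itself in the slice, which is what the rule "samples (i - timesteps + 1) to (i)" describes.

```python
import numpy as np

TIME_STEPS = 3
num_points = 8
# One flag per window; window j covers points j .. j + TIME_STEPS - 1.
# Made-up flags for the 8 - 3 + 1 = 6 windows.
anomalies = np.array([False, False, True, True, True, False])

anomalous_points = []
for idx in range(TIME_STEPS - 1, num_points - TIME_STEPS + 1):
    # Windows idx - TIME_STEPS + 1 .. idx are exactly the ones containing point idx.
    if np.all(anomalies[idx - TIME_STEPS + 1 : idx + 1]):
        anomalous_points.append(idx)

print(anomalous_points)  # [4]
```

Only point 4 is marked, because it is the only point whose three covering windows (2, 3 and 4) are all flagged. Requiring every covering window to be anomalous is a strict rule, so isolated flagged windows do not mark any data point.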

 

 

 

 

 
