[Keras] Timeseries anomaly detection using an Autoencoder
징징알파카 2022. 11. 23. 16:37
<I wrote this post while studying the Keras timeseries example linked below :-)>
https://keras.io/examples/timeseries/timeseries_anomaly_detection/#introduction
1. Setup
import numpy as np
import pandas as pd
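from tensorflow import keras
# Import layers from the same Keras package as `keras` above to avoid mixing implementations.
from tensorflow.keras import layers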
from matplotlib import pyplot as plt
2. Load the data
df_daily_jumpsup_url = "./data/test.csv"
# parse_dates=True only parses the index, so name the timestamp column explicitly.
df_daily_jumpsup = pd.read_csv(
    df_daily_jumpsup_url, parse_dates=["insert_date_time"]
)
print(df_daily_jumpsup.head())
df_daily_jumpsup = df_daily_jumpsup.set_index("insert_date_time")
df_daily_jumpsup.shape
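The loading step can be tried without the real file. The sketch below is only an illustration: the contents of ./data/test.csv are not shown in this post, so the `value` column name, the sample readings, and the 5-minute spacing are all assumptions. Only the `insert_date_time` column name comes from the code above.

```python
import io

import pandas as pd

# Hypothetical stand-in for ./data/test.csv (column "value" and the
# 5-minute spacing are assumptions made for this illustration).
csv_text = """insert_date_time,value
2022-11-01 00:00:00,10.5
2022-11-01 00:05:00,11.0
2022-11-01 00:10:00,10.8
"""

# Parse the timestamp column as datetimes, then use it as the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["insert_date_time"])
df = df.set_index("insert_date_time")

print(type(df.index).__name__)
print(df.shape)
```

With a datetime index, `df.plot()` puts real timestamps on the x-axis instead of treating them as plain strings.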
3. Visualize the data
- Timeseries data with anomalies
fig, ax = plt.subplots()
df_daily_jumpsup.plot(legend=False, ax=ax)
plt.show()
4. Prepare training data
training_mean = df_daily_jumpsup.mean(numeric_only=True)
training_std = df_daily_jumpsup.std(numeric_only=True)
df_training_value = (df_daily_jumpsup - training_mean) / training_std
print("Number of training samples:", len(df_training_value))
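As a small illustration of the normalization step, here is the same z-score calculation on a toy array. The numbers are made up; the one detail worth noting is that pandas' `std()` uses the sample standard deviation (`ddof=1`) by default, so the sketch passes `ddof=1` to NumPy to match.

```python
import numpy as np

# Toy readings standing in for one numeric column (values are made up).
values = np.array([10.0, 12.0, 11.0, 13.0, 14.0])

mean = values.mean()
std = values.std(ddof=1)  # ddof=1 matches the pandas default

# Z-score normalization: subtract the mean, divide by the standard deviation.
normalized = (values - mean) / std

print("mean:", mean)
print("normalized:", normalized)
```

The same `mean` and `std` from the training data should be reused for any data scored later, which is what the post does in the test-data step.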
- Create sequences
TIME_STEPS = 288
# Generated training sequences for use in the model.
def create_sequences(values, time_steps=TIME_STEPS):
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : (i + time_steps)])
    return np.stack(output)
x_train = create_sequences(df_training_value.values)
print("Training input shape: ", x_train.shape)
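To see what the windowing produces, the sketch below applies the same sliding-window idea to a tiny array. `make_windows` is a hypothetical name used only for this illustration, and the window length 3 is chosen so the output is easy to read.

```python
import numpy as np


def make_windows(values, time_steps):
    """Stack every run of `time_steps` consecutive rows into one array."""
    n_windows = len(values) - time_steps + 1
    return np.stack([values[i : i + time_steps] for i in range(n_windows)])


# 6 rows, 1 feature: 0, 1, 2, 3, 4, 5
toy = np.arange(6, dtype=float).reshape(-1, 1)
windows = make_windows(toy, time_steps=3)

print(windows.shape)      # (4, 3, 1)
print(windows[:, :, 0])   # each row is one window shifted by one step
```

N rows with a window of T steps give N - T + 1 overlapping windows, which is why the training input has shape (samples, TIME_STEPS, features).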
5. Build a model
- input of shape (batch_size, sequence_length, num_features)
model = keras.Sequential(
    [
        layers.Input(shape=(x_train.shape[1], x_train.shape[2])),
        layers.Conv1D(
            filters=32, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        layers.Dropout(rate=0.2),
        layers.Conv1D(
            filters=16, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        layers.Conv1DTranspose(
            filters=16, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        layers.Dropout(rate=0.2),
        layers.Conv1DTranspose(
            filters=32, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        layers.Conv1DTranspose(filters=1, kernel_size=7, padding="same"),
    ]
)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
model.summary()
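The sequence length through the autoencoder can be checked by hand. The sketch below is plain arithmetic, not Keras code: it assumes the usual rule for `padding="same"`, where a strided `Conv1D` outputs ceil(length / stride) steps and a `Conv1DTranspose` outputs length * stride steps. The helper names are made up for this illustration.

```python
import math


def conv1d_same_len(length, stride):
    """Output length of Conv1D with padding="same"."""
    return math.ceil(length / stride)


def conv1d_transpose_same_len(length, stride):
    """Output length of Conv1DTranspose with padding="same"."""
    return length * stride


def trace_lengths(length):
    """Follow the sequence length through the layers of the model above."""
    trace = [length]
    for stride in (2, 2):       # two encoder Conv1D layers
        length = conv1d_same_len(length, stride)
        trace.append(length)
    for stride in (2, 2, 1):    # three decoder Conv1DTranspose layers
        length = conv1d_transpose_same_len(length, stride)
        trace.append(length)
    return trace


print(trace_lengths(288))  # [288, 144, 72, 144, 288, 288]
print(trace_lengths(290))  # [290, 145, 73, 146, 292, 292]
```

With TIME_STEPS = 288 the length comes back unchanged. A window length that is not divisible by 4, such as 290, would come back longer than the input, so the reconstruction could not be compared with the input directly.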
6. Train the model
history = model.fit(
    x_train,
    x_train,
    epochs=50,
    batch_size=128,
    validation_split=0.1,
    callbacks=[
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, mode="min")
    ],
)
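The `patience=5` setting means training stops once the validation loss has gone 5 epochs in a row without improving. The sketch below is a simplified emulation of that rule in plain Python, not the Keras implementation: it ignores options such as `min_delta`, `baseline`, and `restore_best_weights`, and `stopping_epoch` and the loss values are made up for this illustration.

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value (0.35) is reached at epoch 2.
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39, 0.30]
print(stopping_epoch(losses, patience=5))  # 7
```

In this example the run stops at epoch 7, so the later drop to 0.30 is never seen. A larger patience trades extra training time for a better chance of catching such late improvements.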
plt.plot(history.history["loss"], label="Training Loss")
plt.plot(history.history["val_loss"], label="Validation Loss")
plt.legend()
plt.show()
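7. Detecting anomalies
- Find the MAE loss on the training samples
- Find the maximum MAE loss value
- This is the worst the model has performed when trying to reconstruct a sample
- Use this value as the threshold for anomaly detection
- If a sample's reconstruction loss is greater than this threshold, we can infer that the model is seeing a pattern it is not familiar with
- Label that sample as an anomaly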
# Get train MAE loss.
x_train_pred = model.predict(x_train)
train_mae_loss = np.mean(np.abs(x_train_pred - x_train), axis=1)
plt.hist(train_mae_loss, bins=50)
plt.xlabel("Train MAE loss")
plt.ylabel("No of samples")
plt.show()
# Get reconstruction loss threshold.
threshold = np.max(train_mae_loss)
print("Reconstruction error threshold: ", threshold)
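The thresholding logic in this section can be traced on a toy example. Everything below is made up for illustration: two "training" windows with their hand-written reconstructions, and one new window that is reconstructed badly. No model is involved.

```python
import numpy as np

# Two training windows of 3 steps and 1 feature, plus made-up reconstructions.
x = np.array([[[0.0], [1.0], [2.0]], [[1.0], [2.0], [3.0]]])
x_pred = np.array([[[0.1], [1.0], [1.9]], [[1.0], [2.3], [3.0]]])

# Mean absolute error per window, averaged over the time axis.
mae = np.mean(np.abs(x_pred - x), axis=1)   # shape (2, 1)

# The worst training reconstruction becomes the threshold.
threshold = np.max(mae)

# A new window containing a spike that the reconstruction misses.
new_window = np.array([[[1.0], [5.0], [3.0]]])
new_pred = np.array([[[1.0], [2.0], [3.0]]])
new_mae = np.mean(np.abs(new_pred - new_window), axis=1).reshape(-1)

is_anomaly = new_mae > threshold
print("threshold:", threshold)
print("new MAE:", new_mae, "anomaly:", is_anomaly)
```

Using the maximum training loss is a strict choice: a single poorly reconstructed training window raises the threshold for everything scored afterwards.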
- Compare reconstruction
# Checking how the first sequence is learnt
plt.plot(x_train[0])
plt.plot(x_train_pred[0])
plt.show()
8. Prepare test data
df_test_value = (df_daily_jumpsup - training_mean) / training_std
fig, ax = plt.subplots()
df_test_value.plot(legend=False, ax=ax)
plt.show()
# Create sequences from test values.
x_test = create_sequences(df_test_value.values)
print("Test input shape: ", x_test.shape)
# Get test MAE loss.
x_test_pred = model.predict(x_test)
test_mae_loss = np.mean(np.abs(x_test_pred - x_test), axis=1)
test_mae_loss = test_mae_loss.reshape((-1))
test_mae_loss
plt.hist(test_mae_loss, bins=50)
plt.xlabel("test MAE loss")
plt.ylabel("No of samples")
plt.show()
# Detect all the samples which are anomalies.
anomalies = test_mae_loss > threshold
print("Number of anomaly samples: ", np.sum(anomalies))
print("Indices of anomaly samples: ", np.where(anomalies))
9. Plot anomalies
- Hmm.. the data contains only anomalous values, so it is hard to compare the normal readings with the anomalous ones :-(
# data i is an anomaly if samples [(i - timesteps + 1) to (i)] are anomalies
anomalous_data_indices = []
for data_idx in range(TIME_STEPS - 1, len(df_test_value) - TIME_STEPS + 1):
    # The slice end is data_idx + 1 so that sample (i) itself is included.
    if np.all(anomalies[data_idx - TIME_STEPS + 1 : data_idx + 1]):
        anomalous_data_indices.append(data_idx)
df_subset = df_daily_jumpsup.iloc[anomalous_data_indices]
fig, ax = plt.subplots()
df_daily_jumpsup.plot(legend=False, ax=ax)
df_subset.plot(legend=False, ax=ax, color="r")
plt.show()
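The step from anomalous windows to anomalous data points is easy to get wrong by one, so here is a toy trace. `anomalous_points` is a hypothetical helper written for this illustration. It follows the rule in the comment above: data point i is flagged only if every window from i - time_steps + 1 through i is flagged, and it uses an inclusive slice end (`i + 1`) so that window i itself is checked.

```python
import numpy as np


def anomalous_points(window_flags, time_steps, n_points):
    """Indices of data points whose covering windows are all flagged."""
    indices = []
    for i in range(time_steps - 1, n_points - time_steps + 1):
        # Windows i - time_steps + 1 .. i are the ones that contain point i.
        if np.all(window_flags[i - time_steps + 1 : i + 1]):
            indices.append(i)
    return indices


# 8 data points with time_steps=3 give 8 - 3 + 1 = 6 windows.
flags = np.array([False, False, True, True, True, False])
print(anomalous_points(flags, time_steps=3, n_points=8))  # [4]
```

Only point 4 is covered exclusively by flagged windows (windows 2, 3, and 4), so it is the only point marked. Requiring all covering windows to agree makes the marking conservative.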