😎 공부하는 징징알파카는 처음이지?


tsod: Anomaly Detection for time series data

징징알파카 2022. 9. 15. 11:29

Written on 2022-09-15

<This post was written as study notes, with reference to the DHI/tsod documentation and DHI's GitHub repository :-) >

https://dhi.github.io/tsod/

 


https://github.com/DHI/tsod

 


 

😎 What is tsod?

  • Anomalies should be detected automatically and replaced with more plausible values before the data is fed to a numerical simulation engine as boundary conditions, or to a real-time decision system
  • A simple, consistent API for anomaly detection in time series
  • A library for time series data
    • The time series format is always a pandas Series
    • In some cases it must have a DatetimeIndex
  • Two types of detection
    • Outlier detection (unsupervised anomaly detection)
      • The training data contains outliers, i.e. observations that lie far from most other observations
      • An outlier detector tries to focus on the training observations that are similar and close together, and ignores the observations further away
    • Novelty detection (semi-supervised anomaly detection)
      • The training data is considered "normal" and is not contaminated by outliers
      • A new test observation can be classified as anomalous, and in this context it is called a "novelty"
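To make the two settings concrete, here is a minimal sketch in plain pandas. It is not how tsod implements anything; the helper names `fit_limits` and `robust_outliers`, the factors `k`, and the toy numbers are all assumptions made for illustration. The novelty case learns limits from clean data and applies them to new data, while the outlier case has to cope with contaminated data, so it uses the median and the median absolute deviation (MAD).

```python
import pandas as pd


def fit_limits(train: pd.Series, k: float = 3.0) -> tuple:
    """Novelty setting: learn limits from data assumed to be clean."""
    mean, std = train.mean(), train.std()
    return mean - k * std, mean + k * std


def robust_outliers(data: pd.Series, k: float = 5.0) -> pd.Series:
    """Outlier setting: the data may be contaminated, so rely on median and MAD."""
    median = data.median()
    mad = (data - median).abs().median()
    return (data - median).abs() > k * mad


clean = pd.Series([1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1])
new = pd.Series([1.0, 5.0, 0.9])

# Novelty detection: fit on clean data, then judge unseen observations.
low, high = fit_limits(clean)
novelties = (new < low) | (new > high)

# Outlier detection: the data being judged already contains the outlier.
contaminated = pd.concat([clean, pd.Series([5.0])], ignore_index=True)
outliers = robust_outliers(contaminated)

print(novelties.tolist())
print(contaminated[outliers].tolist())
```

The median and MAD barely move when a single extreme value is added, which is one simple way to "focus on the observations that are close together".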

 

 

😎 Code implementation

1️⃣ Load the required libraries & packages

  • Install tsod
!pip install tsod # from PyPI
!pip install https://github.com/DHI/tsod/archive/main.zip # dev version
  • Load the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tsod

 

2️⃣ Load the data

 
df = pd.read_csv("https://raw.githubusercontent.com/DHI/tsod/main/tests/data/example.csv", parse_dates=True, index_col=0)
df.head()

series = df.value

series

  • Get the data as a Series
type(series)
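If the example CSV is not reachable, the same shape of input can be built by hand. The sketch below uses made-up values and a hypothetical column name `value`; it only shows what "a Series with a DatetimeIndex" looks like and how to check for it.

```python
import pandas as pd

# Toy data standing in for the CSV: one value per day (values are made up).
index = pd.date_range("2022-09-15", periods=6, freq="D")
frame = pd.DataFrame({"value": [0.5, 0.7, 0.0, 1.2, 3.1, 0.9]}, index=index)

# Selecting a single column of a DataFrame yields a Series.
series = frame["value"]

is_series = isinstance(series, pd.Series)
has_time_index = isinstance(series.index, pd.DatetimeIndex)
print(is_series, has_time_index)
```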

 

 

3️⃣ Load a detector

      • Choose a detector ( RangeDetector or ConstantValueDetector )
        • RangeDetector detects values outside a given range
        • tsod.RangeDetector(min_value=-inf, max_value=inf, quantiles=None)
          • min_value (float) : minimum value threshold
          • max_value (float) : maximum value threshold
          • quantiles (list[2]) : default quantiles [0, 1] (same as the minimum and maximum values)
      • Detect anomalies with detect()
        • detect(data: Union[pandas.core.series.Series, pandas.core.frame.DataFrame])
          • data (pd.Series) : time series data with possible anomalies
          • Returns : a time series of bools, True == anomaly
          • Return type : pd.Series
rd = tsod.RangeDetector(min_value=0.01, max_value=2.0)

res = rd.detect(series)
series[res]

Values that fall outside the threshold range

plt.plot(series)
plt.plot(series[res], 'ro',label='Anomaly')
plt.legend()

Five anomalies, two of which are 0
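The range check itself is easy to picture as a boolean mask. The following is a rough pandas equivalent, not tsod's own code: `range_mask` and `limits_from_quantiles` are hypothetical helpers and the values are invented. The second helper shows how quantiles [0, 1] of reference data reduce to its minimum and maximum.

```python
import pandas as pd


def range_mask(data, min_value=float("-inf"), max_value=float("inf")):
    """True where a value lies below min_value or above max_value."""
    return (data < min_value) | (data > max_value)


def limits_from_quantiles(reference, quantiles=(0.0, 1.0)):
    """Derive the limits from reference data; (0, 1) gives its min and max."""
    return reference.quantile(quantiles[0]), reference.quantile(quantiles[1])


values = pd.Series([0.5, 0.0, 1.2, 2.5, 0.0, 1.9, 3.0])

# Fixed thresholds, as in the RangeDetector call above.
mask = range_mask(values, min_value=0.01, max_value=2.0)
anomalies = values[mask]

# Thresholds learned from reference data instead of being typed in.
reference = pd.Series([0.5, 1.2, 1.9])
low, high = limits_from_quantiles(reference)
learned_mask = range_mask(values, low, high)

print(anomalies.tolist())
```

As with detect(), the result is a boolean Series, so `values[mask]` picks out the flagged points.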

 

 

 

 

3️⃣ Constant value

    • ConstantValueDetector()
      • Detects values that stay constant over a longer period
      • Typically caused by a sensor failure in which the sensor gets stuck at a constant level
      • tsod.ConstantValueDetector(window_size: int = 3, threshold: float = 1e-07)
        • window_size ( int ) : minimum window to consider as an anomaly, default 3; it is also described as half of the window measured in array elements, since the window range is [(i - window_size):(i + window_size)]
        • threshold ( float ) : threshold for flagging anomalies (a lower threshold narrows the range of values treated as anomalous), default 1e-07 according to the signature above
cd = tsod.ConstantValueDetector()

res = cd.detect(series)
series[res]

Sensor error that gets stuck at a constant level

plt.plot(series)
plt.plot(series[res], 'ro',label='Anomaly')
plt.legend()

Anomalies detected over a longer period
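One way to picture "stuck at a constant level" is to look for runs of consecutive values that do not change. The sketch below is an assumption about the idea, not tsod's algorithm: `constant_runs` is a hypothetical helper, and here `window_size` simply means the minimum run length.

```python
import pandas as pd


def constant_runs(data: pd.Series, window_size: int = 3, threshold: float = 1e-7) -> pd.Series:
    """Flag every value that belongs to a run of >= window_size unchanged values."""
    # A new run starts wherever the value moves by more than the threshold.
    changed = data.diff().abs() > threshold
    run_id = changed.cumsum()
    # Length of the run each value belongs to.
    run_length = data.groupby(run_id).transform("size")
    return run_length >= window_size


values = pd.Series([1.0, 1.2, 1.2, 1.2, 1.2, 0.9, 1.1, 1.1, 1.3])
flags = constant_runs(values)
print(values[flags].tolist())
```

The run of four 1.2 values is flagged, while the pair of 1.1 values is too short to count.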

 

 

4️⃣ Combination

  • CombinedDetector()
    • Combines detectors
    • Several anomaly detection strategies can be combined into a single combined detector
    • tsod.CombinedDetector(detectors)
combined = tsod.CombinedDetector([tsod.RangeDetector(max_value=2.0),
                                     tsod.ConstantValueDetector()])

res = combined.detect(series)
series[res]

Combined detector

plt.plot(series)
plt.plot(series[res], 'ro',label='Anomaly')
plt.legend()

There is no min threshold here, so the 0-valued anomalies seen above are not detected
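Conceptually, combining detectors means a point is anomalous if any of them flags it. Here is a small sketch of that idea with plain functions standing in for detectors. The names `above_max`, `repeats_previous` and `combine` are made up for this example, and the logical OR is an assumption about how the combination behaves.

```python
import pandas as pd


def above_max(data, max_value=2.0):
    """Range-style check with only an upper limit."""
    return data > max_value


def repeats_previous(data, threshold=1e-7):
    """Constant-style check: the value did not change since the previous step."""
    return data.diff().abs() < threshold


def combine(data, detectors):
    """A point is flagged if at least one detector flags it."""
    mask = pd.Series(False, index=data.index)
    for detector in detectors:
        mask = mask | detector(data)
    return mask


values = pd.Series([0.0, 1.0, 1.0, 1.0, 2.5, 0.7])
combined = combine(values, [above_max, repeats_previous])
print(values[combined].tolist())
```

The leading 0.0 is not flagged, because neither check has a lower limit.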

 

 

5️⃣ Constant Gradient

  • ConstantGradientDetector()
    • Detects a constant gradient
    • Typically caused by linear interpolation over a long interval
    • tsod.ConstantGradientDetector(window_size: int = 3)
      • window_size ( int ) : minimum window to consider as an anomaly, default 3
cgd = tsod.ConstantGradientDetector()

res = cgd.detect(series)

plt.figure(figsize=(16,4))
plt.plot(series)
plt.plot(series[res], 'ro',label='Anomaly')
plt.legend()

The gradient is constant there, so the straight-line segments are detected
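A constant gradient shows up as consecutive steps of identical size. The sketch below flags runs of equal steps. It is an illustrative assumption rather than tsod's implementation, and `constant_gradient_mask` and `tol` are hypothetical names.

```python
import pandas as pd


def constant_gradient_mask(data: pd.Series, window_size: int = 3, tol: float = 1e-9) -> pd.Series:
    """Flag points reached by >= window_size consecutive steps of equal size."""
    step = data.diff()
    # A new run starts wherever the step size changes.
    changed = step.diff().abs() > tol
    run_id = changed.cumsum()
    run_length = step.groupby(run_id).transform("size")
    return run_length >= window_size


# 2 -> 4 -> 6 -> 8 -> 10 looks like linear interpolation between 2 and 10.
values = pd.Series([1.0, 3.0, 2.0, 4.0, 6.0, 8.0, 10.0, 7.0, 9.5])
mask = constant_gradient_mask(values)
print(values[mask].tolist())
```

In this simplified version the first point of the ramp (2.0) is not flagged, because the step leading into it differs from the steps that follow.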

 

 

 

6️⃣ Gradient

  • GradientDetector()
    • Detects abrupt changes
    • tsod.GradientDetector(max_gradient=inf, direction='both')
      • max_gradient ( float ) : maximum rate of change per second, default np.inf
      • direction ( str ) : 'positive', 'negative' or 'both', default 'both'
magd = tsod.GradientDetector()
# Fit on the first 10 samples, which are assumed to be normal, to set the gradient limit
magd.fit(series.iloc[:10])

res = magd.detect(series)
series[res]

plt.figure(figsize=(16,4))
plt.plot(series)
plt.plot(series[res], 'ro',label='Anomaly')
plt.legend()
plt.title(magd)

Only the points with a steep gradient are detected
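Since max_gradient is a rate of change per second, the elapsed time between samples matters. A minimal sketch of that calculation follows. `gradient_per_second` is a hypothetical helper, the data is invented, and taking the largest gradient of a normal stretch as the limit is an assumption about what fitting does.

```python
import pandas as pd


def gradient_per_second(data: pd.Series) -> pd.Series:
    """Change in value divided by the elapsed time in seconds."""
    seconds = data.index.to_series().diff().dt.total_seconds()
    return data.diff() / seconds


index = pd.date_range("2022-09-15", periods=6, freq="min")
signal = pd.Series([1.0, 1.5, 2.0, 8.0, 8.5, 9.0], index=index)

# "Fit": take the largest absolute gradient seen in a stretch assumed normal.
max_gradient = gradient_per_second(signal.iloc[:3]).abs().max()

# "Detect": anything steeper than that is flagged (both directions).
mask = gradient_per_second(signal).abs() > max_gradient
print(signal[mask].tolist())
```

Only the jump from 2.0 to 8.0 within one minute exceeds the limit learned from the first three samples.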

 

 

7️⃣ Rolling standard deviation

  • Used to detect sudden large variations
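# Normal signal: one sine period plus small noise
normal_data = pd.Series(np.random.normal(size=100,scale=0.3) + 10.0*np.sin(np.linspace(0,2*np.pi,num=100)))
# Abnormal segment: much larger noise around the last normal value
abnormal_data = pd.Series(np.random.normal(size=20,scale=5.0) + normal_data.iloc[-1])

# Normal, abnormal, then normal again
all_data = pd.concat([normal_data,abnormal_data,normal_data[21:]],ignore_index=True)

# Overwrite a single point with 5.0 to create an isolated jump
all_data[150]= 5.0

all_data.plot()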

 

  • All values are within an acceptable range, but the variation is larger than expected, so the data is anomalous
rsd = tsod.RollingStandardDeviationDetector(window_size=10, center=True)
rsd.fit(normal_data)

res = rsd.detect(all_data)
all_data[res]

 

plt.figure(figsize=(16,4))
plt.plot(all_data)
plt.plot(all_data[res], 'ro',label='Anomaly')
plt.legend()
plt.title(rsd)
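The idea behind a rolling standard deviation check can be sketched with pandas alone. This is a hedged illustration, not tsod's implementation: using the largest rolling standard deviation of the normal data as the limit is an assumption, and the data is synthetic with a fixed seed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Smooth signal with small noise, followed by a stretch with much larger noise.
normal = pd.Series(np.sin(np.linspace(0, 2 * np.pi, 100)) + rng.normal(scale=0.05, size=100))
noisy = pd.Series(rng.normal(scale=2.0, size=20))
data = pd.concat([normal, noisy], ignore_index=True)

window = 10

# "Fit": the largest rolling standard deviation observed in normal data.
max_std = normal.rolling(window, center=True).std().max()

# "Detect": windows whose spread exceeds what was seen during fitting.
mask = data.rolling(window, center=True).std() > max_std
print(int(mask.sum()), "points flagged")
```

Positions where the centered window is incomplete produce NaN and are therefore never flagged in this sketch.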

 

 

 

 

8️⃣ Diff

  • Detects abrupt changes without taking the elapsed time into account
drd = tsod.DiffDetector()
drd.fit(normal_data)

res = drd.detect(all_data)
all_data[res]

plt.figure(figsize=(16,4))
plt.plot(all_data)
plt.plot(all_data[res], 'ro',label='Anomaly')
plt.legend()
plt.title(drd)
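Ignoring elapsed time means only the difference between consecutive samples is compared. Below is a minimal sketch with invented numbers. Using the largest absolute step of the training data as the limit is an assumption about what fitting does, not a statement about tsod's internals.

```python
import pandas as pd

train = pd.Series([1.0, 1.5, 1.25, 1.75, 1.5])

# "Fit": the largest absolute step between consecutive training samples.
max_diff = train.diff().abs().max()

data = pd.Series([1.0, 1.5, 4.0, 3.75, 1.0])

# "Detect": steps larger than anything seen during fitting.
mask = data.diff().abs() > max_diff
print(data[mask].tolist())
```

Unlike a gradient check, the spacing of the timestamps plays no role here, so the same step is judged the same way whether it took a second or an hour.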

 

 

 

 

 
