
PyScript: Using Python in HTML (14)

징징알파카 · 2022. 11. 28. 13:38

<This post was written while studying, with reference to itadventure's blog :-)>

https://itadventure.tistory.com/557

 

파도!(15): Lasso regression, up to 4th-degree polynomials ('파도' is short for 파이스크립트 도전기, the "PyScript challenge diary" series on itadventure.tistory.com)

 

 

 

Lasso Regression

PolynomialFeatures is scikit-learn's polynomial feature module.
It lists every combination of powers and products of the input features, up to the chosen degree.

 

1) Import PolynomialFeatures and create the polynomial transformer (폴리, "poly")

from sklearn.preprocessing import PolynomialFeatures

폴리 = PolynomialFeatures(degree=4, include_bias=False)

2) Fit it on the training data (훈련용데이터) so it learns the feature layout

폴리.fit(훈련용데이터)

3) Pass data through the fitted transformer to get the expanded features (훈련용데이터_가공, "processed training data")

훈련용데이터_가공 = 폴리.transform(훈련용데이터)

4) Print the list of generated feature names

print(폴리.get_feature_names_out())
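
To make the "every power and product" idea concrete, here is a minimal standard-library sketch that enumerates the same kind of terms for a small case. It is not scikit-learn's implementation; the helper name polynomial_terms and the feature names a and b are made up for illustration.

```python
from itertools import combinations_with_replacement
from math import comb


def polynomial_terms(names, degree):
    """List every product of the given features up to `degree` (no bias term)."""
    terms = []
    for d in range(1, degree + 1):
        # Each combination with replacement is one monomial,
        # e.g. ('a', 'a', 'b') stands for a^2 * b.
        for combo in combinations_with_replacement(names, d):
            terms.append(" * ".join(combo))
    return terms


terms = polynomial_terms(["a", "b"], degree=2)
print(terms)  # ['a', 'b', 'a * a', 'a * b', 'b * b']

# The count grows quickly: n features at degree d give C(n + d, d) - 1 terms.
n_features, degree = 6, 4
print(comb(n_features + degree, degree) - 1)  # 209
```

With the six input columns used in index.html and degree=4, the expansion produces 209 features, which is why the regularization discussed next matters.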

 

 

Lasso Regression vs Ridge Regression

The more terms the model has, the more easily it overfits the training data.
So the more terms there are, the higher the alpha value (the regularization strength) should be set.
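
How the two penalties differ is easiest to see with a single coefficient. The sketch below assumes a simplified one-dimensional problem, not the exact objective scikit-learn minimizes: ridge minimizes 0.5*(w - z)^2 + 0.5*alpha*w^2 and lasso minimizes 0.5*(w - z)^2 + alpha*|w|, where z is the unpenalized estimate. The function names are illustrative.

```python
def ridge_shrink(z, alpha):
    """Minimizer of 0.5*(w - z)**2 + 0.5*alpha*w**2: scales z toward zero."""
    return z / (1.0 + alpha)


def lasso_shrink(z, alpha):
    """Minimizer of 0.5*(w - z)**2 + alpha*abs(w): soft thresholding."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0  # small coefficients are set exactly to zero


for z in (3.0, 0.5, -0.2):
    print(z, ridge_shrink(z, 1.0), lasso_shrink(z, 1.0))
# 3.0 1.5 2.0
# 0.5 0.25 0.0
# -0.2 -0.1 0.0
```

Ridge shrinks every coefficient but never to exactly zero, while lasso drops the small ones entirely, which acts as a form of feature selection. A larger alpha strengthens both effects.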

 

#=====================================
# Ridge model
from sklearn.linear_model import Ridge
릿지모델 = Ridge(alpha=10)
릿지모델.fit(훈련용데이터_가공, 훈련용목표)

# score() on a regressor returns R^2; accuracy only applies when the target is a category
print("Ridge training score (R^2)")
print(릿지모델.score(훈련용데이터_가공, 훈련용목표))
print("Ridge test score (R^2)")
print(릿지모델.score(테스트데이터_가공, 테스트목표))

훈련용목표예측_릿지 = 릿지모델.predict(훈련용데이터_가공)
테스트목표예측_릿지 = 릿지모델.predict(테스트데이터_가공)

#=====================================
# Lasso model
from sklearn.linear_model import Lasso
라쏘모델 = Lasso(alpha=10)
라쏘모델.fit(훈련용데이터_가공, 훈련용목표)

# score() on a regressor returns R^2; accuracy only applies when the target is a category
print("Lasso training score (R^2)")
print(라쏘모델.score(훈련용데이터_가공, 훈련용목표))
print("Lasso test score (R^2)")
print(라쏘모델.score(테스트데이터_가공, 테스트목표))

훈련용목표예측_라쏘 = 라쏘모델.predict(훈련용데이터_가공)
테스트목표예측_라쏘 = 라쏘모델.predict(테스트데이터_가공)
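
For regressors such as Ridge and Lasso, score() returns the coefficient of determination R^2 rather than a classification accuracy. A small hand-written version shows what that number means; r_squared is a local helper written for this note, not a scikit-learn function.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y_true, y_true))                 # 1.0  (perfect prediction)
print(r_squared(y_true, [2.5, 2.5, 2.5, 2.5]))   # 0.0  (no better than the mean)
print(r_squared(y_true, [4.0, 3.0, 2.0, 1.0]))   # -3.0 (worse than the mean)
```

A score near 1 means the predictions track the target closely; a negative test score means the model does worse than always predicting the average.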

 

 

Code Implementation

Plot the original data together with the Ridge and Lasso results.

 

  • index.html
<html> 
    <head> 
      <title>Polynomial Regression + Lasso Regression</title>
      <link rel="stylesheet" 
        href="https://pyscript.net/alpha/pyscript.css" /> 
      <script defer 
        src="https://pyscript.net/alpha/pyscript.js"></script> 

<py-env>
  - pandas
  - matplotlib
  - seaborn
  - scikit-learn
  - paths :
    - ./common.py
</py-env>
    </head>
  <body> 
    <link rel="stylesheet" href="pytable.css"/>
    <py-script>
    import pandas as pd
    from pyodide.http import open_url
    from common import *
    import numpy as np

    from datetime import datetime
    import time  # used by time.mktime below

    <!-- Set the number of decimal places when printing NumPy arrays -->
    np.set_printoptions(formatter={'float_kind': lambda x: "{0:0.2f}".format(x)})

    <!-- Suppress warning messages -->
    import warnings
    warnings.filterwarnings( 'ignore' )

    <!-- Read the CSV into a pandas DataFrame -->
    SalesData = pd.read_csv(open_url(
      "http://dreamplan7.cafe24.com/pyscript/csv/avocado.csv"
    ))      

    <!-- Keep only 3 fields and rebuild the DataFrame -->
    SalesData = SalesData[[
      'Date', 
      'Total Volume',
      'AveragePrice'
    ]]   

    SalesData.columns = [
      'Day', 
      'Amount',
      'AveragePrice'
    ]

    <!-- Group by date (weekly): sum the sales volume and average the price within each group -->
    WeekdaysSales_sum = SalesData.fillna(0) \
    .groupby('Day', as_index=False)[['Amount']].sum() \
    .sort_values(by='Day', ascending=True)
    
    WeekdaysSales_mean = SalesData.fillna(0) \
    .groupby('Day', as_index=False)[['AveragePrice']].mean() \
    .sort_values(by='Day', ascending=True)

    <!-- Merge the 2 DataFrames into one, keyed on 'Day' (the column given in on=) -->
    WeekdaysSalesData = pd.merge(WeekdaysSales_sum, WeekdaysSales_mean, on = 'Day')


    <!-- Add the date as a numeric time value -->
    WeekdaysSalesData.insert(1, 'Day(timeValue)',
        '',   True)
  
    for i in WeekdaysSalesData['Day'].index:
      WeekdaysSalesData.loc[i, 'Day(timeValue)'] = time.mktime(
        datetime.strptime(
          WeekdaysSalesData['Day'].loc[i], 
          '%Y-%m-%d'
        ).timetuple()
      )

    <!-- Add a sales-volume field divided by 10000 -->
    WeekdaysSalesData.insert(3, 'Amount(10000)', 
    WeekdaysSalesData['Amount']/10000, 
      True)

    <!-- For training, split the date into year, month, day, and ISO week -->
    WeekdaysSalesData.insert(4, 'year', '', True)
    WeekdaysSalesData.insert(5, 'month', '', True)
    WeekdaysSalesData.insert(6, 'day', '', True)
    WeekdaysSalesData.insert(7, 'week', '', True)

    for i in WeekdaysSalesData['Day'].index:
      temp = str(WeekdaysSalesData['Day'].loc[i]).split('-')
      year = int(temp[0])
      month = int(temp[1])
      day = int(temp[2])
      WeekdaysSalesData.loc[i, 'year'] = year
      WeekdaysSalesData.loc[i, 'month'] = month
      WeekdaysSalesData.loc[i, 'day'] = day
      WeekdaysSalesData.loc[i, 'week'] = str(
        datetime(year, month, day).isocalendar()[1]
      )

    createElementDiv(
      document, 
      Element, 
      'output2'
    ).write(WeekdaysSalesData)

    WeekdaysSalesDataTrain_numpy = WeekdaysSalesData[['Day(timeValue)', 'year', 'month', 'day', 'week', 'AveragePrice']].to_numpy()
    WeekdaysSalesDataTest_numpy = WeekdaysSalesData['Amount(10000)'].to_numpy()
    WeekdaysSalesDataDay_numpy = WeekdaysSalesData['Day'].to_numpy()

    from sklearn.model_selection import train_test_split

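    <!-- shuffle=False keeps the rows in date order, so the last 25% (the default test_size) becomes the test set; random_state has no effect without shuffling -->
    X_train, X_test, y_train, y_test = \
      train_test_split(
        WeekdaysSalesDataTrain_numpy, 
        WeekdaysSalesDataTest_numpy,
        random_state=100,
        shuffle=False)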

    <!-- PolynomialFeatures: the polynomial feature module -->
    <!-- Lists every combination of powers and products of the features -->
    from sklearn.preprocessing import PolynomialFeatures
    polynomial = PolynomialFeatures(degree=4, include_bias=False) # drop the bias (intercept) column
    polynomial.fit(X_train) # learn how to expand the features into polynomial terms
    
    train_polynomial_added = polynomial.transform(X_train) # expand the training set with the fitted transformer
    test_polynomial_added = polynomial.transform(X_test) # expand the test set too, reusing the transformer fitted on the training set
    
    <!-- Scaling 'stabilizes the data' by putting every feature on a comparable scale -->
    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    scaler.fit(train_polynomial_added)
    train_polynomial_added = scaler.transform(train_polynomial_added)
    test_polynomial_added = scaler.transform(test_polynomial_added)

    <!-- ===================================== -->
    <!-- Ridge model -->
    from sklearn.linear_model import Ridge
    ridge_model = Ridge(alpha=0.1)
    ridge_model.fit(train_polynomial_added, y_train)

    <!-- score() measures how well the model fits (R^2 for regressors) -->
    print("Ridge training score (R^2)")
    print(ridge_model.score(train_polynomial_added, y_train))
    print("Ridge test score (R^2)")
    print(ridge_model.score(test_polynomial_added, y_test))

    <!-- Predictions based on the scaled data -->
    y_train_ridge_predict = ridge_model.predict(train_polynomial_added)
    y_test_ridge_predict = ridge_model.predict(test_polynomial_added)

    <!-- ===================================== -->
    <!-- Lasso model -->
    from sklearn.linear_model import Lasso
    lasso_model = Lasso(alpha=0.1)
    lasso_model.fit(train_polynomial_added, y_train)

    <!-- score() measures how well the model fits (R^2 for regressors) -->
    print("Lasso training score (R^2)")
    print(lasso_model.score(train_polynomial_added, y_train))
    print("Lasso test score (R^2)")
    print(lasso_model.score(test_polynomial_added, y_test))

    <!-- Predictions based on the scaled data -->
    y_train_lasso_predict = lasso_model.predict(train_polynomial_added)
    y_test_lasso_predict = lasso_model.predict(test_polynomial_added)

    import matplotlib.pyplot as plt
    import matplotlib as mat

    <!-- Graph -->
    fig = plt.figure(
      figsize=(15, 7)
    )

    plt.xticks(
      WeekdaysSalesDataTrain_numpy[:, 0],
      WeekdaysSalesDataDay_numpy, 
      rotation=90)

    plt.title('Weekdays Avocado SalesAmount (Ridge, Lasso)')

    line_alpha=0.5

    <!-- Original data -->
    plt.plot(        
        X_train[:,0],
        y_train,
        marker='o',
        color='gray',
        label='Original', 
        alpha = line_alpha
    )
    plt.plot(        
        X_test[:,0],
        y_test,
        marker='o',
        color='gray',
        alpha = line_alpha
    )
    
    <!-- Ridge (fit on the training range) -->
    plt.plot(        
        X_train[:,0],
        y_train_ridge_predict,
        marker='d',
        color='green',
        label='Train pattern (Ridge)', 
        alpha = line_alpha
    )

    <!-- Lasso (fit on the training range) -->
    plt.plot(        
        X_train[:,0],
        y_train_lasso_predict,
        marker='d',
        color='red',
        label='Train pattern (Lasso)', 
        alpha = line_alpha
    )

    <!-- Ridge prediction -->
    plt.plot(        
        X_test[:,0],
        y_test_ridge_predict,
        marker='*',
        color='blue',
        label='Predict pattern (Ridge)', 
        alpha = line_alpha
    )

    <!-- Lasso prediction -->
    plt.plot(        
        X_test[:,0],
        y_test_lasso_predict,
        marker='*',
        color='red',
        label='Predict pattern (Lasso)', 
        alpha = line_alpha
    )

    plt.xlabel('Day')
    plt.ylabel('Amount(10000)')

    plt.legend(
      shadow=True
    )

    ax = plt.gca()
    <!-- Grid on the x axis only -->
    ax.xaxis.grid(True)

    <!-- Adjust the background color and margins -->
    ax.set_facecolor('#e8e7d2')
    ax.margins(x=0.01, y=0.02)

    <!-- Remove the odd surrounding whitespace -->
    fig.tight_layout() 
    fig
</py-script> 
  </body> 
</html>
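
The two loops in index.html that derive Day(timeValue), year, month, day, and week from the Day string can also be expressed as one small helper. This is a sketch rather than the original author's code: date_features is a hypothetical name, and it swaps time.mktime for calendar.timegm so the result does not depend on the local time zone.

```python
import calendar
from datetime import datetime


def date_features(day_text):
    """Turn 'YYYY-MM-DD' into (timestamp, year, month, day, iso_week)."""
    d = datetime.strptime(day_text, "%Y-%m-%d")
    # calendar.timegm treats the time tuple as UTC, so the result does not
    # depend on the machine's time zone (time.mktime uses local time).
    timestamp = calendar.timegm(d.timetuple())
    iso_week = d.isocalendar()[1]
    return timestamp, d.year, d.month, d.day, iso_week


first = date_features("2015-01-04")
second = date_features("2015-01-11")
print(first[1:])              # (2015, 1, 4, 1)
print(second[0] - first[0])   # 604800 seconds = one week
```

The raw timestamp grows steadily over time, while month and ISO week repeat every year, so together they give the model both a trend and a seasonal signal to work with.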

 

  • common.py
def createElementDiv(document, Element, name):
    element = document.createElement('div')
    element.id = name
    document.body.append(element)
    return Element(name)
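
One detail of index.html worth isolating is that StandardScaler is fitted on the training features only and then reused for the test features. A plain-Python sketch of that pattern for a single column follows; fit_scaler and apply_scaler are illustrative names, not scikit-learn functions, and the numbers are made up.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation from training values."""
    return mean(column), pstdev(column)


def apply_scaler(column, mu, sigma):
    """Standardize values with statistics learned elsewhere."""
    return [(x - mu) / sigma for x in column]


train = [10.0, 20.0, 30.0]
test = [40.0]

mu, sigma = fit_scaler(train)                 # statistics come from train only
scaled_train = apply_scaler(train, mu, sigma)
scaled_test = apply_scaler(test, mu, sigma)   # test reuses the train statistics

print(mu)                                  # 20.0
print(scaled_train[1])                     # 0.0
print(scaled_test[0] > scaled_train[-1])   # True
```

Reusing the training statistics keeps information about the test period out of the preprocessing, so the test score stays an honest estimate.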
