😎 First time meeting a studying 징징알파카?



PyScript: Using Python in HTML (13)

징징알파카 2022. 11. 28. 10:54

<This post was written while studying from itadventure's blog :-)>

https://itadventure.tistory.com/555

 

PaDo! (14) - Ridge regression makes it more accurate?

※ 'PaDo' (파도) is short for 'PyScript Dojeongi' (a PyScript challenge log). That post continues from the previous one: https://itadventure.tistory.com/554 PaDo! (13) - Huh? The AI's hit rate?! - Adding the average price

itadventure.tistory.com

 

 

 

🌵 Underfitting

'Underfitting' is when accuracy on the training data is lower than accuracy on the test data. More generally, an underfit model is too simple and tends to score poorly on both.

A good model scores about the same on the training set and the test set.
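The comparison above can be sketched as a tiny helper. This is only an illustration: `diagnose_fit` and its `tolerance` threshold are hypothetical names chosen here, not part of scikit-learn or the referenced post, and the 0.05 cutoff is an arbitrary assumption.

```python
def diagnose_fit(train_score, test_score, tolerance=0.05):
    """Rough label from comparing two scores (threshold is illustrative)."""
    if train_score < test_score - tolerance:
        return "underfitting"   # training score clearly below test score
    if train_score > test_score + tolerance:
        return "overfitting"    # training score clearly above test score
    return "balanced"           # the two scores are close


print(diagnose_fit(0.55, 0.70))  # training lower than test
print(diagnose_fit(0.95, 0.60))  # training much higher than test
print(diagnose_fit(0.80, 0.82))  # roughly equal
```

In practice the two inputs would be the values returned by `score()` on the training and test sets.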

 

 

🌵 Degree

  • Linear regression
    • A first-degree equation
    • There is not just one x value; several x values (features) exist
    • During training, the model tunes the weight on each of those values to produce the sales volume (y)
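A minimal sketch of the idea in the list above, assuming scikit-learn's `LinearRegression` (this post itself only uses `Ridge`). The toy numbers are made up so that the true relationship is y = 2*x1 + 3*x2 + 1, and training should recover one weight per x column plus an intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (made up for illustration): y = 2*x1 + 3*x2 + 1
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

model = LinearRegression()
model.fit(X, y)  # training finds one weight per x column plus an intercept

print(model.coef_)       # approximately [2. 3.]
print(model.intercept_)  # approximately 1.0
```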

 

Supplying the squared values as extra features lets the fitted curve follow the data more closely.

# Same array as WeekdaysSalesDataTrain_numpy in the full code below
주간매출데이터훈련_넘파이 = 주간매출데이터훈련_넘파이.astype(float)
주간매출데이터훈련_넘파이 = np.column_stack((
  주간매출데이터훈련_넘파이,
  주간매출데이터훈련_넘파이[:,0] ** 2,
  주간매출데이터훈련_넘파이[:,1] ** 2,
  주간매출데이터훈련_넘파이[:,2] ** 2,
  주간매출데이터훈련_넘파이[:,3] ** 2,
  주간매출데이터훈련_넘파이[:,4] ** 2,
  주간매출데이터훈련_넘파이[:,5] ** 2
))

astype(float) converts the NumPy data from strings to numbers. The older spelling astype(np.float) is avoided because the np.float alias was removed in NumPy 1.24.

np.column_stack appends the new columns, one at a time, to the original NumPy data.
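To see what these two calls do, here is a small standalone sketch on a made-up 2x2 array (the names `raw`, `data`, and `stacked` are chosen here for illustration).

```python
import numpy as np

# Values read from a CSV often arrive as strings
raw = np.array([["1", "2"], ["3", "4"]])
data = raw.astype(float)  # strings -> floats

# Append the square of each column as new columns
stacked = np.column_stack((
    data,
    data[:, 0] ** 2,
    data[:, 1] ** 2,
))
print(stacked)
# [[ 1.  2.  1.  4.]
#  [ 3.  4.  9. 16.]]

# Same result in one step: square every column at once
same = np.column_stack((data, data ** 2))
print(np.array_equal(stacked, same))  # True
```

Squaring the whole array with `data ** 2` is a shorter way to write the same thing when every column should get a squared copy.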

 

 

 

🌵 Overfitting

The model follows the training data too faithfully.

As a result, accuracy on the test data is very poor.
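A small sketch of this effect, using made-up data rather than the avocado dataset: a straight line with a fixed zig-zag of +/-0.5 added. A degree-9 polynomial passes through all ten training points, yet it swings far from the clean line between them. `Polynomial.fit` from NumPy is swapped in here just to keep the example short, and `r2` is a hand-written helper.

```python
import numpy as np
from numpy.polynomial import Polynomial


def r2(y_true, y_pred):
    """Coefficient of determination (the quantity score() reports for regressors)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot


# Made-up training data: a straight line plus a fixed zig-zag of +/-0.5
x_train = np.arange(10, dtype=float)
y_train = x_train + 0.5 * (-1) ** np.arange(10)

# Test points sit between the training points and follow the clean line
x_test = x_train[:-1] + 0.5
y_test = x_test.copy()

scores = {}
for degree in (1, 9):
    poly = Polynomial.fit(x_train, y_train, degree)
    scores[degree] = (r2(y_train, poly(x_train)), r2(y_test, poly(x_test)))
    print(degree, scores[degree])  # (training score, test score)
```

The degree-9 fit wins on the training score but loses badly on the test score, which is the overfitting pattern described above.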

 

🥑 Ridge Regression -> addresses overfitting

from sklearn.linear_model import Ridge

# 알파값 (alpha value) is a placeholder for the regularization strength
# 훈련용데이터/훈련용목표 = training features/targets, 테스트데이터/테스트목표 = test features/targets
릿지모델 = Ridge(alpha=알파값)
릿지모델.fit(훈련용데이터, 훈련용목표)

# score() returns R^2 for a regressor
print("Training data accuracy")
print(릿지모델.score(훈련용데이터, 훈련용목표))
print("Test data accuracy")
print(릿지모델.score(테스트데이터, 테스트목표))

훈련용목표예측 = 릿지모델.predict(훈련용데이터)
테스트목표예측 = 릿지모델.predict(테스트데이터)

 

Note that a value called alpha is passed as a parameter.
Alpha is the option that keeps the model from overfitting: the larger it is, the more strongly the coefficients are shrunk.

Typical values step by powers of 10, such as 0.01, 0.1, 1, and 10.
Because a person has to choose this value to set the strength, it is called a hyperparameter.

릿지모델 = Ridge(alpha=0.1)
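One way to get a feel for alpha is to sweep the usual powers of 10 and watch the coefficients shrink. The sketch below uses synthetic data, an assumption made purely for illustration, and the names `alphas` and `norms` are chosen here.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)

alphas = [0.01, 0.1, 1, 10]
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    norms.append(np.linalg.norm(model.coef_))  # overall size of the weights
    print(alpha, model.coef_)
```

The size of the weights falls as alpha grows. To pick a value in practice, one would compare the training and test scores for each alpha rather than the weights alone.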

 

🥑 Lasso Regression -> addresses overfitting

Lasso is covered in the next post~

 

🌵 Code implementation

  • index.html
<html> 
    <head> 
      <link rel="stylesheet" 
        href="https://pyscript.net/alpha/pyscript.css" /> 
      <script defer 
        src="https://pyscript.net/alpha/pyscript.js"></script> 

<py-env>
  - pandas
  - matplotlib
  - seaborn
  - scikit-learn
  - paths :
    - ./common.py
</py-env>
    </head>
  <body> 
    <link rel="stylesheet" href="pytable.css"/>
    <py-script>
    import pandas as pd
    from pyodide.http import open_url
    from common import *
    import numpy as np

    import time
    from datetime import datetime

    # Set the number of decimal places used when printing NumPy arrays
    np.set_printoptions(formatter={'float_kind': lambda x: "{0:0.2f}".format(x)})

    # Suppress warning messages
    import warnings
    warnings.filterwarnings( 'ignore' )

    # Read the CSV into a DataFrame with pandas
    SalesData = pd.read_csv(open_url(
      "http://dreamplan7.cafe24.com/pyscript/csv/avocado.csv"
    ))      

    # Keep only three fields and rebuild the DataFrame
    SalesData = SalesData[[
      'Date', 
      'Total Volume',
      'AveragePrice'
    ]]   

    SalesData.columns = [
      'Day', 
      'Amount',
      'AveragePrice'
    ]

    # Group by date (one row per week): sum the sales volume and average the price per group
    WeekdaysSales_sum = SalesData.fillna(0) \
    .groupby('Day', as_index=False)[['Amount']].sum() \
    .sort_values(by='Day', ascending=True)
    
    WeekdaysSales_mean = SalesData.fillna(0) \
    .groupby('Day', as_index=False)[['AveragePrice']].mean() \
    .sort_values(by='Day', ascending=True)

    # Merge the two DataFrames into one, keyed on the 'Day' column named in on
    WeekdaysSalesData = pd.merge(WeekdaysSales_sum, WeekdaysSales_mean, on = 'Day')


    # Add the date as a time value (Unix timestamp)
    WeekdaysSalesData.insert(1, 'Day(timeValue)',
        '',   True)

    # .loc[row, column] avoids pandas chained assignment
    for i in WeekdaysSalesData['Day'].index:
      WeekdaysSalesData.loc[i, 'Day(timeValue)'] = time.mktime(
      datetime.strptime(
        WeekdaysSalesData['Day'].loc[i], 
        '%Y-%m-%d'
        ).timetuple()
      )

    # Add a sales-volume field divided by 10000
    WeekdaysSalesData.insert(3, 'Amount(10000)', 
    WeekdaysSalesData['Amount']/10000, 
      True)

    # Split the date into year, month, day (and ISO week number) for training
    WeekdaysSalesData.insert(4, 'year', '', True)
    WeekdaysSalesData.insert(5, 'month', '', True)
    WeekdaysSalesData.insert(6, 'day', '', True)
    WeekdaysSalesData.insert(7, 'week', '', True)

    for i in WeekdaysSalesData['Day'].index:
      temp = str(WeekdaysSalesData['Day'].loc[i]).split('-')
      year = int(temp[0])
      month = int(temp[1])
      day = int(temp[2])
      # .loc[row, column] avoids pandas chained assignment
      WeekdaysSalesData.loc[i, 'year'] = year
      WeekdaysSalesData.loc[i, 'month'] = month
      WeekdaysSalesData.loc[i, 'day'] = day
      WeekdaysSalesData.loc[i, 'week'] = str(
        datetime(year, month, day).isocalendar()[1]
      )

    createElementDiv(
      document, 
      Element, 
      'output2'
    ).write(WeekdaysSalesData)

    WeekdaysSalesDataTrain_numpy = WeekdaysSalesData[['Day(timeValue)', 'year', 'month', 'day', 'week', 'AveragePrice']].to_numpy()
    WeekdaysSalesDataTest_numpy = WeekdaysSalesData['Amount(10000)'].to_numpy()
    WeekdaysSalesDataTrain_numpy = WeekdaysSalesDataTrain_numpy.astype(float)

    WeekdaysSalesDataTrain_numpy = np.column_stack(( 
      WeekdaysSalesDataTrain_numpy ,
      WeekdaysSalesDataTrain_numpy[:,0] ** 2,
      WeekdaysSalesDataTrain_numpy[:,1] ** 2,
      WeekdaysSalesDataTrain_numpy[:,2] ** 2,
      WeekdaysSalesDataTrain_numpy[:,3] ** 2,
      WeekdaysSalesDataTrain_numpy[:,4] ** 2,
      WeekdaysSalesDataTrain_numpy[:,5] ** 2
    ))


    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = \
      train_test_split(
        WeekdaysSalesDataTrain_numpy, 
        WeekdaysSalesDataTest_numpy,
        random_state=100,
        shuffle=False)

    # Scaling 'stabilizes the data' (zero mean and unit variance per feature)
    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    scaler.fit(X_train)
    X_train_scaler = scaler.transform(X_train)
    X_test_scaler = scaler.transform(X_test)

    from sklearn.linear_model import Ridge
    ridge_model = Ridge(alpha=0.1)
    ridge_model.fit(X_train_scaler, y_train)

    # Evaluate the fit with score() (R^2 for a regressor)
    print("Training model accuracy")
    print(ridge_model.score(X_train_scaler, y_train))
    print("Test model accuracy")
    print(ridge_model.score(X_test_scaler, y_test))

    # Predictions based on the scaled data
    y_train_predict = ridge_model.predict(X_train_scaler)
    y_test_predict = ridge_model.predict(X_test_scaler)

    import matplotlib.pyplot as plt
    import matplotlib as mat

    # Graph
    fig = plt.figure(
      figsize=(15, 7)
    )

    plt.xticks(WeekdaysSalesData['Day(timeValue)'].to_numpy(), WeekdaysSalesData[['Day']].to_numpy()[:,0], rotation=90)

    plt.title('Weekdays Avocado SalesAmount')

    plt.plot(        
        X_train[:,0],
        y_train,
        marker='o',
        color='#c14549',
        label='Original'
    )
    plt.plot(        
        X_train[:,0],
        y_train_predict,
        marker='d',
        color='blue',
        label='Train pattern'
    )

    plt.plot(        
        X_test[:, 0],
        y_test,
        marker='o',
        color='#c14549'
    )

    plt.plot(        
        X_test[:, 0],
        y_test_predict,
        marker='d',
        color='green',
        label='Predict pattern'
    )

    plt.xlabel('Day')
    plt.ylabel('Amount(10000)')

    plt.legend(
      shadow=True
    )

    ax = plt.gca()
    # Grid lines on the x axis only
    ax.xaxis.grid(True)

    # Background color and margins
    ax.set_facecolor('#e8e7d2')
    ax.margins(x=0.01, y=0.02)

    # Remove the odd surrounding whitespace
    fig.tight_layout() 
    fig
</py-script> 
  </body> 
</html>

 

  • common.py
def createElementDiv(document, Element, name):
    element = document.createElement('div')
    element.id = name
    document.body.append(element)
    return Element(name)