😎 First time here with the studying 징징알파카 (JingJing Alpaca)?

PyScript, which lets you use Python in HTML (12)


징징알파카 2022. 11. 28. 09:37

<This post was written while studying, with reference to itadventure's blog :-)>

https://itadventure.tistory.com/554

 

PaDo!(13) - Huh? The AI's hit rate?! - Adding the average price

'PaDo' (파도) is short for 파이스크립트 도전기, "PyScript challenge log". This post continues from the previous one: https://itadventure.tistory.com/553 (PaDo!(12) - Mushin learning? Machine learning! - Linear regression (LinearRegression))

 

 

 

 

🐞 Including the average price

To include the average price, the code now reads the CSV file and keeps 3 columns.

AveragePrice is the average price column in the provided data.

 

  • Keep 3 columns
import pandas as pd
from pyodide.http import open_url  # available inside PyScript (Pyodide)

# Read the CSV into a pandas DataFrame
매출데이터 = pd.read_csv(open_url(
  "http://dreamplan7.cafe24.com/pyscript/csv/avocado.csv"
))

# Keep only 3 fields and rebuild the DataFrame
매출데이터 = 매출데이터[[
  'Date',
  'Total Volume',
  'AveragePrice'
]]

# Rename to the names used below: 날짜 = date, 매출량 = sales volume, 평균가격 = average price
매출데이터.columns = ['날짜', '매출량', '평균가격']
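To see what the column selection and rename do without pandas or Pyodide, here is a plain-Python sketch using the standard csv module. The CSV text, its made-up values and the extra region column are hypothetical stand-ins, not rows from avocado.csv; the Day / Amount names mirror the rename in the full code below.

```python
import csv
import io

# Hypothetical CSV text: invented values, only the column names follow the post
SAMPLE_CSV = """Date,AveragePrice,Total Volume,region
2015-12-27,1.0,100.0,A
2015-12-27,2.0,300.0,B
2015-12-20,2.0,50.0,A
"""

def select_columns(text):
    """Keep Date, Total Volume and AveragePrice, renamed to Day, Amount, AveragePrice."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            'Day': record['Date'],
            'Amount': float(record['Total Volume']),
            'AveragePrice': float(record['AveragePrice']),
        })
    return rows

sales = select_columns(SAMPLE_CSV)
print(sales[0])  # {'Day': '2015-12-27', 'Amount': 100.0, 'AveragePrice': 1.0}
```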

 

  • When grouping by date (one group per week), the sales volume is summed per group, while the average price is averaged per group
주간매출_매출량=매출데이터.fillna(0) \
  .groupby('날짜', as_index=False)[['매출량']].sum() \
  .sort_values(by='날짜', ascending=True)
  
주간매출_평균가=매출데이터.fillna(0) \
  .groupby('날짜', as_index=False)[['평균가격']].mean() \
  .sort_values(by='날짜', ascending=True)
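The two groupby calls amount to "sum the volume, average the price, per date". A minimal plain-Python sketch of that computation, assuming rows shaped like {'Day', 'Amount', 'AveragePrice'} (the sample rows are hypothetical) and treating None as the missing value that fillna(0) would replace:

```python
from collections import defaultdict

def group_by_day(rows):
    """Per Day: total Amount and mean AveragePrice, sorted by Day.

    A missing value (None here) counts as 0, like fillna(0) before the groupby.
    """
    amount_total = defaultdict(float)
    price_total = defaultdict(float)
    count = defaultdict(int)
    for row in rows:
        day = row['Day']
        amount_total[day] += row['Amount'] or 0.0
        price_total[day] += row['AveragePrice'] or 0.0
        count[day] += 1
    # 'YYYY-MM-DD' strings sort in chronological order
    return [(day, amount_total[day], price_total[day] / count[day])
            for day in sorted(count)]

# Hypothetical rows: two regions reported on 2015-12-27, one on 2015-12-20
rows = [
    {'Day': '2015-12-27', 'Amount': 100.0, 'AveragePrice': 1.0},
    {'Day': '2015-12-27', 'Amount': 300.0, 'AveragePrice': 2.0},
    {'Day': '2015-12-20', 'Amount': 50.0, 'AveragePrice': None},
]
weekly = group_by_day(rows)
print(weekly)  # [('2015-12-20', 50.0, 0.0), ('2015-12-27', 400.0, 1.5)]
```

Note that because fillna(0) runs before mean(), a missing price is averaged in as 0 and pulls that week's mean down.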

 

  • Merge the 2 DataFrames into one, joining on '날짜' (the column given in on=)
주간매출데이터=pd.merge(주간매출_매출량, 주간매출_평균가, on='날짜')
# '날짜(시간값)' (timestamp), '연도' (year), '월' (month), '일' (day) and '주' (week)
# must be added to 주간매출데이터 before this line, as in the full code below
주간매출데이터훈련_넘파이 = 주간매출데이터[['날짜(시간값)', '연도', '월', '일', '주', '평균가격']].to_numpy()
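A plain-Python sketch of the merge and of the date features the feature matrix needs. It assumes the per-day results are held in {day: value} dicts (the values are hypothetical); pd.merge defaults to an inner join, so only days present in both inputs survive. Unlike the full code, the ISO week is kept as an int here.

```python
import time
from datetime import datetime

def merge_on_day(totals, means):
    """Inner join of {day: value} dicts on shared days (pd.merge's default how='inner')."""
    return {day: (totals[day], means[day]) for day in sorted(totals.keys() & means.keys())}

def date_features(day):
    """[Unix timestamp (local time), year, month, day, ISO week] for a 'YYYY-MM-DD' string."""
    d = datetime.strptime(day, '%Y-%m-%d')
    return [time.mktime(d.timetuple()), d.year, d.month, d.day, d.isocalendar()[1]]

# Hypothetical inputs: only 2015-12-27 appears in both
merged = merge_on_day({'2015-12-20': 50.0, '2015-12-27': 400.0},
                      {'2015-12-27': 1.5, '2015-12-13': 1.2})
# One feature row per week: date features followed by the average price
features = [date_features(day) + [price] for day, (amount, price) in merged.items()]
print(merged)           # {'2015-12-27': (400.0, 1.5)}
print(features[0][1:])  # [2015, 12, 27, 52, 1.5]
```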

 

 

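🐞 Data scaling

Scaling "stabilizes" the data: StandardScaler rescales each feature to mean 0 and standard deviation 1, so a Unix timestamp (on the order of 10^9) and a month number (1 to 12) end up on comparable scales. The mean and standard deviation are learned from the training data only (fit) and then applied to both the training and test data (transform). For plain LinearRegression this does not change the predictions, but it makes the coefficients comparable and the computation better conditioned.

StandardScaler is used here.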

from sklearn.preprocessing import StandardScaler

스케일러 = StandardScaler()
# Learn each column's mean and standard deviation from the training data only
스케일러.fit(훈련용데이터)
# Apply the same transformation to both sets
훈련용데이터_스케일 = 스케일러.transform(훈련용데이터)
테스트데이터_스케일 = 스케일러.transform(테스트데이터)
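What fit and transform compute can be sketched in plain Python. This is a simplified stand-in, not scikit-learn's implementation: SimpleStandardScaler is a hypothetical name, and only the attribute names mean_ and scale_ mirror StandardScaler's. It uses the population standard deviation and leaves constant columns unscaled, as StandardScaler does.

```python
import math

class SimpleStandardScaler:
    """Per-column (x - mean) / std, with mean/std learned from the training rows only."""

    def fit(self, rows):
        n = len(rows)
        cols = list(zip(*rows))
        self.mean_ = [sum(c) / n for c in cols]
        # Population standard deviation (divide by n, not n - 1)
        self.scale_ = []
        for c, m in zip(cols, self.mean_):
            std = math.sqrt(sum((v - m) ** 2 for v in c) / n)
            self.scale_.append(std if std > 0 else 1.0)  # constant column: leave unscaled
        return self

    def transform(self, rows):
        return [[(v - m) / s for v, m, s in zip(row, self.mean_, self.scale_)]
                for row in rows]

scaler = SimpleStandardScaler().fit([[1.0, 10.0], [3.0, 10.0]])
train_scaled = scaler.transform([[1.0, 10.0], [3.0, 10.0]])
test_scaled = scaler.transform([[5.0, 10.0]])  # uses the training mean/std
print(train_scaled)  # [[-1.0, 0.0], [1.0, 0.0]]
print(test_scaled)   # [[3.0, 0.0]]
```

The test row lands at 3.0 because it is measured against the training statistics, which is exactly why the scaler is fitted on the training set only.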

 

🐞 Model score

from sklearn.linear_model import LinearRegression

선형회귀모델 = LinearRegression()
# Fit ordinary least squares on the scaled training features and training targets
선형회귀모델.fit(훈련용데이터_스케일, 훈련용목표)
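LinearRegression fits ordinary least squares: it picks the intercept and coefficients that minimize the sum of squared errors. The post's model uses six features at once; the sketch below is a simplified one-feature version (fit_line and predict are hypothetical helper names) that uses the closed-form solution.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + coef * x (single feature)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    coef = sxy / sxx
    intercept = my - coef * mx
    return intercept, coef

def predict(intercept, coef, xs):
    return [intercept + coef * x for x in xs]

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, coef = fit_line(xs, ys)
print(intercept, coef)                  # 1.0 2.0
print(predict(intercept, coef, [4.0]))  # [9.0]
```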

 

  • Evaluate how well the model fits with score(), which for a regression model returns R² rather than classification accuracy
print("Training set score (R^2)")
print(선형회귀모델.score(훈련용데이터_스케일, 훈련용목표))

print("Test set score (R^2)")
print(선형회귀모델.score(테스트데이터_스케일, 테스트목표))
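The value printed by score() is the coefficient of determination R² = 1 - SS_res / SS_tot. A plain-Python sketch (r2_score here is a hypothetical helper, assuming y_true is not constant) shows that 1.0 is a perfect fit, 0.0 is no better than always predicting the mean, and a model worse than that goes negative, which can happen on test data.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y_true, [1.0, 2.0, 3.0, 4.0]))  # 1.0  (perfect fit)
print(r2_score(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0  (always predicts the mean)
print(r2_score(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)
```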

 

  • Predictions based on the scaled data
훈련용목표예측 = 선형회귀모델.predict(훈련용데이터_스케일)
테스트목표예측 = 선형회귀모델.predict(테스트데이터_스케일)

 

🐞 Code implementation

  • index.html
<html> 
    <head> 
      <link rel="stylesheet" 
        href="https://pyscript.net/alpha/pyscript.css" /> 
      <script defer 
        src="https://pyscript.net/alpha/pyscript.js"></script> 

<py-env>
  - pandas
  - matplotlib
  - seaborn
  - scikit-learn
  - paths :
    - ./common.py
</py-env>
    </head>
  <body> 
    <link rel="stylesheet" href="pytable.css"/>
    <py-script>
    import pandas as pd
    from pyodide.http import open_url
    from js import document
    from common import *
    import numpy as np

    import time
    from datetime import datetime

    # Suppress warning messages
    import warnings
    warnings.filterwarnings( 'ignore' )

    # Read the CSV into a pandas DataFrame
    SalesData = pd.read_csv(open_url(
      "http://dreamplan7.cafe24.com/pyscript/csv/avocado.csv"
    ))      

    # Keep only 3 fields and rebuild the DataFrame
    SalesData = SalesData[[
      'Date', 
      'Total Volume',
      'AveragePrice'
    ]]   

    SalesData.columns = [
      'Day', 
      'Amount',
      'AveragePrice'
    ]

    # Group by date (one group per week): sum the sales volume and average the average price per group
    WeekdaysSales_sum = SalesData.fillna(0) \
    .groupby('Day', as_index=False)[['Amount']].sum() \
    .sort_values(by='Day', ascending=True)
    
    WeekdaysSales_mean = SalesData.fillna(0) \
    .groupby('Day', as_index=False)[['AveragePrice']].mean() \
    .sort_values(by='Day', ascending=True)

    # Merge the 2 DataFrames into one, joining on 'Day' (the column given in on=)
    WeekdaysSalesData = pd.merge(WeekdaysSales_sum, WeekdaysSales_mean, on = 'Day')


    # Add Day(timeValue): each date as a Unix timestamp (local time)
    WeekdaysSalesData.insert(1, 'Day(timeValue)',
        '',   True)

    for i in WeekdaysSalesData['Day'].index:
      # .loc[row, column] writes into the DataFrame itself (no chained assignment)
      WeekdaysSalesData.loc[i, 'Day(timeValue)'] = time.mktime(
        datetime.strptime(
          WeekdaysSalesData.loc[i, 'Day'],
          '%Y-%m-%d'
        ).timetuple()
      )

    # Add a field with the sales volume divided by 10000
    WeekdaysSalesData.insert(3, 'Amount(10000)', 
    WeekdaysSalesData['Amount']/10000, 
      True)

    # For training, split each date into year, month, day and ISO week number
    WeekdaysSalesData.insert(4, 'year', '', True)
    WeekdaysSalesData.insert(5, 'month', '', True)
    WeekdaysSalesData.insert(6, 'day', '', True)
    WeekdaysSalesData.insert(7, 'week', '', True)

    for i in WeekdaysSalesData['Day'].index:
      temp = str(WeekdaysSalesData.loc[i, 'Day']).split('-')
      year = int(temp[0])
      month = int(temp[1])
      day = int(temp[2])
      WeekdaysSalesData.loc[i, 'year'] = year
      WeekdaysSalesData.loc[i, 'month'] = month
      WeekdaysSalesData.loc[i, 'day'] = day
      # Store the ISO week as an int rather than a string so it is a numeric feature
      WeekdaysSalesData.loc[i, 'week'] = datetime(year, month, day).isocalendar()[1]

    createElementDiv(
      document, 
      Element, 
      'output2'
    ).write(WeekdaysSalesData)

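    # Features (X): timestamp, year, month, day, ISO week and average price
    WeekdaysSalesDataTrain_numpy = WeekdaysSalesData[['Day(timeValue)', 'year', 'month', 'day', 'week', 'AveragePrice']].to_numpy()
    # Target (y): weekly sales volume in units of 10000
    WeekdaysSalesDataTest_numpy = WeekdaysSalesData['Amount(10000)'].to_numpy()

    from sklearn.model_selection import train_test_split

    # shuffle=False keeps date order, so the test set is the most recent weeks
    # (random_state only matters when shuffling)
    X_train, X_test, y_train, y_test = \
      train_test_split(
        WeekdaysSalesDataTrain_numpy, 
        WeekdaysSalesDataTest_numpy,
        random_state=100,
        shuffle=False)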

    # Scaling "stabilizes" the data: fit on the training set only, then transform both sets
    from sklearn.preprocessing import StandardScaler

    sclar = StandardScaler()
    sclar.fit(X_train)
    X_train_scalr = sclar.transform(X_train)
    X_test_scalr = sclar.transform(X_test)

    # Linear regression algorithm
    # Training: finds the best-fitting line (least squares)
    from sklearn.linear_model import LinearRegression
    lr = LinearRegression()
    lr.fit(X_train_scalr, y_train)

    # The target is a quantity, not a class label, so classification accuracy does not apply;
    # for regression, score() returns R^2 (1.0 = perfect fit, and it can be negative)
    print("Training set score (R^2)")
    print(lr.score(X_train_scalr, y_train))
    print("Test set score (R^2)")
    print(lr.score(X_test_scalr, y_test))

    # Predictions based on the scaled data
    y_train_predict = lr.predict(X_train_scalr)
    y_test_predict = lr.predict(X_test_scalr)

    import matplotlib.pyplot as plt
    import matplotlib as mat

    # Graph
    fig = plt.figure(
      figsize=(15, 7)
    )

    plt.xticks(WeekdaysSalesData['Day(timeValue)'].to_numpy(), WeekdaysSalesData[['Day']].to_numpy()[:,0], rotation=90)

    plt.title('Weekdays Avocado SalesAmount')

    plt.plot(        
        X_train[:,0],
        y_train,
        marker='o',
        color='#c14549',
        label='Original'
    )
    plt.plot(        
        X_train[:,0],
        y_train_predict,
        marker='d',
        color='blue',
        label='Train pattern'
    )

    plt.plot(        
        X_test[:, 0],
        y_test,
        marker='o',
        color='#c14549'
    )

    plt.plot(        
        X_test[:, 0],
        y_test_predict,
        marker='d',
        color='green',
        label='Predict pattern'
    )

    plt.xlabel('Day')
    plt.ylabel('Total Volume / 10000')

    plt.legend(
      shadow=True
    )

    ax = plt.gca()
    # Grid lines on the x axis only
    ax.xaxis.grid(True)

    # Background color and margins
    ax.set_facecolor('#e8e7d2')
    ax.margins(x=0.01, y=0.02)

    # Remove the odd extra whitespace around the figure
    fig.tight_layout() 
    fig
</py-script> 
  </body> 
</html>
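The train_test_split call above uses shuffle=False, so it does not pick rows at random: with the default test_size of 0.25, scikit-learn takes the first rows (the earliest weeks, since the data is sorted by Day) for training and the last ceil(0.25 × n) rows for testing. A plain-Python sketch of that behaviour (chronological_split is a hypothetical name; the weeks and volumes are stand-in numbers):

```python
import math

def chronological_split(X, y, test_size=0.25):
    """Split without shuffling: earliest rows train, latest rows test (test count rounded up)."""
    n_test = math.ceil(test_size * len(X))
    n_train = len(X) - n_test
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

weeks = list(range(10))            # stand-in for 10 weekly feature rows, oldest first
volumes = [w * 2 for w in weeks]   # stand-in targets
X_train, X_test, y_train, y_test = chronological_split(weeks, volumes)
print(len(X_train), len(X_test))   # 7 3
print(X_test)                      # [7, 8, 9]
```

Because the test weeks all lie after the training weeks, the model has to extrapolate in time, which may be one reason the test score comes out much lower than the training score.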

 

  • common.py
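def createElementDiv(document, Element, name):
    # Create <div id="name">, append it to <body>,
    # and return PyScript's Element wrapper so .write() can render into it
    element = document.createElement('div')
    element.id = name
    document.body.append(element)
    return Element(name)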

 

 

 

 
