[Kaggle] MBTI_Myers-Briggs Personality Type Dataset (Personality Study)
Jingjing Alpaca 2022. 1. 22. 22:13
Written 220122
<This post is my own walkthrough, written up with Kaggle as a reference>
https://www.kaggle.com/laowingkin/mbti-study-personality
MBTI - Study personality
Not that you were curious, but,,, heh
Woody's MBTI is ENFJ!!
Jingjing Alpaca's MBTI is INSF!!
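1. MBTI data
: MBTI is a kind of psychological classification of how people experience the world, based on four principal psychological functions: sensation, intuition, feeling, and thinking
- Introversion (I) – Extroversion (E)
- Intuition (N) – Sensing (S)
- Thinking (T) – Feeling (F)
- Judging (J) – Perceiving (P)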
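The four axes above can be read straight off a 4-letter code. The following is only a minimal sketch: `AXES` and `describe_type` are hypothetical names made up for illustration, not part of the Kaggle notebook, and the letter order (I/E, N/S, T/F, J/P) is assumed to follow the usual MBTI convention.

```python
# Hypothetical helper (not from the notebook): maps each letter of a
# 4-letter MBTI code to the axis pole it stands for.
AXES = [
    {"I": "Introversion", "E": "Extroversion"},
    {"N": "Intuition", "S": "Sensing"},
    {"T": "Thinking", "F": "Feeling"},
    {"J": "Judging", "P": "Perceiving"},
]


def describe_type(code):
    """Return the four pole names for a code such as 'ENFJ'."""
    code = code.upper()
    if len(code) != 4:
        raise ValueError("an MBTI code has exactly 4 letters")
    names = []
    for position, (letter, axis) in enumerate(zip(code, AXES), start=1):
        if letter not in axis:
            raise ValueError(f"letter {letter!r} is not valid at position {position}")
        names.append(axis[letter])
    return names


print(describe_type("ENFJ"))
# ['Extroversion', 'Intuition', 'Feeling', 'Judging']
```

Because each position accepts only two letters, there are 2 x 2 x 2 x 2 = 16 valid codes, which matches the 16 values in the dataset's `type` column.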
: The dataset contains over 8600 rows of data, and each row holds the following
- Type (the person's 4-letter MBTI code/type)
- A section of each of the last 50 things they have posted (each entry separated by "|||" (3 pipe characters))
: Basic uses
- Use machine learning to evaluate the validity of the MBTI and its ability to predict language styles and behaviour online
- Produce a machine learning algorithm that can attempt to determine a person's personality type based on some text they have written
2. Data analysis
- import
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
- read_csv
df = pd.read_csv("/content/drive/MyDrive/></2022/MBTI/mbti_1.csv")
df.head()
- info()
df.info()
- value_counts()
df['type'].value_counts()
- The posts are separated by "|||", so split on it!
- For each user, compute the words per comment and the variance of the word counts across comments, let's go
def var_row(row):
    # collect the word count of every post in this row
    l = []
    for i in row.split("|||"):  # section of 50 posts (each entry separated by "|||", 3 pipe characters)
        l.append(len(i.split()))
    return np.var(l)
# numpy.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>, *, where=<no value>)
# computes the variance along the specified axis
# create the columns
df['words_per_comment'] = df['posts'].apply(lambda x : len(x.split())/50) # why divide by 50? each user has 50 posts!
df['variance_of_word_counts'] = df['posts'].apply(lambda x : var_row(x)) # variance!
df.head()
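Dividing by a fixed 50 assumes every row holds exactly 50 posts, and calling `split()` on the whole string treats words glued together by "|||" as a single token. The sketch below is an illustration under those assumptions (the helper name `word_stats` and the sample string are made up, not from the notebook); it counts the posts actually present instead.

```python
def word_stats(posts, sep="|||"):
    """Return (number of posts, mean words per post) for one row's text."""
    # Split into individual posts first, dropping empty entries.
    entries = [p for p in posts.split(sep) if p.strip()]
    counts = [len(p.split()) for p in entries]
    n = len(counts)
    mean_words = sum(counts) / n if n else 0.0
    return n, mean_words


# Made-up sample row with 3 posts and no whitespace around the separator.
sample = "hello there world|||one two|||just a short test post"

n_posts, mean_words = word_stats(sample)
whole_string_tokens = len(sample.split())

print(n_posts)              # 3 posts
print(whole_string_tokens)  # 8 tokens, since "world|||one" counts as one token
print(mean_words)           # per-post counts 3, 2, 5 averaged over 3 posts
```

If every row really does contain 50 posts the two approaches differ only slightly, so this is a robustness check rather than a correction of the notebook's numbers.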
+) So how is the variance computed?
numpy.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>, *, where=<no value>)
: Subtract the mean from each observation, square the result, sum all of those, then divide by the total number of observations
: In other words, the mean of the squared deviations
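The definition above can be checked by hand. This is a small sketch using only the standard library; `population_variance` is a name chosen for illustration, and the numbers are made up. It corresponds to the default `ddof=0` in `numpy.var`, that is, dividing by the number of observations rather than by one less.

```python
import statistics


def population_variance(values):
    """Mean of squared deviations from the mean (divides by N)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


word_counts = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up per-post word counts

print(population_variance(word_counts))  # 4.0
# The standard library's population variance agrees with the manual version.
print(statistics.pvariance(word_counts) == population_variance(word_counts))  # True
```

Here the mean is 5, the squared deviations sum to 32, and 32 / 8 = 4.0. The sample variance (`statistics.variance`, or `ddof=1` in NumPy) would divide by 7 instead and give a slightly larger value.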
- Plot with sns
plt.figure(figsize = (15,10))
sns.swarmplot(x = "type", y = "words_per_comment", data = df)
- Combine groupby and agg to apply count
# group df by type, then use agg to get the count for each type
df.groupby("type").agg({"type" : "count"})
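- I don't get this part. Why exclude those four? (Possibly because they are the four rarest types in the value_counts() output above.)
# why were those four MBTI types excluded??
# isin() returns True if the value is in the given list, False otherwise
df_2 = df[~df['type'].isin(['ESFJ', 'ESFP', 'ESTJ', 'ESTP'])].copy() # .copy() so that adding columns does not trigger SettingWithCopyWarning
df_2['http_per_comment'] = df_2['posts'].apply(lambda x : x.count("http")/50)
df_2['qm_per_comment'] = df_2['posts'].apply(lambda x : x.count("?")/50)
df_2.head()
: In df_2, which drops the 4 MBTI types, count the occurrences of "http" and "?" and divide by 50~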
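What the filter and the two counts do can be seen on a few made-up rows. This is a plain-Python sketch of the same logic, not the notebook's code: `EXCLUDED`, `rows`, and the example URLs are all invented for illustration.

```python
# The four types dropped above; everything else is kept.
EXCLUDED = {"ESFJ", "ESFP", "ESTJ", "ESTP"}

# Made-up (type, posts) rows standing in for the real DataFrame.
rows = [
    ("INFJ", "see http://a.example|||really? why?"),
    ("ESTP", "hi|||there"),
    ("ENTP", "https://b.example|||ok"),
]

# Same idea as df[~df['type'].isin([...])]: keep rows whose type is NOT listed.
kept = [(t, posts) for t, posts in rows if t not in EXCLUDED]

features = [
    {
        "type": t,
        # str.count counts non-overlapping substring matches,
        # so "https://..." is counted by "http" as well.
        "http_per_comment": posts.count("http") / 50,
        "qm_per_comment": posts.count("?") / 50,
    }
    for t, posts in kept
]

for f in features:
    print(f)
```

Note that `count("http")` counts substring occurrences rather than links, so a post that merely mentions the word "http" is counted too.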
- Compute the means as well
df_2.groupby("type").agg({"http_per_comment" : "mean"})
df_2.groupby("type").agg({"qm_per_comment" : "mean"})
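For readers new to `groupby(...).agg(... "mean")`, the operation amounts to bucketing rows by type and averaging within each bucket. A minimal standard-library sketch follows; `group_mean` and the sample records are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def group_mean(records, key, value):
    """Average records[value] within each distinct records[key]."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        sums[record[key]] += record[value]
        counts[record[key]] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Made-up per-user values.
records = [
    {"type": "INFJ", "qm_per_comment": 0.25},
    {"type": "INFJ", "qm_per_comment": 0.75},
    {"type": "ENTP", "qm_per_comment": 0.5},
]

print(group_mean(records, "type", "qm_per_comment"))
# {'INFJ': 0.5, 'ENTP': 0.5}
```

In pandas, both columns can also be averaged in one call by passing both keys to `agg`, for example `df_2.groupby("type").agg({"http_per_comment": "mean", "qm_per_comment": "mean"})`.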
- For each MBTI type, plot two variables using bivariate and univariate graphs
See https://seaborn.pydata.org/generated/seaborn.jointplot.html
def plot_jointplot(mbti_type, title):
    # jointplot is a figure-level function: it creates its own figure, so it takes no ax or title argument
    df_3 = df_2[df_2["type"] == mbti_type]
    g = sns.jointplot(x = "variance_of_word_counts", y = "words_per_comment", data = df_3, kind = "hex")
    plt.suptitle(title)  # title for the figure that jointplot just created
    return g
# draw one jointplot for each of the 12 remaining types
for mbti_type in df_2["type"].unique():
    plot_jointplot(mbti_type, mbti_type)
: This produces 12 plots like this
: Easy to compare at a glance
- Keyword visualization
from wordcloud import WordCloud, STOPWORDS
# word cloud (wordcloud) : visualizes the key words that appear frequently in a given dataset or text
fig, ax = plt.subplots(len(df['type'].unique()), sharex = True, figsize = (15, 10*len(df["type"].unique())))
k = 0
for i in df['type'].unique() :
    df_4 = df[df["type"] == i ]
    wordcloud = WordCloud().generate(df_4['posts'].to_string())
    ax[k].imshow(wordcloud)
    ax[k].set_title(i)
    ax[k].axis("off")
    k += 1
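A word cloud is essentially a picture of word frequencies, so the same information can be inspected as plain counts. The sketch below uses only the standard library; `STOP`, `top_words`, and the sample text are made up for illustration, and the tiny stop list is an assumption that is far smaller than the `STOPWORDS` set shipped with the wordcloud package.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption, not wordcloud's STOPWORDS).
STOP = {"the", "a", "and", "to", "of", "i", "is", "it"}


def top_words(text, n=3):
    """Return the n most frequent non-stop words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP).most_common(n)


# Made-up posts in the dataset's "|||"-separated format.
sample = "I love youtube and music|||youtube is the best|||music and youtube"

print(top_words(sample, 2))
# [('youtube', 3), ('music', 2)]
```

Running something like this per type gives numbers to set beside each cloud, which can help when two clouds look similar.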
.. Hmm, looks like Jingjing likes YOUTUBE