[Speech] Speech Recognition: Music Classification & Recommendation Algorithm
징징알파카 (Jingjing Alpaca), 2022. 1. 31. 13:59 (written 220131)
<This post was written while studying, with reference to Jonyeok's coding diary (조녁 코딩일기)>
https://jonhyuk0922.tistory.com/114
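[Librosa] Speech Recognition Basics and Music Classification & Recommendation Algorithm
Hello~ This is Jonyeok, a career explorer in his 27th year!! Today, starting from the basics of reading an audio file and extracting features from it, then a model that classifies a song's genre from the extracted features, and songs of a similar genre ...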
jonhyuk0922.tistory.com
https://newsight.tistory.com/294
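Understanding the Basics of Speech Recognition
# Phonetic symbols and written representation - phoneme: the smallest unit of sound. Simply put, think of the pronunciation symbols in an English dictionary. - grapheme: the smallest unit of writing. The original form before it is expressed in phonetic symbols ...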
newsight.tistory.com
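1. Sound
: sound arises only when air vibrates periodically (compressing and expanding back and forth)
: sound is wave energy carried through air as its medium
: a wave with a low frequency is a low-pitched sound with less energy
: a wave with a high frequency is a high-pitched sound with more energy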
2. Wav file
: an array of real numbers that records the height of the sound wave (float) at fixed time intervals (set by the sampling rate)
: stored as PCM (Pulse-Code Modulation)
: consists of numbers
- y : the strength of the sound's vibration (amplitude), listed in time order
- Sampling rate: number of samples per second, in Hz or kHz
- Mono vs Stereo
: stereo is sound from two microphones, left and right (two mono signals), recorded and stored together
- Sampling rate
: the x-axis resolution of the data points
: how many data points are taken per second
- Bit depth
: the y-axis resolution of the data points
: the resolution with which the height (amplitude) of each point can be distinguished
- Bit rate
: Sampling rate * Bit depth
: number of bits transferred per second
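To make the bit-rate relationship above concrete, here is a small sketch that computes the bit rate and the raw size of an uncompressed PCM stream. The function names `pcm_bit_rate` and `pcm_size_bytes` and the example numbers are illustrative assumptions, not part of the original notes. The `channels` factor is added on the assumption that a stereo file stores two mono streams.

```python
def pcm_bit_rate(sampling_rate_hz, bit_depth, channels=1):
    """Bits per second of uncompressed PCM audio."""
    return sampling_rate_hz * bit_depth * channels


def pcm_size_bytes(duration_s, sampling_rate_hz, bit_depth, channels=1):
    """Approximate size of the raw sample data in bytes (file header not included)."""
    return pcm_bit_rate(sampling_rate_hz, bit_depth, channels) * duration_s / 8


# 16-bit stereo at 44.1 kHz
print(pcm_bit_rate(44100, 16, channels=2))
# a 30-second mono clip at 22.05 kHz, 16 bits per sample
print(pcm_size_bytes(30, 22050, 16))
```

Doubling either the sampling rate or the bit depth doubles the bit rate, which is why both are described above as "resolutions" of the same grid of data points.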
3. Analyzing a sound file ( with kaggle )
- 1) Loading the dataset
import librosa
# librosa.load() : loads an audio file
y , sr = librosa.load('Data/genres_original/reggae/reggae.00036.wav')
print(y)
print(len(y))
print(y.shape)
print('Sampling rate (Hz): %d' %sr)
print('Audio length (seconds): %.2f' % (len(y) / sr))
# length of the music (seconds) = number of samples / sampling rate
- 2) Listening to the music
import IPython.display as ipd
ipd.Audio(y, rate=sr)
- It does not play in VS Code..
- It works when run in Colab!
- 3) Plotting the music
- 2D graph
import matplotlib.pyplot as plt
import librosa.display
plt.figure(figsize =(16,6))
librosa.display.waveshow(y=y, sr=sr)  # waveplot was removed in librosa 0.10; waveshow replaces it
plt.show()
- Fourier Transform
: converts time-domain data into the frequency domain
: time domain -> frequency domain
- y-axis : frequency (log scale)
- color axis : decibels (amplitude)
import numpy as np
# n_fft : window size
# how long should each slice of the audio be? => window
Fourier = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))
print(Fourier.shape)
plt.figure(figsize=(16,6))
plt.plot(Fourier)
plt.show()
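As a rough illustration of what "time domain -> frequency domain" means, the sketch below runs a naive discrete Fourier transform on a synthetic sine wave and recovers its frequency. It is plain Python rather than `librosa.stft`, and every name and number in it (`dft_magnitudes`, `sr_demo`, `n_demo`, `tone_hz`) is a hypothetical choice for the demo.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for k = 0..n/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


sr_demo = 800   # hypothetical sampling rate in Hz
n_demo = 400    # number of samples (half a second)
tone_hz = 50    # frequency of the synthetic sine wave

signal = [math.sin(2 * math.pi * tone_hz * t / sr_demo) for t in range(n_demo)]

mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sr_demo / n_demo   # convert bin index to frequency in Hz
print(peak_bin, peak_hz)
```

A short-time Fourier transform repeats this idea on successive windows of the signal (the `n_fft` and `hop_length` arguments above), which is what adds the time axis.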
- Spectrogram
: a graph of the signal's frequency spectrum over time
# convert amplitude -> dB (decibels)
DB = librosa.amplitude_to_db(Fourier, ref=np.max)
plt.figure(figsize=(16,6))
librosa.display.specshow(DB,sr=sr, hop_length=512, x_axis='time', y_axis='log')
plt.colorbar()
plt.show()
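The amplitude-to-decibel step can be sketched by hand as follows. This is a simplified stand-in, assuming the reference is the largest amplitude (as with `ref=np.max`) and assuming a floor of 80 dB below the peak. The function name `amplitude_to_db_sketch` and its defaults are illustrative and may differ in detail from what librosa does.

```python
import math


def amplitude_to_db_sketch(amplitudes, amin=1e-5, top_db=80.0):
    """Convert amplitudes to dB relative to the largest value, floored at -top_db."""
    ref = max(max(amplitudes), amin)
    # 20 * log10 because amplitude (not power) is being converted
    db = [20.0 * math.log10(max(a, amin) / ref) for a in amplitudes]
    return [max(v, -top_db) for v in db]


result = amplitude_to_db_sketch([1.0, 0.5, 0.1, 0.0])
print(result)
```

Halving the amplitude costs about 6 dB and dividing it by ten costs 20 dB, so the colour axis of the spectrogram compresses a very wide range of amplitudes into a readable scale.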
- Mel Spectrogram
: a spectrogram whose y-axis has been converted to the Mel scale
Mel = librosa.feature.melspectrogram(y=y, sr=sr)
# melspectrogram returns a power spectrogram, so convert it with power_to_db
Mel_DB = librosa.power_to_db(Mel, ref=np.max)
plt.figure(figsize=(16,6))
librosa.display.specshow(Mel_DB, sr=sr, hop_length=512, x_axis='time', y_axis='mel')
plt.colorbar()
plt.show()
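For intuition about the Mel scale, here is a minimal sketch of one common Hz-to-Mel conversion, the HTK-style formula. Treat the choice of formula as an assumption: librosa's default conversion may differ in detail. The helper names `hz_to_mel_htk` and `mel_to_hz_htk` are made up for this example.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK-style conversion from Hz to mels."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The same 1000 Hz step covers far fewer mels at high frequencies.
low_step = hz_to_mel_htk(1000) - hz_to_mel_htk(0)
high_step = hz_to_mel_htk(8000) - hz_to_mel_htk(7000)
print(low_step, high_step)
```

The shrinking step size mirrors how human hearing resolves low frequencies more finely than high ones.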
- 4) Reggae vs Classical comparison
y, sr = librosa.load('Data/genres_original/classical/classical.00036.wav')
y, _ = librosa.effects.trim(y)  # trim leading and trailing silence
S = librosa.feature.melspectrogram(y=y, sr=sr)
S_DB = librosa.power_to_db(S, ref=np.max)  # the mel spectrogram is a power spectrogram
plt.figure(figsize=(16,6))
librosa.display.specshow(S_DB, sr=sr, hop_length=512, x_axis='time', y_axis='mel')
plt.colorbar()
plt.show()
- 5) Audio Feature Extraction
- Tempo (BPM)
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
print(tempo)
- Zero Crossing Rate
: the rate at which the waveform changes sign, from positive to negative or from negative to positive
# number of times the signal crosses the zero line
zero_crossings = librosa.zero_crossings(y, pad=False)
print(zero_crossings)
print(sum(zero_crossings))  # number of negative <-> positive transitions
: a zero crossing is counted each time the signal passes through the zero line
n0 = 9000
n1 = 9080
plt.figure(figsize=(16,6))
plt.plot(y[n0:n1])
plt.grid()
plt.show()
: counting the dips below 0 in the graph gives about 11?
# zero crossings between n0 and n1
zero_crossings = librosa.zero_crossings(y[n0:n1], pad=False)
print(sum(zero_crossings))
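The counting itself is simple enough to sketch without librosa. The version below counts sign changes between neighbouring samples. Treating an exact zero as positive is an assumption of this sketch, and `librosa.zero_crossings` may handle such edge cases differently. The helper names and the synthetic wave are illustrative.

```python
import math


def count_zero_crossings(samples):
    """Count sign changes between consecutive samples (zero is treated as positive)."""
    signs = [s >= 0 for s in samples]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def zero_crossing_rate(samples):
    """Fraction of consecutive sample pairs whose sign differs."""
    return count_zero_crossings(samples) / (len(samples) - 1)


# Five full cycles of a sine wave over 1000 samples; the half-sample phase
# offset keeps samples away from exact zeros.
wave = [math.sin(2 * math.pi * 5 * (t + 0.5) / 1000) for t in range(1000)]
print(count_zero_crossings(wave), zero_crossing_rate(wave))
```

Noisy or percussive signals flip sign far more often than a smooth tone, which is why this rate is useful as a feature.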
- 6) Feature extraction
- 1) Harmonic and Percussive Components
- Percussives: shock waves that represent rhythm and emotion
- Harmonics : characteristics that the human ear cannot distinguish (the colour of the music)
y_harm, y_perc = librosa.effects.hpss(y)
plt.figure(figsize=(16,6))
plt.plot(y_harm, color='b')
plt.plot(y_perc, color='r')
plt.show()
- 2) Spectral Centroid
: when the sound is expressed in terms of frequency, the weighted mean of the frequencies, which indicates where the sound's "center of mass" lies
spectral_centroids = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
# Computing the time variable for visualization
frames = range(len(spectral_centroids))
# Converts frame counts to time (seconds)
t = librosa.frames_to_time(frames, sr=sr)
import sklearn.preprocessing
def normalize(x, axis=0):
    # minmax_scale() : rescales the values so that the minimum is 0 and the maximum is 1
    return sklearn.preprocessing.minmax_scale(x, axis=axis)
plt.figure(figsize=(16,6))
librosa.display.waveshow(y, sr=sr, alpha=0.5, color='b')
plt.plot(t, normalize(spectral_centroids), color='r')
plt.show()
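The "center of mass" idea behind the spectral centroid can be written out for a single frame as a magnitude-weighted mean of the bin frequencies. This is a hand-rolled sketch on made-up numbers, not librosa's implementation, and `spectral_centroid_sketch` is a hypothetical name.

```python
def spectral_centroid_sketch(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency of one spectrum frame."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


freqs = [100, 200, 300, 400]
print(spectral_centroid_sketch(freqs, [0, 1, 1, 0]))  # energy in the middle bins
print(spectral_centroid_sketch(freqs, [1, 0, 0, 3]))  # energy pulled toward 400 Hz
```

A frame dominated by high-frequency content pulls the centroid upward, which is why brighter sounds show higher centroid values.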
- 3) Spectral Rolloff
: the frequency below which a fixed share (85%) of the total spectral energy lies, showing how strongly the energy is concentrated at low frequencies
: a measure of the shape of the signal
spectral_rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)[0]
plt.figure(figsize=(16,6))
librosa.display.waveshow(y, sr=sr, alpha=0.5, color='b')
plt.plot(t, normalize(spectral_rolloff),color='r')
plt.show()
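A minimal sketch of the rolloff computation for one frame: accumulate the magnitudes from the lowest bin upward and report the first frequency at which the running sum reaches 85% of the total. The function name, the toy spectra, and the use of magnitudes rather than squared energy are assumptions of this example.

```python
def spectral_rolloff_sketch(freqs_hz, magnitudes, roll_percent=0.85):
    """Lowest frequency at which the cumulative magnitude reaches roll_percent of the total."""
    threshold = roll_percent * sum(magnitudes)
    running = 0.0
    for f, m in zip(freqs_hz, magnitudes):
        running += m
        if running >= threshold:
            return f
    return freqs_hz[-1]


flat_freqs = list(range(100, 1100, 100))   # 100, 200, ..., 1000 Hz
print(spectral_rolloff_sketch(flat_freqs, [1] * 10))                   # flat spectrum
print(spectral_rolloff_sketch([100, 200, 300, 400], [9, 1, 0, 0]))     # bass-heavy spectrum
```

A bass-heavy frame reaches the threshold early and gets a low rolloff, while a flat or treble-heavy frame gets a high one.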
- 4) Mel-Frequency Cepstral Coefficients (MFCCs)
: MFCCs are a small set of features (about 10-20) that summarize the overall shape of the spectral envelope
: they extract audio information in a way that reflects the structure of human hearing
1. Split the whole audio signal into fixed-length frames and apply the Fourier transform to obtain a spectrogram.
2. Reduce the dimensionality of the power spectrogram (the square of each spectrum) with a Mel-scale filter bank.
3. Apply cepstral analysis to obtain the MFCCs.
Source: https://tech.kakaoenterprise.com//66
mfccs = librosa.feature.mfcc(y=y, sr=sr)
mfccs = normalize(mfccs,axis=1)
print('mean : %.2f' % mfccs.mean())
print('var : %.2f' % mfccs.var())
plt.figure(figsize=(16,6))
librosa.display.specshow(mfccs,sr=sr, x_axis='time')
plt.show()
- Chroma Frequencies
: chroma builds on the fact that human hearing perceives two notes whose frequencies differ by an octave as similar
: chroma features are an interesting and powerful representation of music
chromagram = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=512)
plt.figure(figsize=(16,6))
librosa.display.specshow(chromagram,x_axis='time', y_axis='chroma', hop_length=512)
plt.show()
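The octave equivalence behind chroma can be illustrated by folding a frequency onto one of the 12 pitch classes. The sketch assumes equal temperament with A4 = 440 Hz and uses a hypothetical helper `pitch_class`; it is not how `chroma_stft` is implemented, only the idea behind it.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to its nearest pitch class, ignoring the octave."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)   # MIDI note number (A4 = 69)
    return PITCH_CLASSES[round(midi) % 12]


# 220, 440 and 880 Hz are one octave apart and fall into the same class.
print(pitch_class(220.0), pitch_class(440.0), pitch_class(880.0))
print(pitch_class(261.63))   # roughly middle C
```

A chromagram sums the spectral energy that lands in each of these 12 classes per frame, which is why its y-axis has exactly twelve rows.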
4. Music genre classification ( with kaggle )
- 1) Loading the dataset
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import svm
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('whitegrid')  # plot style: white background with grid lines
df = pd.read_csv('Data/features_3_sec.csv')
df.head()
- 2) Data preprocessing
X = df.drop(columns=['filename','length','label'])  # drop the columns that are not needed
y = df['label']  # genre name
scaler = MinMaxScaler()  # rescale to the range 0~1
np_scaled = scaler.fit_transform(X)
X = pd.DataFrame(np_scaled, columns=X.columns)
X.head()
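What the 0~1 rescaling does to a single feature column can be sketched in a few lines. The helper `min_max_scale` and the sample tempo values are illustrative assumptions; `MinMaxScaler` applies the same formula to each column.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


tempos = [60, 90, 120]   # hypothetical tempo values in BPM
print(min_max_scale(tempos))
```

Scaling keeps features with large raw ranges from dominating features with small ones in distance-based or gradient-based models.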
- 3) Train, Test split
X_train , X_test , y_train, y_test = train_test_split(X,y , test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
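- 4) Building the models
- xgboost
from xgboost import XGBClassifier
from sklearn.preprocessing import LabelEncoder
# recent xgboost releases expect integer class labels, so encode the genre names first
le = LabelEncoder().fit(y_train)
xgb = XGBClassifier(n_estimators=100, learning_rate=0.05)  # 100 trees, learning rate 0.05
xgb.fit(X_train, le.transform(y_train))  # train
print("xgb Test Accuracy : {}%".format(round(xgb.score(X_test, le.transform(y_test)) * 100, 2)))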
- Logistic Regression
lr = LogisticRegression(solver = "lbfgs")
lr.fit(X_train, y_train)
print("lr Test Accuracy : {}%".format(round(lr.score(X_test, y_test) * 100, 2)))
- RandomForest
rf = RandomForestClassifier(n_estimators = 100, random_state = 42)
rf.fit(X_train, y_train)
print("rf Test Accuracy : {}%".format(round(rf.score(X_test, y_test) * 100, 2)))
- Decision Tree
tree = DecisionTreeClassifier()
params = {
'max_depth' : [6, 8, 10, 12, 16, 20, 24],
'min_samples_split' : [16, 24]
}
grid_dt = GridSearchCV(tree, param_grid=params, scoring='accuracy', cv=5, verbose=1)
grid_dt.fit(X_train, y_train)
print('Best cross-validation accuracy {:.2f}'.format(grid_dt.best_score_))
print("dt Test Accuracy : {}%".format(round(grid_dt.score(X_test, y_test) * 100, 2)))
print('Best parameters : {}'.format(grid_dt.best_params_))
- 5) Confusion Matrix
y_preds = rf.predict(X_test)  # predict on the test set
cm = confusion_matrix(y_test,y_preds)
plt.figure(figsize=(16,9))
sns.heatmap(
cm,
annot=True,
xticklabels=["blues","classical","country","disco","hiphop","jazz","metal","pop","reggae","rock"],
yticklabels=["blues","classical","country","disco","hiphop","jazz","metal","pop","reggae","rock"]
)
plt.show()
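To read the heatmap, it helps to see how such a matrix is counted. The sketch below builds one by hand for a toy set of labels, with rows as true genres and columns as predicted genres in sorted label order. The function `confusion_counts` and the toy labels are made up for this example.

```python
def confusion_counts(y_true, y_pred):
    """Return (labels, matrix) where matrix[i][j] counts true label i predicted as label j."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return labels, matrix


true_genres = ["rock", "rock", "jazz", "pop", "jazz"]
pred_genres = ["rock", "pop", "jazz", "pop", "rock"]
labels, matrix = confusion_counts(true_genres, pred_genres)
print(labels)
for row in matrix:
    print(row)
```

Correct predictions land on the diagonal, so off-diagonal cells show which genres the model mixes up.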
- 6) Feature Importance
for feature, importance in zip(X_test.columns, rf.feature_importances_):
    print('%s: %.2f' % (feature, importance))  # shows which features mattered most
5. Song recommendation
- 1) Loading the data
df_30 = pd.read_csv('Data/features_30_sec.csv', index_col='filename')
labels = df_30[['label']]
df_30 = df_30.drop(columns=['length','label'])
df_30_scaled = StandardScaler().fit_transform(df_30)  # mean 0, standard deviation 1
df_30 = pd.DataFrame(df_30_scaled, columns=df_30.columns)
df_30.head()
- 2) Setting up the similarity
from sklearn.metrics.pairwise import cosine_similarity
# vector similarity is estimated from the angle between vectors: cos 0° = 1, so values closer to 1 are more similar
# cos 180° = -1, so values closer to -1 are more different.
similarity = cosine_similarity(df_30)
sim_df = pd.DataFrame(similarity, index=labels.index, columns=labels.index)
sim_df.head()
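The cosine similarity used here is the dot product of two feature vectors divided by the product of their lengths. A minimal pure-Python sketch, with a hypothetical helper `cosine_sim` and toy two-dimensional vectors:

```python
import math


def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


print(cosine_sim([1, 0], [2, 0]))     # same direction
print(cosine_sim([1, 0], [0, 3]))     # perpendicular
print(cosine_sim([1, 2], [-1, -2]))   # opposite direction
```

Because only the direction matters, two songs whose standardized features point the same way score close to 1 even if the magnitudes differ.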
- +) Wrapping it in a function
def find_similar_songs(name, n=5):
    series = sim_df[name].sort_values(ascending=False)  # most similar first
    series = series.drop(name)  # exclude the song itself
    return series.head(n).to_frame()
find_similar_songs('rock.00000.wav')