😎 공부하는 징징알파카는 처음이지?

Speech Signal Analysis Using FFT, STFT, and MFCC

징징알파카 2022. 11. 3. 15:43

< This post was written as study notes, with reference to hyunlee103's blog :-) >

https://hyunlee103.tistory.com/36

 

[Sound AI #11] Audio Data Preprocessing (Python Coding)


I used Colab.

 

!pip install pydub
!apt install ffmpeg

 

Library & data load

from os import path
from pydub import AudioSegment
import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
file = 'voice.m4a'
sound = AudioSegment.from_file(file)
print(type(sound))

 

sig, sr = librosa.load(file, sr=None)
  • sig (called y in librosa's documentation) : the waveform data of the audio
  • sr : sampling rate, the number of samples per second (it sets the frequency range that can be analyzed and the time spacing between samples)
# Raw integer samples from pydub, kept for comparison (the analysis below uses sig from librosa)
y = np.array(sound.get_array_of_samples())

plt.plot(np.arange(0, len(sig)), sig)
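The plot above puts sample indices on the x-axis. Since sr is the number of samples per second, sample n sits at n / sr seconds. Here is a minimal sketch of that conversion; the 440 Hz tone and the sampling rate of 22050 are assumptions standing in for voice.m4a, not values from the recording.

```python
import numpy as np

# Hypothetical stand-in for the loaded audio: 2 seconds of a 440 Hz tone.
sr = 22050                                   # assumed sampling rate (samples per second)
sig = np.sin(2 * np.pi * 440 * np.arange(2 * sr) / sr)

# Sample index n corresponds to time n / sr seconds.
t = np.arange(len(sig)) / sr
duration = len(sig) / sr

print(f"duration: {duration:.2f} s, {len(sig)} samples")
```

With the real data, plt.plot(t, sig) would draw the same waveform against seconds instead of sample indices.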

➕ Plain Fourier transform → Spectrum

fft = np.fft.fft(sig)

# Take the absolute value of the complex FFT output to get the magnitude
magnitude = np.abs(fft)

# Build the frequency axis (bin k corresponds to k * sr / N Hz)
f = np.linspace(0, sr, len(magnitude), endpoint=False)

# The spectrum of a real signal is mirror-symmetric, so drop the upper half and use only the first half (0 to sr/2)
left_spectrum = magnitude[:len(magnitude) // 2]
left_f = f[:len(magnitude) // 2]

plt.figure()
plt.plot(left_f, left_spectrum)
plt.xlabel("Frequency")
plt.ylabel("Magnitude")
plt.title("Magnitude spectrum")
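As a sanity check on the frequency axis, the same idea can be tried on a signal whose frequency is known in advance. The sketch below is not the code used in this post: it assumes a synthetic 440 Hz tone and uses NumPy's np.fft.rfft and np.fft.rfftfreq, which return only the non-mirrored half of the spectrum and the matching frequencies.

```python
import numpy as np

sr = 8000                                      # assumed sampling rate
t = np.arange(sr) / sr                         # 1 second of samples
tone = np.sin(2 * np.pi * 440 * t)             # hypothetical 440 Hz test tone

spectrum = np.abs(np.fft.rfft(tone))           # magnitudes for 0 .. sr/2 only
freqs = np.fft.rfftfreq(len(tone), d=1 / sr)   # frequency in Hz of each bin

peak_hz = freqs[np.argmax(spectrum)]
print(f"peak at about {peak_hz:.1f} Hz")
```

If the peak lands near 440 Hz, the frequency axis is lined up correctly. The same check works for the manual linspace approach.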

 

 

 

➕ STFT (Short Time Fourier Transform) -> Spectrogram

  • Unlike the plain Fourier transform, the STFT runs the FFT frame by frame so that time information is preserved. So we specify the number of samples per frame and the hop between frames, which together determine the number of frames.
  • The Fourier transform converts a time-domain waveform into the frequency domain.
  • Transforming the whole waveform at once cannot show when each frequency occurs, so the signal is split into short time segments and each segment is transformed separately.
  • A spectrogram uses dB values, so the magnitude from the Fourier transform is converted to dB by log scaling.
    • win_length : the number of samples each FFT (Fast Fourier Transform, a fast algorithm for computing the Fourier transform) looks at
    • hop_length : how far the window moves along the time axis between analyses (the time step of each column in the color map)
    • n_fft : the FFT size; when it is longer than win_length, the windowed frame is zero-padded up to n_fft (in librosa, n_fft defaults to 2048 and win_length defaults to n_fft)
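To make these parameters concrete, here is a small arithmetic sketch of what they imply for the size of the STFT output. The sampling rate and the 3-second clip length are assumptions for illustration. The frame-count formula assumes librosa's default center=True padding.

```python
n_fft = 2048         # samples per analysis frame
hop_length = 512     # samples between the starts of consecutive frames
sr = 22050           # assumed sampling rate; use the value returned by librosa.load
n_samples = 3 * sr   # hypothetical 3-second clip

n_freq_bins = 1 + n_fft // 2             # rows of the STFT matrix
n_frames = 1 + n_samples // hop_length   # columns, assuming center=True padding
frame_ms = 1000 * n_fft / sr             # time span covered by one frame
hop_ms = 1000 * hop_length / sr          # time step between columns
bin_hz = sr / n_fft                      # frequency spacing between rows

print(n_freq_bins, n_frames)
print(f"frame {frame_ms:.1f} ms, hop {hop_ms:.1f} ms, bin width {bin_hz:.2f} Hz")
```

A larger n_fft gives finer frequency resolution but makes each frame cover more time. A smaller hop_length gives more columns along the time axis.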
# STFT -> spectrogram
hop_length = 512  # number of samples between the starts of consecutive frames
n_fft = 2048  # number of samples per frame (FFT size)

# calculate duration hop length and window in seconds
hop_length_duration = float(hop_length)/sr
n_fft_duration = float(n_fft)/sr

# STFT
stft = librosa.stft(sig, n_fft=n_fft, hop_length=hop_length)

# Take the absolute value of the complex STFT output
magnitude = np.abs(stft)

# magnitude > Decibels 
log_spectrogram = librosa.amplitude_to_db(magnitude)

# display spectrogram
plt.figure()
librosa.display.specshow(log_spectrogram, sr=sr, hop_length=hop_length)
plt.xlabel("Time")
plt.ylabel("Frequency")
plt.colorbar(format="%+2.0f dB")
plt.title("Spectrogram (dB)")
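To show what the frame-by-frame FFT and the dB conversion do, here is a rough NumPy-only sketch of the same steps. It is not librosa's implementation: it skips librosa's edge padding, uses np.hanning as the window, and uses a plain 20 * log10 without librosa's reference level or dynamic-range clipping, so the numbers will not match librosa.stft and amplitude_to_db exactly. The function name naive_stft_db and the 1000 Hz test tone are made up for illustration.

```python
import numpy as np

def naive_stft_db(sig, n_fft=2048, hop_length=512):
    """Frame the signal, window each frame, FFT it, and convert magnitude to dB."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(sig) - n_fft) // hop_length   # no padding at the edges
    frames = np.stack([
        sig[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; transpose so rows are frequency bins and columns are frames.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # 20 * log10 turns amplitude into decibels; the floor avoids log(0).
    return 20 * np.log10(np.maximum(magnitude, 1e-10))

sr = 22050                                   # assumed sampling rate
t = np.arange(sr) / sr                       # 1 second
spec_db = naive_stft_db(np.sin(2 * np.pi * 1000 * t))
peak_bin = int(np.argmax(spec_db[:, 0]))     # strongest frequency bin in the first frame
print(spec_db.shape, peak_bin, peak_bin * sr / 2048)
```

Each column of the result is the spectrum of one short slice of the signal. That is why a spectrogram can show how the frequency content changes over time.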

 

➕ MFCC

# MFCCs
# extract 13 MFCCs
# y and sr must be passed as keyword arguments in recent librosa versions
MFCCs = librosa.feature.mfcc(y=sig, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mfcc=13)

# display MFCCs
plt.figure()
librosa.display.specshow(MFCCs, sr=sr, hop_length=hop_length)
plt.xlabel("Time")
plt.ylabel("MFCC coefficients")
plt.colorbar()
plt.title("MFCCs")

# show plots
plt.show()

 

 

 

 
