Speech Signal Analysis Using FFT, STFT, and MFCC

😎 공부하는 징징알파카는 처음이지?

징징알파카 2022. 11. 3. 15:43

< This post was written while studying, with reference to hyunlee103's blog :-) >

https://hyunlee103.tistory.com/36

[Sound AI #11] Audio Data Preprocessing (Python Coding)

I used Colab.

!pip install pydub
!apt install ffmpeg

🍏 library & data load

from os import path
from pydub import AudioSegment
import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display

file = 'voice.m4a'

sound = AudioSegment.from_file(file)
print(type(sound))

sig, sr = librosa.load(file, sr=None)

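y, sig : waveform data of the audio (sig is the float array returned by librosa.load; y is the raw integer sample array from pydub)
sr : sampling rate, the number of samples per second (determines the frequency range that can be analyzed and the time spacing of the waveform)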

y = np.array(sound.get_array_of_samples())

plt.plot(np.arange(0, len(sig)), sig)
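
Because sr is the number of samples per second, dividing a sample index by sr gives that sample's time in seconds. A minimal sketch of this relationship, using a synthetic 440 Hz tone in place of voice.m4a (the 16 kHz rate, the 2-second length, and the names demo_sr, demo_sig, and t are assumptions for illustration):

```python
import numpy as np

demo_sr = 16000                      # assumed sampling rate (samples per second)
duration_s = 2.0                     # assumed clip length in seconds
n_samples = int(demo_sr * duration_s)

# Time stamp of every sample: index / sampling rate
t = np.arange(n_samples) / demo_sr

# Synthetic stand-in for the recorded waveform
demo_sig = 0.5 * np.sin(2 * np.pi * 440.0 * t)

print(len(demo_sig) / demo_sr)       # clip duration in seconds
print(t[1] - t[0])                   # spacing between samples = 1 / demo_sr
```

With the real recording, plt.plot(np.arange(len(sig)) / sr, sig) would put seconds on the x-axis instead of sample indices.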

➕ Plain Fourier transform → Spectrum

fft = np.fft.fft(sig)

# Take the absolute value of the complex FFT output to get the magnitude
magnitude = np.abs(fft) 

# Build the frequency axis (bin k sits at k * sr / N, so sr itself is excluded)
f = np.linspace(0, sr, len(magnitude), endpoint=False)

# The spectrum from the Fourier transform is symmetric, so discard the redundant upper half and keep only the first half (0 to sr/2).
left_spectrum = magnitude[:int(len(magnitude)/2)]
left_f = f[:int(len(magnitude)/2)]

plt.figure()
plt.plot(left_f, left_spectrum)
plt.xlabel("Frequency")
plt.ylabel("Magnitude")
plt.title("Magnitude spectrum")
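
For a real-valued signal, np.fft.rfft together with np.fft.rfftfreq returns only the non-redundant half of the spectrum, so it can stand in for the manual slicing above. A sketch on a synthetic tone (the 8 kHz rate and the 1 kHz tone are assumptions, chosen so that the peak location is easy to check):

```python
import numpy as np

demo_sr = 8000                                       # assumed sampling rate
t = np.arange(demo_sr) / demo_sr                     # exactly one second of samples
tone = np.sin(2 * np.pi * 1000.0 * t)                # 1 kHz sine wave

half_spectrum = np.abs(np.fft.rfft(tone))            # magnitudes for 0 .. sr/2
freqs = np.fft.rfftfreq(len(tone), d=1.0 / demo_sr)  # matching frequency axis in Hz

peak_hz = freqs[np.argmax(half_spectrum)]
print(peak_hz)                                       # frequency of the strongest bin
```

The strongest bin should land at the tone's frequency, which is a quick sanity check that the frequency axis is built correctly.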

➕ STFT (Short Time Fourier Transform) -> Spectrogram

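Unlike the plain Fourier transform, the STFT runs the FFT frame by frame in order to preserve time information. We therefore specify the number of samples per frame (n_fft) and how far to move between frames (hop_length).
The Fourier transform converts a time-domain waveform into the frequency domain.
Applying it to the whole waveform at once shows which frequencies are present but not when they occur, so the signal is split into short segments and each segment is transformed separately.
A spectrogram uses dB values, so the magnitude produced by the Fourier transform is log-scaled to convert it to dB.
- win_length : the length, in samples, of the window applied to each frame before the FFT (Fast Fourier Transform, a fast algorithm for computing the transform)
- hop_length : how many samples to advance between successive frames (the time step between columns of the color map)
- n_fft : the FFT size; when it is larger than win_length, the windowed frame is zero-padded up to n_fft (by default, win_length is the same as n_fft)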
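
How n_fft, win_length, and hop_length interact can be sketched with a hand-rolled STFT in NumPy. This is not librosa's implementation: it skips the signal centering and padding that librosa.stft applies by default, it pads each short frame at the end rather than centering the window, and the helper name naive_stft is hypothetical.

```python
import numpy as np

def naive_stft(x, n_fft=2048, hop_length=512, win_length=None):
    """Frame-by-frame FFT without signal padding (illustrative only)."""
    if win_length is None:
        win_length = n_fft                       # librosa also defaults win_length to n_fft
    window = np.hanning(win_length)
    n_frames = 1 + (len(x) - win_length) // hop_length
    out = np.empty((1 + n_fft // 2, n_frames), dtype=complex)
    for i in range(n_frames):
        start = i * hop_length                   # each frame starts hop_length samples later
        frame = x[start:start + win_length] * window
        # rfft zero-pads the frame up to n_fft when n_fft > win_length
        out[:, i] = np.fft.rfft(frame, n=n_fft)
    return out

demo_sr = 22050                                  # assumed sampling rate
x = np.random.default_rng(0).standard_normal(demo_sr)   # one second of noise
S = naive_stft(x, n_fft=2048, hop_length=512, win_length=1024)
print(S.shape)                                   # (frequency bins, frames)
print(512 / demo_sr, 2048 / demo_sr)             # hop and FFT size in seconds
```

The first dimension is always 1 + n_fft // 2 frequency bins, while the number of frames is set by the signal length and hop_length. A smaller hop_length gives more frames and finer time resolution at the cost of more computation.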

# STFT -> spectrogram
hop_length = 512  # number of samples between successive frames
n_fft = 2048  # number of samples per frame (FFT size)

# calculate duration hop length and window in seconds
hop_length_duration = float(hop_length)/sr
n_fft_duration = float(n_fft)/sr

# STFT
stft = librosa.stft(sig, n_fft=n_fft, hop_length=hop_length)

# Take the absolute value of the complex STFT output
magnitude = np.abs(stft)

# magnitude -> decibels
log_spectrogram = librosa.amplitude_to_db(magnitude)

# display spectrogram
plt.figure()
librosa.display.specshow(log_spectrogram, sr=sr, hop_length=hop_length)
plt.xlabel("Time")
plt.ylabel("Frequency")
plt.colorbar(format="%+2.0f dB")
plt.title("Spectrogram (dB)")
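
The magnitude-to-dB step amounts to taking 20·log10 of the amplitude relative to a reference value. A simplified NumPy sketch of that idea (the helper name to_db and the demo values are assumptions; librosa.amplitude_to_db additionally limits the dynamic range through its top_db parameter, which is omitted here):

```python
import numpy as np

def to_db(magnitude, ref=1.0, amin=1e-5):
    """Convert amplitude to decibels relative to ref, flooring tiny values at amin."""
    magnitude = np.maximum(np.abs(magnitude), amin)   # avoid log10(0)
    return 20.0 * np.log10(magnitude / ref)

demo = np.array([1.0, 0.1, 10.0, 0.0])
print(to_db(demo))
```

Each factor of 10 in amplitude corresponds to 20 dB, which is why quiet components that are invisible on a linear scale become readable in the spectrogram.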

➕ MFCC

# MFCCs
# extract 13 MFCCs
# y and sr are passed as keyword arguments, which recent librosa versions require
MFCCs = librosa.feature.mfcc(y=sig, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mfcc=13)

# display MFCCs
plt.figure()
librosa.display.specshow(MFCCs, sr=sr, hop_length=hop_length)
plt.xlabel("Time")
plt.ylabel("MFCC coefficients")
plt.colorbar()
plt.title("MFCCs")

# show plots
plt.show()
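
MFCCs are conventionally obtained by taking a discrete cosine transform (DCT) of the log-scaled mel spectrogram along the frequency axis and keeping the first few coefficients, which is what n_mfcc=13 selects. A minimal NumPy sketch of that last step only, assuming an orthonormal DCT-II and using a random stand-in for the log-mel spectrogram (dct2_ortho and log_mel are hypothetical names, and the 128 x 40 shape is an assumption):

```python
import numpy as np

def dct2_ortho(x):
    """Orthonormal DCT-II along axis 0."""
    n = x.shape[0]
    k = np.arange(n)[:, None]                    # output coefficient index
    m = np.arange(n)[None, :]                    # input sample index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    basis[0, :] /= np.sqrt(2.0)                  # scale the DC row for orthonormality
    return basis @ x

rng = np.random.default_rng(0)
log_mel = rng.standard_normal((128, 40))         # stand-in: 128 mel bands x 40 frames
mfcc_like = dct2_ortho(log_mel)[:13]             # keep the first 13 coefficients
print(mfcc_like.shape)                           # (coefficients, frames)
```

Keeping only the low-order coefficients retains the smooth overall shape of the spectrum in each frame and discards fine detail, which is why a small number such as 13 is a common choice.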
