Python: Spectrograms for Speech Recognition

Published 2024-04-24 09:02:03


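I am trying to reproduce the preprocessing shown in the YouTube video https://www.youtube.com/watch?v=g-sndkf7mCs, where they create a spectrogram with 20 ms windows and then apply an FFT to it. Finally, they feed the resulting spectrogram into a neural network. I am using the scipy package, but I am a bit confused about which parameters to use. The code is below: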

import numpy as np
from scipy import signal
from scipy.io import wavfile

def get_spectrogram(path, nsamples=16000):
    '''
    Given path, return specgram.
    '''
    # read the wav files
    wav = wavfile.read(path)[1] # 16000 samples per second

    # zero pad the shorter samples and cut off the long ones to have a signal of 1 sec.
    if wav.size < nsamples:
        d = np.pad(wav, (nsamples - wav.size, 0), mode='constant')
    else:
        d = wav[0:nsamples]

    # get the specgram
    specgram = signal.spectrogram(d, fs= ? , nperseg=None, noverlap=None, nfft=None)[2]

    return specgram

Also, I would like to know what the shape of the output is. Is it (X, 1)?
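For reference, here is a minimal sketch of how the 20 ms window could map onto the arguments of `scipy.signal.spectrogram`, assuming a 16 kHz sampling rate as stated in the code comment above. The helper name `spectrogram_20ms`, the 50% overlap, and the synthetic test tone are assumptions made for illustration; they are not taken from the video.

```python
import numpy as np
from scipy import signal


def spectrogram_20ms(samples, fs=16000, window_ms=20, overlap=0.5):
    """Hypothetical helper: spectrogram with a fixed window length in ms.

    The 50% overlap is an assumption; the video's setting is not known here.
    """
    nperseg = int(fs * window_ms / 1000)   # 20 ms at 16 kHz -> 320 samples
    noverlap = int(nperseg * overlap)      # 160 samples shared by neighbours
    # nfft=None makes the FFT length equal to nperseg (no zero padding).
    freqs, times, spec = signal.spectrogram(
        samples, fs=fs, nperseg=nperseg, noverlap=noverlap, nfft=None
    )
    return freqs, times, spec


# Synthetic 1-second, 440 Hz tone standing in for a padded/cropped recording.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)

freqs, times, spec = spectrogram_20ms(tone, fs=fs)
print(spec.shape)  # (number of frequency bins, number of time frames)
```

The result is two-dimensional rather than (X, 1): for real-valued input the first axis has `nperseg // 2 + 1` frequency bins and the second axis has one column per window position. Under the assumptions above, a 16000-sample signal should give a shape of (161, 99). The sampling rate can also be taken from the first element of the tuple returned by `wavfile.read` instead of being hard-coded.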

