Converting between an AudioSegment object and wave file/data

Posted 2024-05-15 16:11:54


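I am extracting MFCC features from mp3 speech files, but I want to leave the source files untouched and avoid creating any new files. My processing consists of the following steps:

  • Load the .mp3 file, remove the silence, and generate .wav data using pydub
  • Read the audio data and sample rate with scipy.io.wavfile.read()
  • Extract the features with python_speech_features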

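However, eliminate_silence() returns an AudioSegment object, while scipy.io.wavfile.read() takes a .wav file name, so I am forced to temporarily save/export the data as a wave file to bridge the two. This step costs memory and time, so my question is: how can I avoid the wave-file export step? Or is there a workaround?

Here is my code.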

import os
from pydub import AudioSegment
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc
from pydub.silence import split_on_silence

def eliminate_silence(input_path):
    """ Eliminate silent chunks from original call recording """
    # Load the input mp3 file
    sound  = AudioSegment.from_mp3(input_path)
    chunks = split_on_silence(sound,
                              # split on silences of at least 500 ms
                              min_silence_len=500,
                              # anything under -30 dBFS is considered silence
                              silence_thresh=-30,
                              # keep 100 ms of leading/trailing silence
                              keep_silence=100)

    output_chunks = AudioSegment.empty()
    for chunk in chunks:
        output_chunks += chunk
    return output_chunks


silence_clear_data = eliminate_silence("file.mp3")
silence_clear_data.export("temp.wav", format="wav")
rate, audio = read("temp.wav")
os.remove("temp.wav")

# Extract MFCCs
mfcc_feature = mfcc(audio, rate, winlen = 0.025, winstep = 0.01, numcep = 15,
                    nfilt = 35, nfft = 512, appendEnergy = True)
mfcc_feature = preprocessing.scale(mfcc_feature)

2 answers

It looks like AudioSegment.get_array_of_samples() is what you need. (You may need to construct a numpy array from that array before passing it to mfcc.)
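A minimal sketch of that conversion, assuming a standard pydub AudioSegment. The helper name `audiosegment_to_numpy` is hypothetical and not part of pydub. For 16-bit audio it should give the same kind of (rate, data) pair that scipy.io.wavfile.read() returns, without writing anything to disk:

```python
import numpy as np


def audiosegment_to_numpy(segment):
    """Return (rate, samples) for a pydub AudioSegment, entirely in memory.

    Hypothetical helper: it only relies on get_array_of_samples(),
    channels and frame_rate, which AudioSegment provides.
    """
    # get_array_of_samples() returns an array.array; numpy picks up its
    # integer type (int16 for 16-bit audio).
    samples = np.array(segment.get_array_of_samples())
    if segment.channels > 1:
        # Channels are interleaved (L R L R ...), so reshape to
        # (n_frames, n_channels).
        samples = samples.reshape((-1, segment.channels))
    return segment.frame_rate, samples
```

With this, the export/read/remove lines in the question could be replaced by `rate, audio = audiosegment_to_numpy(silence_clear_data)`.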

I am currently working on a project in which I cut audio on silences and compute MFCC coefficients; here is my solution:

import pydub
import python_speech_features as p
import numpy as np

def generate_mfcc_without_silences(path):
    # load the audio and resample it to 16 kHz
    audio_file = pydub.AudioSegment.from_wav(path)
    audio_file = audio_file.set_frame_rate(16000)
    # cut the audio on silences
    chunks = pydub.silence.split_on_silence(audio_file, silence_thresh=audio_file.dBFS, min_silence_len=200)
    mfccs = []
    for chunk in chunks:
        # compute the MFCCs from the chunk's sample array (dtype assumes 16-bit samples)
        np_chunk = np.frombuffer(chunk.get_array_of_samples(), dtype=np.int16)
        mfccs.append(p.mfcc(np_chunk, samplerate=audio_file.frame_rate, numcep=26))
    return mfccs

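Notes:

· I resample the audio to 16 kHz, but this is optional

· I set min_silence_len to 200 because I wanted to try to isolate single words

Based on my function and your requirements, the function you need is probably: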

[The code block for this function was not preserved in the source.]
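As an illustration only, and not the answerer's original code, here is a rough sketch of what such a function could look like. It reuses the silence and MFCC parameters from the question. The names `chunks_to_signal` and `mfcc_without_temp_file` are hypothetical, and the downmix to mono via set_channels(1) is an added assumption so that mfcc receives a 1-D signal:

```python
import numpy as np


def chunks_to_signal(chunks):
    """Concatenate the samples of the non-silent chunks into one array."""
    arrays = [np.array(chunk.get_array_of_samples()) for chunk in chunks]
    if not arrays:
        # No non-silent audio found: return an empty 16-bit signal.
        return np.array([], dtype=np.int16)
    return np.concatenate(arrays)


def mfcc_without_temp_file(path):
    """Hypothetical pipeline: mp3 -> silence removal -> MFCC, no temp file."""
    # Third-party imports are kept local so chunks_to_signal stays usable
    # on its own.
    from pydub import AudioSegment
    from pydub.silence import split_on_silence
    from python_speech_features import mfcc
    from sklearn import preprocessing

    # Assumption: downmix to mono so the samples form a 1-D signal.
    sound = AudioSegment.from_mp3(path).set_channels(1)
    chunks = split_on_silence(sound,
                              min_silence_len=500,
                              silence_thresh=-30,
                              keep_silence=100)
    signal = chunks_to_signal(chunks)
    features = mfcc(signal, sound.frame_rate, winlen=0.025, winstep=0.01,
                    numcep=15, nfilt=35, nfft=512, appendEnergy=True)
    return preprocessing.scale(features)
```

Calling `mfcc_without_temp_file("file.mp3")` would then stand in for the export, read, and remove steps in the question, since the samples go straight from the AudioSegment chunks into a numpy array.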
