Importing audio files into Python as NumPy arrays (alternatives to audiolab)

14 votes

5 answers

40148 views

Asked 2025-04-15 19:52

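I have been using Audiolab to import sound files, and it has worked reasonably well. However:

It does not support some formats, such as mp3, because libsndfile, the library it depends on, declines to support them
On Windows it does not work under Python 2.6, and the author is no longer around to fix the problem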

In [2]: from scikits import audiolab
--------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

C:\Python26\Scripts\<ipython console> in <module>()

C:\Python26\lib\site-packages\scikits\audiolab\__init__.py in <module>()
     23 __version__ = _version
     24
---> 25 from pysndfile import formatinfo, sndfile
     26 from pysndfile import supported_format, supported_endianness, \
     27                       supported_encoding, PyaudioException, \

C:\Python26\lib\site-packages\scikits\audiolab\pysndfile\__init__.py in <module>()
----> 1 from _sndfile import Sndfile, Format, available_file_formats, available_encodings
      2 from compat import formatinfo, sndfile, PyaudioException, PyaudioIOError
      3 from compat import supported_format, supported_endianness, supported_encoding

ImportError: DLL load failed: The specified module could not be found.

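So I would like to either:

Figure out why it does not work under 2.6 (is something wrong with _sndfile.pyd?), and perhaps find a way to make it handle the unsupported formats
Find a complete replacement for audiolab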

Tags: audio processing, audio libraries, array operations, file format support, alternative tools, Windows compatibility, libsndfile, audio import

5 Answers

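Sox (http://sox.sourceforge.net/) can help with this. It reads many different audio formats and can write them out as raw data in whatever sample type you want. In fact, I just wrote code that reads a chunk of an audio file into a numpy array.

I chose this approach for portability (sox is very widely available) and to maximize the range of input audio types I can handle. In initial testing it was not noticeably slower for my use case, which is reading short (a few seconds) excerpts from very long (several hours) audio files.

The variables you need are: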

SOX_EXEC # the sox / sox.exe executable filename
filename # the audio filename of course
num_channels # duh... the number of channels
out_byps # Bytes per sample you want, must be 1, 2, 4, or 8

start_samp # sample number to start reading at
len_samp   # number of samples to read

The actual code is very simple. If you want to extract the whole file, drop start_samp, len_samp, and the 'trim' arguments.

import subprocess  # used to run the sox executable
import numpy as NP # I'm lazy and call numpy NP

cmd = [SOX_EXEC,
       filename,              # input filename
       '-t','raw',            # output file type raw
       '-e','signed-integer', # output encoding: signed ints
       '-L',                  # output little endian
       '-b',str(out_byps*8),  # output bits per sample
       '-',                   # output to stdout
       'trim',str(start_samp)+'s',str(len_samp)+'s'] # only extract requested part

# frombuffer replaces the deprecated fromstring for binary data
data = NP.frombuffer(subprocess.check_output(cmd),'<i%d'%(out_byps))
# integer division, so this also works on Python 3
data = data.reshape(len(data)//num_channels, num_channels) # make samples x channels
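
The same steps can be wrapped in small functions so that the command building and the byte decoding can be checked without running sox. This is only a sketch under the assumptions of the snippet above; the function names build_sox_cmd, decode_raw_pcm, and read_segment are made up for illustration and are not part of sox or numpy.

```python
import subprocess

import numpy as np


def build_sox_cmd(sox_exec, filename, out_byps, start_samp, len_samp):
    """Build the sox argument list that writes a sample range as raw PCM to stdout."""
    return [sox_exec, filename,
            '-t', 'raw',              # raw output, no header
            '-e', 'signed-integer',   # signed integer samples
            '-L',                     # little endian
            '-b', str(out_byps * 8),  # bits per sample
            '-',                      # write to stdout
            'trim', '%ds' % start_samp, '%ds' % len_samp]


def decode_raw_pcm(raw, num_channels, out_byps):
    """Decode interleaved little-endian signed PCM bytes to a (samples, channels) array."""
    data = np.frombuffer(raw, dtype='<i%d' % out_byps)
    return data.reshape(-1, num_channels)


def read_segment(sox_exec, filename, num_channels, out_byps, start_samp, len_samp):
    """Run sox (must be installed) and return the requested samples as an array."""
    cmd = build_sox_cmd(sox_exec, filename, out_byps, start_samp, len_samp)
    return decode_raw_pcm(subprocess.check_output(cmd), num_channels, out_byps)
```

Raw PCM stores the channels interleaved (sample 0 of every channel, then sample 1, and so on), which is why the reshape puts samples on the rows and channels on the columns.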

Also, here is some code that uses sox to read the information in an audio file's header:

import subprocess
import sys

# sox --i prints the header as "Name : value" lines; decode bytes to text for Python 3
info = subprocess.check_output([SOX_EXEC,'--i',filename]).decode('utf-8', 'replace')
comments = ''             # initialized here; the original snippet left these undefined
other = ''
output_unhandled = False  # assumed flag: set to True to report unrecognized lines
reading_comments_flag = False
for l in info.splitlines():
    if not l.strip():
        continue
    if reading_comments_flag:
        # everything after the "Comments" line is treated as comment text
        if comments:
            comments += '\n'
        comments += l
    else:
        if l.startswith('Input File'):
            input_file = l.split(':',1)[1].strip()[1:-1]  # drop surrounding quotes
        elif l.startswith('Channels'):
            num_channels = int(l.split(':',1)[1].strip())
        elif l.startswith('Sample Rate'):
            sample_rate = int(l.split(':',1)[1].strip())
        elif l.startswith('Precision'):
            bits_per_sample = int(l.split(':',1)[1].strip()[0:-4])  # strip "-bit"
        elif l.startswith('Duration'):
            tmp = l.split(':',1)[1].strip()
            tmp = tmp.split('=',1)
            duration_time = tmp[0].strip()
            duration_samples = int(tmp[1].split(None,1)[0])
        elif l.startswith('Sample Encoding'):
            encoding = l.split(':',1)[1].strip()
        elif l.startswith('Comments'):
            comments = ''
            reading_comments_flag = True
        else:
            if other:
                other += '\n'+l
            else:
                other = l
            if output_unhandled:
                print("Unhandled:", l, file=sys.stderr)

Answered 2025-04-15 by Python大师

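Audiolab works fine on my Ubuntu 9.04 system with Python 2.6.2, so this may be a Windows problem. In the forum post you linked to, the author also says it is a Windows bug.

In the past, this option has also worked for me: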

from scipy.io import wavfile
fs, data = wavfile.read(filename)

Be aware, though, that data may have an integer dtype, in which case its values will not lie in [-1, 1). For example, if data is int16, you need to divide data by 2**15 to scale it into [-1, 1).
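
The scaling can be written once for any integer width. The sketch below is only an illustration: pcm_to_float is a made-up helper name, not part of scipy, and it assumes that signed integers use their full range and that uint8 data (what wavfile.read returns for 8-bit files) is centered on 128.

```python
import numpy as np


def pcm_to_float(data):
    """Scale integer PCM samples to float64 in [-1, 1); float input is passed through."""
    data = np.asarray(data)
    if np.issubdtype(data.dtype, np.signedinteger):
        # int16 is divided by 2**15, int32 by 2**31, and so on
        full_scale = 2.0 ** (8 * data.dtype.itemsize - 1)
        return data.astype(np.float64) / full_scale
    if data.dtype == np.uint8:
        # 8-bit WAV samples are unsigned, with silence at 128
        return (data.astype(np.float64) - 128.0) / 128.0
    return data.astype(np.float64)
```

With the result of wavfile.read this would be used as `data = pcm_to_float(data)`.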

Answered 2025-04-15 by Python大师

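I recently started using the PySoundFile library in place of Audiolab. It is very easy to install with conda.

However, like many other tools, it does not support mp3. MP3 is no longer covered by patents, so in principle it could be supported; someone just needs to write that support into libsndfile.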

Answered 2025-04-15 by Python大师
