How can I capture microphone audio input in Python and process it in real time?


数据工程师

Asked 2025-04-15 17:13

Hello,

I'm trying to write a Python program that prints a string every time the microphone picks up a sound. By "sound" I mean a sudden loud noise or something similar.

I searched Stack Overflow and found this post: Recognising tone of the audio

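I think the PyAudio library might fit my needs, but I'm not sure how to make my program wait for an audio signal (i.e. monitor the microphone in real time), or what to do with the signal once I have it (do I need to use a Fourier transform, as the post above suggests?).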

Thanks in advance for any help you can give.

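Tags: audio-processing, real-time-monitoring, pyaudio, fourier-transform, microphone-input, audio-programming, audio-signal, noise-detection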

3 Answers

I know this question is old, but in case anyone lands here again... take a look at https://python-sounddevice.readthedocs.io/en/0.4.1/index.html.

It has a nice example called "Input to Output Pass-Through", which you can find here: https://python-sounddevice.readthedocs.io/en/0.4.1/examples.html#input-to-output-pass-through.

... and many other examples ...
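For the question's actual use case (print something when the microphone hears a loud sound), a minimal sketch with sounddevice might look like the following. It assumes sounddevice and NumPy are installed and a default input device exists. It opens an `sd.InputStream` whose callback receives roughly 10 ms blocks and prints a message whenever a block's RMS level exceeds a threshold. The helper names `block_rms` and `make_callback` and the threshold of 0.1 are illustrative assumptions, not part of the sounddevice API.

```python
import math


def block_rms(samples):
    """Root-mean-square level of one block of float samples in [-1.0, 1.0]."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(float(x) * float(x) for x in samples) / len(samples))


def make_callback(threshold=0.1):
    """Build an InputStream callback that prints a message for loud blocks.

    threshold is an assumed RMS level; tune it for your microphone.
    """
    def callback(indata, frames, time, status):
        if status:
            # e.g. an input overflow reported by PortAudio
            print(status)
        # indata is a NumPy array of shape (frames, channels); use channel 0
        if block_rms(indata[:, 0]) > threshold:
            print("Sound detected!")
    return callback


def main(samplerate=44100):
    # Imported here so block_rms can be reused without sounddevice installed
    import sounddevice as sd

    blocksize = samplerate // 100  # 10 ms of audio per callback
    with sd.InputStream(samplerate=samplerate, channels=1,
                        blocksize=blocksize, callback=make_callback()):
        print("Listening... press Ctrl+C to stop")
        while True:
            sd.sleep(1000)  # the callback does the work on the audio thread
```

Call `main()` to start listening. The callback runs on the audio thread, so work inside it should stay short; for heavier processing, a common pattern is to push each block onto a `queue.Queue` and handle it in the main thread.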

Answered 2025-04-15 by Python大师


...what do I do with the signal once I have it (do I need to use a Fourier transform, as the post above suggests)?

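If you want a "tap" signal, you are probably more interested in the signal's amplitude than in its frequency, so a Fourier transform probably won't help much for this goal. You likely want to measure the short-term amplitude of the input signal in real time (say, over 10 ms windows) and detect when it suddenly rises by a certain amount. You'll need to tune the following parameters: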

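What counts as a "short-term" amplitude measurement
How large an amplitude increase you are looking for
How quickly that amplitude change must happen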
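The three parameters above map directly onto a small amount of code. Here is a minimal pure-Python sketch, assuming the samples are floats in [-1.0, 1.0]: `frame_ms` is the short-term window, `min_rise` is the amplitude increase you are looking for, and `max_rise_frames` is how quickly that increase must happen. The function names and default values are hypothetical and would need tuning for a real microphone.

```python
import math


def short_term_rms(samples, samplerate, frame_ms=10):
    """Split samples into frame_ms windows and return the RMS of each window."""
    frame_len = max(1, int(samplerate * frame_ms / 1000))
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels


def detect_knocks(levels, min_rise=0.2, max_rise_frames=2):
    """Return the indices of frames where the level jumps up suddenly.

    min_rise:        how much the RMS level must increase (same units as samples)
    max_rise_frames: how many frames the increase may take (the "how fast" part)
    """
    hits = []
    for i in range(max_rise_frames, len(levels)):
        rise = levels[i] - levels[i - max_rise_frames]
        # Report only the first frame of a jump, not every frame of the same rise
        if rise >= min_rise and (not hits or i - hits[-1] > max_rise_frames):
            hits.append(i)
    return hits
```

For example, at a 1000 Hz sample rate, 100 ms of near-silence (0.01) followed by a loud ±0.8 burst gives `detect_knocks(...) == [10]`, i.e. the burst starts at frame 10 (100 ms). In a live program you would compute the RMS of each captured block as it arrives and keep only the last few levels, rather than processing a whole recording at once.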

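Although I said you don't care about frequency, you may want to do some filtering first to remove very low and very high frequency components. This can help you avoid some false positives. You can implement this filtering with an FIR or IIR digital filter; a Fourier transform isn't required.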
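As an illustration of the IIR option, here is a sketch that chains two first-order (one-pole) IIR filters into a crude band-pass. The 100 Hz and 3000 Hz cutoffs are assumed defaults, not values from the answer, and the function names are hypothetical. For steeper roll-off, `scipy.signal.butter` together with `scipy.signal.lfilter` is a common choice.

```python
import math


def highpass(samples, samplerate, cutoff_hz):
    """First-order IIR high-pass: removes DC offset and rumble below cutoff_hz."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / samplerate
    a = rc / (rc + dt)
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        # y[n] = a * (y[n-1] + x[n] - x[n-1])
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def lowpass(samples, samplerate, cutoff_hz):
    """First-order IIR low-pass: attenuates hiss above cutoff_hz."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / samplerate
    alpha = dt / (rc + dt)
    out = []
    y = 0.0
    for x in samples:
        # y[n] = y[n-1] + alpha * (x[n] - y[n-1])
        y += alpha * (x - y)
        out.append(y)
    return out


def bandpass(samples, samplerate, low_hz=100.0, high_hz=3000.0):
    """Crude band-pass: high-pass at low_hz, then low-pass at high_hz."""
    return lowpass(highpass(samples, samplerate, low_hz), samplerate, high_hz)
```

First-order filters roll off at only about 6 dB per octave, but that is often enough to suppress DC offset and low-frequency rumble before the amplitude check.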

Answered 2025-04-15 by Python大师


If you're on Linux, you can use pyALSAAUDIO. On Windows, you can use PyAudio, and there is also a library called SoundAnalyse.

Here is an example for Linux that I found:

#!/usr/bin/env python3
## This is an example of a simple sound capture script.
##
## The script opens an ALSA PCM device for sound capture, sets
## various attributes of the capture, reads in a loop,
## and prints the volume (peak level) of each fragment.
##
## To test it out, run it and shout at your microphone:

import struct
import time

import alsaaudio

# Open the device in nonblocking capture mode. The last argument could
# just as well have been zero (PCM_NORMAL) for blocking mode. Then we could
# have left out the sleep call at the bottom of the loop.
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK)

# Set attributes: Mono, 8000 Hz, 16 bit little endian samples
inp.setchannels(1)
inp.setrate(8000)
inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)

# The period size controls the internal number of frames per period.
# The significance of this parameter is documented in the ALSA API.
# For our purposes, it is sufficient to know that reads from the device
# will return this many frames, each frame being 2 bytes long.
# This means that the reads below will return either 320 bytes of data
# or 0 bytes of data. The latter is possible because we are in nonblocking
# mode.
inp.setperiodsize(160)

while True:
    # Read data from device. l is the number of frames read: 0 if nothing
    # is ready yet, negative (-EPIPE) if the capture buffer overran.
    l, data = inp.read()
    if l > 0 and len(data) >= 2:
        # Peak level: the maximum absolute value of all 16-bit little-endian
        # samples in the fragment (audioop.max used to do this, but the
        # audioop module was removed in Python 3.13).
        count = len(data) // 2
        samples = struct.unpack("<%dh" % count, data[:count * 2])
        print(max(abs(s) for s in samples))
    time.sleep(.001)
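Since the answer mentions PyAudio for Windows, here is a rough equivalent of the script above using PyAudio's blocking `stream.read`, assuming PyAudio is installed and a default input device exists. `peak_level` plays the role of the peak computation in the ALSA loop; the helper names, the 16 kHz rate, and the `threshold` of 10000 (out of 32767) are illustrative assumptions to tune.

```python
import struct


def peak_level(data):
    """Largest absolute sample value in a buffer of 16-bit mono PCM.

    PyAudio's paInt16 uses the machine's native byte order, hence "=".
    """
    count = len(data) // 2
    if count == 0:
        return 0
    samples = struct.unpack("=%dh" % count, data[:count * 2])
    return max(abs(s) for s in samples)


def main(threshold=10000, rate=16000, chunk=160):
    # Imported here so peak_level can be used without PyAudio installed
    import pyaudio

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=rate,
                     input=True, frames_per_buffer=chunk)
    try:
        while True:
            # Blocking read of `chunk` frames (10 ms at 16 kHz)
            data = stream.read(chunk, exception_on_overflow=False)
            if peak_level(data) > threshold:
                print("Sound detected!")
    except KeyboardInterrupt:
        pass
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
```

Call `main()` to start listening; press Ctrl+C to stop. Because `stream.read` blocks until a full chunk is available, no `sleep` call is needed in the loop.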

Answered 2025-04-15 by Python大师
