How do I get sound input from a microphone in Python and process it on the fly?

Posted 2024-06-06 16:51:28


Hello,

I'm trying to write a Python program that prints a string every time it hears a tap on the microphone. By "tap" I mean a sudden loud noise or something similar.

I searched SO and found this post: Recognising tone of the audio

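I think the PyAudio library would fit my needs, but I'm not sure how to make my program wait for an audio signal (real-time microphone monitoring), and, once I have one, how to process it (do I need to use a Fourier transform, as instructed in the post above)?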

Thanks in advance for any help you can give me.


2 Answers

...and when I got one how to process it (do I need to use Fourier Transform like it was instructed in the above post)?

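If you're looking for a "tap", then I think you're more interested in amplitude than in frequency, so a Fourier transform is probably not useful for your particular goal. You probably want to keep a running measurement of the short-term (say 10 ms) amplitude of the input and detect when it suddenly increases by a certain delta. You would need to tune the following parameters: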

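  • what counts as the "short-term" amplitude measurement
  • how large a delta increase you are looking for
  • how quickly the delta change must occur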
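
The three parameters above can be sketched roughly as follows. This is only an illustration, not a tuned detector: the function names, the 10 ms window, the `ratio` of 4 and the `floor` of 200 are all assumptions made for the example, and "how quickly" is fixed here at one window (each window is compared with the one just before it).

```python
import math

def short_term_rms(samples, window):
    """Yield the RMS amplitude of each consecutive, non-overlapping window."""
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        yield math.sqrt(sum(s * s for s in chunk) / window)

def detect_taps(samples, rate=8000, window_ms=10, ratio=4.0, floor=200.0):
    """Return start times (in seconds) of windows whose RMS jumps suddenly.

    window_ms : length of the "short-term" measurement
    ratio     : how many times louder than the previous window counts as a jump
    floor     : minimum RMS, so small changes in near-silence are ignored
    All defaults are illustrative guesses, not tuned values.
    """
    window = max(1, rate * window_ms // 1000)
    taps = []
    previous = None
    for index, level in enumerate(short_term_rms(samples, window)):
        # Compare against the previous window only; max(..., 1.0) avoids
        # treating any sound after digital silence as an infinite jump.
        if previous is not None and level > floor and level > ratio * max(previous, 1.0):
            taps.append(index * window / rate)
        previous = level
    return taps
```

Here `samples` is a plain sequence of signed integer sample values, such as one decoded from 16-bit PCM. A ratio is used instead of an absolute delta so that the threshold scales with the background level; an absolute difference would work too.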

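Although I said you're not interested in frequency, you might want to do some filtering first, in particular to remove low-frequency and high-frequency components. That might help you avoid some "false positives". You could do that with an FIR or IIR digital filter; a Fourier transform isn't necessary.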
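
As a rough sketch of that kind of pre-filtering, two first-order IIR filters (a high-pass followed by a low-pass) give a crude band-pass. The function names and the 300 Hz and 3000 Hz cutoffs are assumptions chosen for illustration; a real design would probably use higher-order filters.

```python
import math

def low_pass(samples, cutoff_hz, rate):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / rate
    alpha = dt / (rc + dt)
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def high_pass(samples, cutoff_hz, rate):
    """First-order IIR high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / rate
    alpha = rc / (rc + dt)
    out = []
    y = 0.0
    prev_x = 0.0
    for x in samples:
        y = alpha * (y + x - prev_x)
        prev_x = x
        out.append(y)
    return out

def band_pass(samples, low_hz=300.0, high_hz=3000.0, rate=8000):
    """Remove content below low_hz and above high_hz (gentle 6 dB/octave slopes)."""
    return low_pass(high_pass(samples, low_hz, rate), high_hz, rate)
```

The filtered samples can then be fed to whatever amplitude measurement you use. A DC offset or mains hum is strongly attenuated, while a mid-band click passes mostly intact.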

If you are on Linux, you can use pyalsaaudio. For Windows, there is PyAudio, and there is also a library called SoundAnalyse.

I found an example for Linux here:

#!/usr/bin/python
## This is an example of a simple sound capture script.
##
## The script opens an ALSA pcm for sound capture. Set
## various attributes of the capture, and reads in a loop,
## Then prints the volume.
##
## To test it out, run it and shout at your microphone:

# Note: audioop was deprecated in Python 3.11 and removed in Python 3.13.
import alsaaudio, time, audioop

# Open the device in nonblocking capture mode. The last argument could
# just as well have been zero for blocking mode. Then we could have
# left out the sleep call in the bottom of the loop
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE,alsaaudio.PCM_NONBLOCK)

# Set attributes: Mono, 8000 Hz, 16 bit little endian samples
inp.setchannels(1)
inp.setrate(8000)
inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)

# The period size controls the internal number of frames per period.
# The significance of this parameter is documented in the ALSA api.
# For our purposes, it is sufficient to know that reads from the device
# will return this many frames. Each frame being 2 bytes long.
# This means that the reads below will return either 320 bytes of data
# or 0 bytes of data. The latter is possible because we are in nonblocking
# mode.
inp.setperiodsize(160)

while True:
    # Read data from device
    l,data = inp.read()
    if l:
        # Return the maximum of the absolute value of all samples in a fragment.
        print(audioop.max(data, 2))
    time.sleep(.001)
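
A similar loop can be sketched with PyAudio, which the question asks about. This is a minimal sketch, not a tested recipe: `peak_amplitude` and `listen` are names made up for the example, the threshold of 10000 is an arbitrary guess, and the samples are assumed to arrive as signed 16-bit values in the machine's native byte order.

```python
import array

def peak_amplitude(data):
    """Largest absolute sample in a buffer of signed 16-bit PCM bytes.

    Assumes native byte order and a buffer length that is a multiple of 2.
    """
    samples = array.array("h", data)
    return max((abs(s) for s in samples), default=0)

def listen(threshold=10000, rate=8000, chunk=160):
    """Print "tap" whenever a buffer's peak exceeds `threshold`. Stop with Ctrl+C."""
    import pyaudio  # third-party; imported here so peak_amplitude works without it

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=rate,
                     input=True, frames_per_buffer=chunk)
    try:
        while True:
            # Blocking read: returns once `chunk` frames are available,
            # so no sleep call is needed in this loop.
            data = stream.read(chunk)
            if peak_amplitude(data) > threshold:
                print("tap")
    except KeyboardInterrupt:
        pass
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
```

Call `listen()` to start monitoring the default input device. `peak_amplitude` plays the same role as `audioop.max(data, 2)` in the ALSA example, but relies only on the `array` module.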
