A fork of [Pocketsphinx Python](https://github.com/bambocher/pocketsphinx-python) that adds a utility for installing models and a StreamSpeech interface.

pocketsphinx-fork: detailed Python project description


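Pocketsphinx Python

Pocketsphinx is part of the CMU Sphinx open source toolkit for speech recognition.

This package provides a Python interface to the CMU Sphinxbase and Pocketsphinx libraries, created with SWIG and Setuptools.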

Supported platforms

  • Windows (untested)
  • Linux
  • Mac OS X (untested)

Installation requirements

Windows requirements:

Ubuntu requirements:

sudo apt-get install -qq python python-dev python-pip build-essential swig git libpulse-dev libasound2-dev

Mac OS X requirements:

brew reinstall swig python

Installation

# Make sure we have up-to-date versions of pip, setuptools and wheel
python -m pip install --upgrade pip setuptools wheel
pip install --upgrade pocketsphinx

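More binary distributions for manual installation are available here.

Installing models

This package can also be used to install Pocketsphinx models distributed in .tar.gz format.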

from pocketsphinx import PocketsphinxModel, LiveSpeech

models = PocketsphinxModel(model_path='/some/installation/path')

# this will install the model from the given URL under the name 'de'
models.install_model('https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/German/cmusphinx-de-voxforge-5.2.tar.gz', 'de')

# this returns a dictionary with the locations of the model's hmm, lm and dict
de = models.get_model('de')

# we can now use the 'de' model directly with any pocketsphinx object
for phrase in LiveSpeech(model=de):
    print(phrase)

The default model_path is '~/pocketsphinx_models'.

Usage

LiveSpeech

It's an iterator class for continuous recognition or keyword search from a microphone.

from pocketsphinx import LiveSpeech

for phrase in LiveSpeech():
    print(phrase)

An example of a keyword search:

from pocketsphinx import LiveSpeech

speech = LiveSpeech(lm=False, keyphrase='forward', kws_threshold=1e-20)
for phrase in speech:
    print(phrase.segments(detailed=True))

With your own model and dictionary:

import os
from pocketsphinx import LiveSpeech, get_model_path

model_path = get_model_path()

speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dic=os.path.join(model_path, 'cmudict-en-us.dict')
)

for phrase in speech:
    print(phrase)

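StreamSpeech

This can be used to feed raw chunks of bytes to the iterator, typically when audio is transferred over a socket or a similar channel.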

from pocketsphinx import StreamSpeech

f = open('somefile.wav', 'rb')

def callback():
    return f.read(2048)

for phrase in StreamSpeech(callback=callback):
    print(phrase)
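The callback only has to return the next chunk of raw bytes each time it is called. A minimal sketch of a reusable callback factory follows, assuming any binary file-like object with a `read` method (an open file, or a socket wrapped with `socket.makefile('rb')`). `make_chunk_callback` is a hypothetical helper for illustration and is not part of the pocketsphinx package; how StreamSpeech reacts to an empty chunk is not covered here.

```python
import io


def make_chunk_callback(stream, chunk_size=2048):
    """Return a zero-argument callable that reads chunk_size bytes per call.

    `stream` is any binary file-like object. Hypothetical helper for
    illustration; it is not part of the pocketsphinx package.
    """
    def callback():
        # read() returns b'' once the stream is exhausted
        return stream.read(chunk_size)
    return callback


# Stand-in for an audio source: 5000 bytes of silence
callback = make_chunk_callback(io.BytesIO(b'\x00' * 5000))

sizes = []
while True:
    chunk = callback()
    if not chunk:
        break
    sizes.append(len(chunk))

print(sizes)
```

The resulting callable could then be passed as `StreamSpeech(callback=callback)`, in the same way as the file-based callback above.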

For examples of keyword search and custom models, see LiveSpeech.

AudioFile

It's an iterator class for continuous recognition or keyword search from a file.

from pocketsphinx import AudioFile

for phrase in AudioFile():
    print(phrase)  # => "go forward ten meters"

An example of a keyword search:

from pocketsphinx import AudioFile

audio = AudioFile(lm=False, keyphrase='forward', kws_threshold=1e-20)
for phrase in audio:
    print(phrase.segments(detailed=True))  # => "[('forward', -617, 63, 121)]"

With your own model and dictionary:

import os
from pocketsphinx import AudioFile, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

config = {
    'verbose': False,
    'audio_file': os.path.join(data_path, 'goforward.raw'),
    'buffer_size': 2048,
    'no_search': False,
    'full_utt': False,
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}

audio = AudioFile(**config)
for phrase in audio:
    print(phrase)

Convert frames into time coordinates:

from pocketsphinx import AudioFile

# Frames per Second
fps = 100

for phrase in AudioFile(frate=fps):  # frate (default=100)
    print('-' * 28)
    print('| %5s |  %3s  |   %4s   |' % ('start', 'end', 'word'))
    print('-' * 28)
    for s in phrase.seg():
        print('| %4ss | %4ss | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
    print('-' * 28)

# ----------------------------
# | start |  end  |   word   |
# ----------------------------
# |  0.0s | 0.24s | <s>      |
# | 0.25s | 0.45s | <sil>    |
# | 0.46s | 0.63s | go       |
# | 0.64s | 1.16s | forward  |
# | 1.17s | 1.52s | ten      |
# | 1.53s | 2.11s | meters   |
# | 2.12s |  2.6s | </s>     |
# ----------------------------
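The conversion itself is only a division by the frame rate, so it can be tried without a microphone or an audio file. The sketch below assumes, as in the example above, that `frate` is the number of frames per second; `frames_to_seconds` and `segment_times` are hypothetical helper names, and the frame numbers are copied from the sample output in this document.

```python
def frames_to_seconds(frame, frate=100):
    """Convert a frame index to seconds for a given frame rate (frames/second)."""
    return frame / float(frate)


def segment_times(segments, frate=100):
    """Map (word, start_frame, end_frame) tuples to (word, start_s, end_s)."""
    return [
        (word, frames_to_seconds(start, frate), frames_to_seconds(end, frate))
        for word, start, end in segments
    ]


# Frame numbers taken from the sample output shown in this document
segments = [('go', 46, 63), ('forward', 64, 116)]

for word, start, end in segment_times(segments):
    print('%-8s %.2fs - %.2fs' % (word, start, end))
```

Using `float(frate)` keeps the division fractional on Python 2 as well, where `/` on two integers would otherwise truncate.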

Pocketsphinx

It's a simple and flexible proxy class for the Pocketsphinx decoder.

from pocketsphinx import Pocketsphinx

print(Pocketsphinx().decode())  # => "go forward ten meters"

A more comprehensive example:

from __future__ import print_function
import os
from pocketsphinx import Pocketsphinx, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

config = {
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}

ps = Pocketsphinx(**config)
ps.decode(
    audio_file=os.path.join(data_path, 'goforward.raw'),
    buffer_size=2048,
    no_search=False,
    full_utt=False
)

print(ps.segments())  # => ['<s>', '<sil>', 'go', 'forward', 'ten', 'meters', '</s>']
print('Detailed segments:', *ps.segments(detailed=True), sep='\n')  # => [
#     word, prob, start_frame, end_frame
#     ('<s>', 0, 0, 24)
#     ('<sil>', -3778, 25, 45)
#     ('go', -27, 46, 63)
#     ('forward', -38, 64, 116)
#     ('ten', -14105, 117, 152)
#     ('meters', -2152, 153, 211)
#     ('</s>', 0, 212, 260)
# ]

print(ps.hypothesis())   # => go forward ten meters
print(ps.probability())  # => -32079
print(ps.score())        # => -7066
print(ps.confidence())   # => 0.04042641466841839

print(*ps.best(count=10), sep='\n')  # => [
#     ('go forward ten meters', -28034)
#     ('go for word ten meters', -28570)
#     ('go forward and majors', -28670)
#     ('go forward and meters', -28681)
#     ('go forward and readers', -28685)
#     ('go forward ten readers', -28688)
#     ('go forward ten leaders', -28695)
#     ('go forward can meters', -28695)
#     ('go forward and leaders', -28706)
#     ('go for work ten meters', -28722)
# ]
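The detailed segments include marker tokens such as `<s>`, `<sil>` and `</s>` alongside the recognized words. A small sketch of how they might be filtered out follows, assuming each detailed segment is a `(word, prob, start_frame, end_frame)` tuple as in the output above, and treating any token wrapped in angle brackets as a marker. `spoken_words` is a hypothetical helper, and the sample data is copied from the output above, so the block runs without pocketsphinx installed.

```python
def spoken_words(segments):
    """Return the words from detailed segments, skipping <...> marker tokens.

    Assumes each segment is a (word, prob, start_frame, end_frame) tuple.
    """
    return [
        word
        for word, prob, start_frame, end_frame in segments
        if not (word.startswith('<') and word.endswith('>'))
    ]


# Sample data copied from the detailed segments listed above
segments = [
    ('<s>', 0, 0, 24),
    ('<sil>', -3778, 25, 45),
    ('go', -27, 46, 63),
    ('forward', -38, 64, 116),
    ('ten', -14105, 117, 152),
    ('meters', -2152, 153, 211),
    ('</s>', 0, 212, 260),
]

print(' '.join(spoken_words(segments)))
```

For this sample the filtered words match the hypothesis string shown above. Filler tokens written in another style would need an additional rule.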

Default config

If you don't pass any arguments when creating an instance of the Pocketsphinx, AudioFile or LiveSpeech class, it will use the following default values:

verbose = False
logfn = /dev/null or nul
audio_file = site-packages/pocketsphinx/data/goforward.raw
audio_device = None
sampling_rate = 16000
buffer_size = 2048
no_search = False
full_utt = False
hmm = site-packages/pocketsphinx/model/en-us
lm = site-packages/pocketsphinx/model/en-us.lm.bin
dict = site-packages/pocketsphinx/model/cmudict-en-us.dict

Any other option must be passed into the config as is, without the leading - symbol.
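For instance, an option that the lower-level decoder config names `-hmm` is passed to these classes as `hmm`. A minimal sketch of that renaming, useful when reusing an existing dash-prefixed option set; `to_keyword_options` is a hypothetical helper, not part of the package, and the paths are placeholders:

```python
def to_keyword_options(decoder_options):
    """Strip the leading '-' from option names so they can be used as keywords."""
    return {name.lstrip('-'): value for name, value in decoder_options.items()}


# Dash-prefixed names as used with the lower-level config; values are placeholders
options = to_keyword_options({
    '-hmm': 'model/en-us',
    '-kws_threshold': 1e-20,
})

print(options)
```

The resulting dictionary could then be unpacked into a constructor, for example `Pocketsphinx(**options)`.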

If you want to disable the default language model or dictionary, set the corresponding option to False:

lm = False
dict = False

Verbose

Send output to stdout:

from pocketsphinx import Pocketsphinx

ps = Pocketsphinx(verbose=True)
ps.decode()

print(ps.hypothesis())

Send output to a file:

from pocketsphinx import Pocketsphinx

ps = Pocketsphinx(verbose=True, logfn='pocketsphinx.log')
ps.decode()

print(ps.hypothesis())

Compatibility

The parent classes are still available:

import os
from pocketsphinx import DefaultConfig, Decoder, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

# Create a decoder with a certain model
config = DefaultConfig()
config.set_string('-hmm', os.path.join(model_path, 'en-us'))
config.set_string('-lm', os.path.join(model_path, 'en-us.lm.bin'))
config.set_string('-dict', os.path.join(model_path, 'cmudict-en-us.dict'))
decoder = Decoder(config)

# Decode streaming data
buf = bytearray(1024)
with open(os.path.join(data_path, 'goforward.raw'), 'rb') as f:
    decoder.start_utt()
    while f.readinto(buf):
        decoder.process_raw(buf, False, False)
    decoder.end_utt()
print('Best hypothesis segments:', [seg.word for seg in decoder.seg()])

Projects using pocketsphinx-python

  • SpeechRecognition - A library for performing speech recognition, with support for several engines and APIs, online and offline.

License

The BSD License
