A package for building a speaker verification system

Project description for the Python package speaker-verification-toolkit


Speaker Verification Toolkit

This module contains tools for performing simple speaker verification.

You can download it from PyPI:

$ pip install speaker-verification-toolkit

To import and use it in your own project:

^{pr2}$

Usage


find_nearest_voice_data(voice_data_list,voice_sample)

Finds the voice data nearest to the given voice sample. It can be used to make a naive accept/reject decision.

voice_data_list: a list containing all voice data from the dataset.

voice_sample: the voice sample reference.

returns: the index of the element from voice_data_list that represents the nearest voice data.
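The contract of find_nearest_voice_data can be sketched as a nearest-neighbour search, assuming each voice is a list of MFCC frames. The distance used here, `mean_frame_distance`, is a hypothetical placeholder that compares frames index by index. The toolkit itself documents a DTW-based compute_distance for this purpose. `nearest_voice_index` and `naive_accept` are illustrative names, not the toolkit's API.

```python
import math
from typing import Callable, List, Sequence

Features = Sequence[Sequence[float]]  # one MFCC vector per frame


def mean_frame_distance(a: Features, b: Features) -> float:
    """Placeholder distance: mean Euclidean distance over the overlapping frames.

    Frames are compared index by index, with no time warping.
    """
    n = min(len(a), len(b))
    if n == 0:
        return math.inf
    return sum(math.dist(a[i], b[i]) for i in range(n)) / n


def nearest_voice_index(
    voice_data_list: List[Features],
    voice_sample: Features,
    distance: Callable[[Features, Features], float] = mean_frame_distance,
) -> int:
    """Return the index of the entry in voice_data_list nearest to voice_sample."""
    if not voice_data_list:
        raise ValueError("voice_data_list is empty")
    distances = [distance(v, voice_sample) for v in voice_data_list]
    return min(range(len(distances)), key=distances.__getitem__)


def naive_accept(claimed_index: int, voice_data_list: List[Features], voice_sample: Features) -> bool:
    """Accept the identity claim only if the claimed speaker is the nearest one."""
    return nearest_voice_index(voice_data_list, voice_sample) == claimed_index


enrolled = [
    [[0.0, 0.0], [0.1, 0.0]],  # speaker 0
    [[5.0, 5.0], [5.1, 5.0]],  # speaker 1
]
sample = [[4.9, 5.0], [5.0, 5.1]]
print(nearest_voice_index(enrolled, sample))  # 1
print(naive_accept(0, enrolled, sample))      # False
```

The decision is "naive" because it has no rejection threshold. An impostor is accepted whenever their voice happens to be nearest to the claimed speaker's enrolled data.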


compute_distance(sample1,sample2)

Computes the distance between sample1 and sample2 using an O(n) DTW (dynamic time warping) algorithm.

sample1: the MFCC data extracted from audio signal 1.

sample2: the MFCC data extracted from audio signal 2.

returns: a float representing the minimum distance between sample1 and sample2.
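compute_distance is documented as using an O(n) DTW algorithm, which suggests a linear-time approximation. Its code is not shown here. For intuition, below is a sketch of the classic, exact DTW recurrence on lists of MFCC frames. It runs in O(n·m) time and uses Euclidean frame cost, which is an assumption. `dtw_distance` is a hypothetical name.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def dtw_distance(sample1: Sequence[Frame], sample2: Sequence[Frame]) -> float:
    """Classic dynamic time warping distance between two MFCC sequences.

    Textbook O(n*m) time, O(m) memory; not the O(n) variant the toolkit documents.
    """
    n, m = len(sample1), len(sample2)
    if n == 0 or m == 0:
        raise ValueError("both samples must contain at least one frame")
    inf = math.inf
    # prev[j] = cheapest cumulative cost aligning sample1[:i] with sample2[:j]
    prev = [0.0] + [inf] * m
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(sample1[i - 1], sample2[j - 1])
            # extend the cheapest of: repeat a sample2 frame, repeat a sample1 frame, or advance both
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


print(dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]]))  # 0.0
print(dtw_distance([[0.0], [0.0]], [[1.0]]))  # 2.0
```

The first call returns 0.0 because DTW lets the repeated frame `[1.0]` align twice at no cost. This tolerance of differing speaking rates is why DTW suits comparing utterances of different lengths.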


extract_mfcc(signal_data,samplerate=16000,winlen=0.025,winstep=0.01)

Computes MFCC features from an audio signal.

signal_data: the audio signal from which to compute features. Should be an N*1 array.

samplerate: the sample rate of the signal we are working with, in Hz.

winlen: the length of the analysis window in seconds. Default is 0.025s (25 milliseconds).

winstep: the step between successive windows in seconds. Default is 0.01s (10 milliseconds).

returns: A numpy array of size (NUMFRAMES by numcep) containing features. Each row holds 1 feature vector.
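The winlen and winstep defaults translate into sample counts. At 16 kHz, a 25 ms window is 400 samples and a 10 ms step is 160 samples, so consecutive windows overlap by 240 samples. Below is a sketch of this framing step, the first stage of a standard MFCC pipeline. `frame_signal` is a hypothetical helper, and its zero-padding of the last window is an assumption, not something the toolkit documents. The later MFCC stages (window function, FFT, mel filterbank, log, DCT) are omitted.

```python
import math
from typing import List, Sequence


def frame_signal(
    signal_data: Sequence[float],
    samplerate: int = 16000,
    winlen: float = 0.025,
    winstep: float = 0.01,
) -> List[List[float]]:
    """Split a 1-D signal into overlapping analysis windows.

    The last window is zero-padded so that every sample falls in some frame.
    """
    frame_len = int(round(winlen * samplerate))    # 400 samples at 16 kHz
    frame_step = int(round(winstep * samplerate))  # 160 samples at 16 kHz
    n = len(signal_data)
    if n <= frame_len:
        num_frames = 1
    else:
        num_frames = 1 + math.ceil((n - frame_len) / frame_step)
    padded_len = (num_frames - 1) * frame_step + frame_len
    padded = list(signal_data) + [0.0] * (padded_len - n)
    return [padded[i * frame_step : i * frame_step + frame_len] for i in range(num_frames)]


frames = frame_signal([0.0] * 16000)  # one second of silence at 16 kHz
print(len(frames), len(frames[0]))    # 99 400
```

Each frame later becomes one row of the returned feature array. Under this padding convention, one second of 16 kHz audio gives NUMFRAMES = 99.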


extract_mfcc_from_wav_file(path,samplerate=16000,winlen=0.025,winstep=0.01)

Computes MFCC features from a wav file.

path: the path of the wav file to open.

samplerate: the desired sample rate, in Hz. Default is 16000. To disable resampling, pass None.

winlen: the length of the analysis window in seconds. Default is 0.025s (25 milliseconds).

winstep: the step between successive windows in seconds. Default is 0.01s (10 milliseconds).

returns: A numpy array of size (NUMFRAMES by numcep) containing features. Each row holds 1 feature vector.
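Before any features are computed, the wav file has to be decoded and possibly resampled. The sketch below uses only the standard-library `wave` module and assumes 16-bit mono PCM. The toolkit does not say how it resamples. `linear_resample` is a deliberately naive stand-in (linear interpolation without an anti-aliasing filter), and `read_wav_mono16` is a hypothetical name.

```python
import io
import struct
import wave
from typing import List, Optional, Tuple


def linear_resample(signal: List[float], src_rate: int, dst_rate: int) -> List[float]:
    """Naive linear-interpolation resampler (illustrative only, no low-pass filter)."""
    if not signal:
        return []
    out_len = max(1, round(len(signal) * dst_rate / src_rate))
    ratio = src_rate / dst_rate
    out = []
    for k in range(out_len):
        pos = k * ratio
        i = int(pos)
        if i >= len(signal) - 1:
            out.append(signal[-1])  # hold the last sample past the end
        else:
            frac = pos - i
            out.append(signal[i] * (1 - frac) + signal[i + 1] * frac)
    return out


def read_wav_mono16(path_or_file, samplerate: Optional[int] = 16000) -> Tuple[List[float], int]:
    """Read 16-bit mono PCM into floats in [-1, 1); samplerate=None keeps the file's rate."""
    with wave.open(path_or_file, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    count = len(raw) // 2
    # wav PCM data is little-endian signed 16-bit
    signal = [s / 32768.0 for s in struct.unpack("<%dh" % count, raw)]
    if samplerate is None or samplerate == rate:
        return signal, rate
    return linear_resample(signal, rate, samplerate), samplerate


# Build a tiny 8 kHz wav in memory, then read it back at 16 kHz.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(struct.pack("<4h", 0, 16384, 0, -16384))
buf.seek(0)
signal, rate = read_wav_mono16(buf, samplerate=16000)
print(rate, len(signal))  # 16000 8
```

Upsampling from 8 kHz to 16 kHz doubles the number of samples. A production resampler would low-pass filter first to avoid aliasing.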


rms_silence_filter(data,samplerate=16000,segment_length=None,threshold=0.001135)

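Cuts the silent parts out of the audio signal data. It cannot handle signal data affected by ambient noise. Consider applying a noise filter before using this silence filter, or make sure the ambient noise is low enough to be treated as silence.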

data: the audio signal data

samplerate: the sample rate of the signal, in Hz. If no segment_length is given, segment_length defaults to samplerate/100 (about 0.01 s per segment).

segment_length: the number of frames (samples) per segment. For example, at a sample rate SR, a segment length of SR/100 is a chunk containing 0.01 seconds of audio.

threshold: the threshold value. Segments whose RMS value is less than or equal to the threshold are cut off. The default value comes from [1] (see the references).

returns: the data parameter with its silent parts removed.
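The parameters above describe RMS-based silence removal: split the signal into fixed-length segments, measure each segment's root-mean-square energy, and drop those at or below the threshold. Below is a minimal sketch under stated assumptions. The signal is assumed to be float-valued in [-1, 1], because the toolkit does not state which scaling its default threshold expects. A trailing partial segment is judged on its own RMS. `rms_silence_filter_sketch` is a hypothetical name, not the toolkit's function.

```python
import math
from typing import List, Optional, Sequence


def rms_silence_filter_sketch(
    data: Sequence[float],
    samplerate: int = 16000,
    segment_length: Optional[int] = None,
    threshold: float = 0.001135,
) -> List[float]:
    """Keep only the fixed-length segments whose RMS exceeds threshold."""
    if segment_length is None:
        segment_length = max(1, samplerate // 100)  # about 0.01 s per segment
    kept: List[float] = []
    for start in range(0, len(data), segment_length):
        segment = data[start : start + segment_length]
        rms = math.sqrt(sum(x * x for x in segment) / len(segment))
        if rms > threshold:
            kept.extend(segment)
    return kept


# 0.01 s of silence, 0.01 s of a 440 Hz tone at amplitude 0.1, then silence (16 kHz)
silence = [0.0] * 160
tone = [0.1 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
filtered = rms_silence_filter_sketch(silence + tone + silence)
print(len(filtered))  # 160
```

Because whole segments are kept or dropped, the output preserves the original sample order but loses the exact timing of the removed gaps.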

References

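[1] Muhammad Asadullah & Shibli Nisar, "A Silence Removal and End Point Detection Approach for Speech Processing", National University of Computer and Emerging Sciences, Peshawar.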
