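I'm new to Python. I'm trying to write a function that does the following, so that I can reuse it in later parts of my code: (what the function does):
Then I want to do a calculation based on the list returned by that function. However, the function (i.e. knearest_similarity(tfidf_datamatrix)) doesn't return anything, and the print statements in the second function (i.e. threshold_function()) don't display anything. Could someone look at the code and tell me what I'm doing wrong?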
def knearest_similarity(tfidf_datamatrix):
    k_nearest_cosineMean = []
    for datavector in tfidf_datamatrix:
        cosineValueSet = []
        for trainingvector in tfidf_vectorizer_trainingset:
            cosineValue = cx(datavector, trainingvector)
            cosineValueSet.append(cosineValue)
        # the mean cosine similarity score of the top k nearest neighbours
        similarityMean_of_k_nearest_neighbours = np.mean(heapq.nlargest(k_nearest_neighbours, cosineValueSet))
        k_nearest_cosineMean.append(similarityMean_of_k_nearest_neighbours)
    print k_nearest_cosineMean
    return k_nearest_cosineMean

def threshold_function():
    mean_cosineScore_mean = np.mean(knearest_similarity(tfidf_matrix_testset))
    std_cosineScore_mean = np.std(knearest_similarity(tfidf_matrix_testset))
    threshold = mean_cosineScore_mean - (3*std_cosineScore_mean)
    print "The Mean of the mean of cosine similarity score for a normal Behaviour:", mean_cosineScore_mean  # the mean is used to find the threshold
    print "The standard deviation of the mean of cosine similarity score:", std_cosineScore_mean  # the standard deviation is also used to find the threshold
    print "The threshold for normal behaviour should be (Mean - 3*standard deviation):", threshold
    return threshold
Edit
I tried defining two global variables for the functions to use (i.e. tfidf_vectorizer_trainingset and tfidf_matrix_testset).
However, the print statements in threshold_function() now show the following:
The Mean of the mean of cosine similarity score for a normal Behaviour: nan
The standard deviation of the mean of cosine similarity score: nan
The threshold for normal behaviour should be (Mean - 3*standard deviation): nan
Edit 2
I found that the first value in k_nearest_cosineMean is nan. After removing that value, I managed to get a valid calculation.
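The effect described in Edit 2 can be reproduced with a small sketch: a single nan in the list makes the mean (and everything derived from it) nan, and dropping it restores a usable result. This is an illustration only. The sample scores are made up, `drop_nan` is a hypothetical helper, it is written for Python 3, and the standard library's `statistics` module stands in for NumPy (`statistics.pstdev` is the population standard deviation, which is what `np.std` computes by default).

```python
import math
import statistics


def drop_nan(scores):
    """Return the scores with any nan entries removed (hypothetical helper)."""
    return [s for s in scores if not math.isnan(s)]


# Made-up scores for illustration; the first one is nan, as in the question.
scores = [float("nan"), 0.5, 0.75, 1.0]

# Any arithmetic involving nan yields nan, so the plain mean is nan too.
plain_mean = sum(scores) / len(scores)

# After dropping the nan entry, the statistics are well defined again.
clean = drop_nan(scores)
clean_mean = statistics.mean(clean)
clean_std = statistics.pstdev(clean)
threshold = clean_mean - 3 * clean_std

print("mean with nan:", plain_mean)
print("mean without nan:", clean_mean)
print("threshold without nan:", threshold)
```

With NumPy, `np.nanmean` and `np.nanstd` ignore nan entries in the same spirit, although it is usually still worth finding out why the first vector produces nan in the first place.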
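The first line of threshold_function() calls knearest_similarity(tfidf_matrix_testset), but you never define what tfidf_matrix_testset is. You do the same thing on the second line. On the third line, you use the results of those two lines. Give tfidf_matrix_testset a value.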
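One way to follow this advice is to pass the data into the functions as arguments instead of relying on globals, and to call knearest_similarity only once. The sketch below is a minimal Python 3 illustration under stated assumptions, not the asker's actual setup: `cosine_similarity` is a stand-in for the undefined `cx`, the vectors are plain lists of made-up numbers rather than TF-IDF matrices, `k` replaces the global `k_nearest_neighbours`, and the standard library's `statistics` module replaces `np.mean` and `np.std`.

```python
import heapq
import math
import statistics


def cosine_similarity(a, b):
    """Stand-in for `cx`: cosine similarity of two equal-length sequences.

    Returns nan for an all-zero vector (an assumption; one possible
    source of the nan values seen in the question).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return float("nan")
    return dot / (norm_a * norm_b)


def knearest_similarity(data_vectors, training_vectors, k):
    """Mean similarity of each data vector to its k most similar training vectors."""
    means = []
    for data_vector in data_vectors:
        sims = [cosine_similarity(data_vector, t) for t in training_vectors]
        means.append(statistics.mean(heapq.nlargest(k, sims)))
    return means


def threshold_function(data_vectors, training_vectors, k):
    """Return mean - 3 * (population std) of the k-nearest mean similarities."""
    # Compute the scores once and reuse them for both statistics.
    scores = knearest_similarity(data_vectors, training_vectors, k)
    return statistics.mean(scores) - 3 * statistics.pstdev(scores)


# Made-up vectors for illustration only.
training = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
test = [[1.0, 0.0], [1.0, 1.0]]

scores = knearest_similarity(test, training, k=2)
threshold = threshold_function(test, training, k=2)
print(scores)
print(threshold)
```

Because every input is a parameter, calling either function without first defining the data fails immediately with a clear error at the call site, rather than deep inside the loop.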