ValueError: 0 is not in list (Python)

Posted 2024-05-15 08:53:06


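I am trying to return a list of tuples that ranks the candidates from most to least similar to a question, where each tuple also holds the candidate's index in the original candidate list. I implemented this function: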

from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(question, candidates, embeddings, dim=300):
    """
        question: a string
        candidates: a list of strings (candidates) which we want to rank
        embeddings: some embeddings
        dim: dimension of the current embeddings

        result: a list of pairs (initial position in the list, question)
    """
    cosi_dic={}
    most_candidates=[]
    q_vec=question_to_vec(question,embeddings,dim)
    for i in candidates:
      can_vec=question_to_vec(i,embeddings,dim)

      cosi_dic[cosine_similarity(can_vec.reshape(1,-1),  q_vec.reshape(1,-1))[0][0]]=i
    for i in (list(reversed(sorted(cosi_dic.keys(),)))):
      most_candidates.append((candidates.index(cosi_dic[i]),cosi_dic[i]))
    return most_candidates

question_to_vec is a function that returns the mean of the embedding vectors of all the words in a sentence. Here it is:

import numpy as np

def question_to_vec(question, embeddings, dim=300):
    """
        question: a string
        embeddings: dict where the key is a word and the value is its embedding
        dim: size of the representation

        result: vector representation for the question
    """
    v=np.zeros(dim)
    all_vectors=[]
    question=question.split()
    for i in question:
      if i in embeddings:
        all_vectors.append(embeddings[i])
    if all_vectors:
      v=np.mean(all_vectors, axis=0)
    return v
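
To make the averaging concrete, here is a minimal dependency-free sketch of the same idea. It uses plain lists instead of NumPy arrays, and both the helper name average_vector and the two-dimensional toy embeddings are assumptions made up for illustration. It also shows that a sentence with no known words falls back to the all-zero vector, as question_to_vec does.

```python
def average_vector(text, embeddings, dim):
    """Plain-list stand-in for question_to_vec (hypothetical helper)."""
    # Keep only the words that have an embedding.
    vectors = [embeddings[w] for w in text.split() if w in embeddings]
    if not vectors:
        # No known words: fall back to the zero vector.
        return [0.0] * dim
    # Average each dimension across the collected word vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Toy 2-dimensional embeddings, invented for this example.
toy = {"red": [1.0, 0.0], "apple": [0.0, 1.0]}

print(average_vector("red apple", toy, 2))  # [0.5, 0.5]
print(average_vector("xyzzy", toy, 2))      # [0.0, 0.0]
```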

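The expected output should look like [(2, c), (0, b), (1, a)] if c, at index 2 of the input candidate list, is the most similar and a is the least similar. However, when I run this code: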

wv_ranking = []
for i in range(len(validation)):
    line=validation[i]
    q, *ex = line
    ranks = rank_candidates(q, ex, wv_embeddings)
    wv_ranking.append([r[0] for r in ranks].index(0) + 1)

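where wv_embeddings is the GoogleNews-vectors-negative300 embedding, I get the error: ValueError: 0 is not in list. I inspected the cosine similarities for the line that raised the exception and found that every value was zero.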
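
For reference, the evaluation line looks up where original index 0 ended up in the ranking, apparently treating the first candidate as the correct one. The sketch below isolates that lookup; the helper name rank_of_correct and the sample rankings are hypothetical. list.index raises ValueError whenever index 0 is missing from the ranking, which is the error reported here.

```python
def rank_of_correct(ranks):
    """1-based position of original index 0 in a ranking (hypothetical helper)."""
    positions = [r[0] for r in ranks]
    return positions.index(0) + 1


# Complete ranking: index 0 is in second place.
print(rank_of_correct([(2, "c"), (0, "b"), (1, "a")]))  # 2

# Incomplete ranking: index 0 never appears, so the lookup fails.
try:
    rank_of_correct([(2, "c")])
except ValueError as err:
    print(err)  # 0 is not in list
```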


1 answer
User
#1 · Posted 2024-05-15 08:53:06

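After digging into the error, I found that using a dictionary keyed by cosine similarity overwrites earlier entries whenever two candidates have the same similarity value. The function should therefore collect the scores in a list instead: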

def rank_candidates(question, candidates, embeddings, dim=300):
    """
        question: a string
        candidates: a list of strings (candidates) which we want to rank
        embeddings: some embeddings
        dim: dimension of the current embeddings

        result: a list of pairs (initial position in the list, candidate)
    """
    # Use the function arguments rather than the global wv_embeddings.
    q_vec = question_to_vec(question, embeddings, dim).reshape(1, -1)
    scored = []
    for idx, candidate in enumerate(candidates):
        can_vec = question_to_vec(candidate, embeddings, dim).reshape(1, -1)
        sim = cosine_similarity(can_vec, q_vec)[0][0]
        # Store the original index with the score, so that equal scores
        # or duplicate candidate strings cannot hide one another.
        scored.append((sim, idx, candidate))
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda x: x[0], reverse=True)
    return [(idx, candidate) for _, idx, candidate in scored]
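
The overwrite described in this answer can be reproduced without any embeddings. The following is a minimal sketch with made-up similarity scores. The variable names are illustrative, and carrying the original index inside each tuple is a choice of this sketch. With tied scores the dictionary keeps only the last candidate, so index 0 is missing from its ranking, while the list keeps every candidate.

```python
# Toy data: the scores are invented; all-zero ties mirror the reported case.
candidates = ["a", "b", "c"]
sims = [0.0, 0.0, 0.0]

# Dictionary keyed by similarity: equal keys overwrite each other.
by_score = {}
for cand, sim in zip(candidates, sims):
    by_score[sim] = cand
dict_ranks = [
    (candidates.index(cand), cand)
    for _, cand in sorted(by_score.items(), reverse=True)
]

# List of tuples: every candidate keeps its own entry.
scored = [(sim, idx, cand) for idx, (cand, sim) in enumerate(zip(candidates, sims))]
scored.sort(key=lambda t: t[0], reverse=True)  # stable, so ties keep input order
list_ranks = [(idx, cand) for _, idx, cand in scored]

print(dict_ranks)  # [(2, 'c')]
print(list_ranks)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```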
