Using Gensim with RAKE

Published 2024-04-24 20:44:38


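I am computing similarity. First, I use the RAKE library to extract keywords from crawled jobs. Then I put each job's keywords into a separate array and combine all of those arrays into documentArray.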

documentArray = [['Anger command,Assertiveness,Approachability,Adaptability,Authenticity,Aggressiveness,Analytical thinking,Molecular Biology,Molecular Biology,Molecular Biology,molecular biology,molecular biology,Master,English,Molecular Biology,,Islamabad,Islamabad District,Islamabad Capital Territory,Pakistan,,Rawalpindi,Rawalpindi,Punjab,Pakistan'], ['competitive compensation,assay design,positive attitude,regular basis,motivate others,meetings related,improve state,travel on,phd degree,meeting abstracts,benefits package,daily basis,scientific papers,application notes']]


queryStr = "In Vitro,Biochemistry,PCR,Western Blotting,Neuroscience,Molecular Biology,Cell biology,Immunohistochemistry,Microscopy,Animal Models,Presentations,Immunoprecipitation,Cell biology,Master's Degree,Bachelor's Degree,,,,,"

Then I wrote the following Gensim code:

from gensim import corpora, models, similarities


class Gensim:

    def __init__(self):
        print("Init")

    def calculateGensimSimilarity(self, texts, query):
        # Build the vocabulary and the bag-of-words corpus from the documents
        dictionary = corpora.Dictionary(texts)
        corpus = [dictionary.doc2bow(text) for text in texts]
        # Train a two-topic LSI model and a two-topic LDA model
        lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)
        lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2)
        # Index the corpus in each topic space for similarity queries
        index_lsi = similarities.MatrixSimilarity(lsi[corpus])
        index_lda = similarities.MatrixSimilarity(lda[corpus])
        # Project the query into each topic space
        vec_bow = dictionary.doc2bow(query.lower().split())
        vec_lsi = lsi[vec_bow]
        vec_lda = lda[vec_bow]
        print("LSI Model")
        sims_lsi = index_lsi[vec_lsi]
        print(sims_lsi)
        print("LDA Model")
        sims_lda = index_lda[vec_lda]
        print(sims_lda)

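It prints an LSA score of 0 and an LDA score above 90%. Please let me know where I went wrong and how I can change the code to compute the correct cosine similarity.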

LSA Score[ 0. 0.] LDA Score[ 0.94234258 0.9477495 ]
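
One possible cause worth checking: each document above is a single comma-separated string, while the query is lowercased and split on whitespace, so the query tokens may never match any dictionary entry. The sketch below is only an illustration of that idea, not Gensim's LSI or LDA pipeline. It splits keyword strings on commas, lowercases them, and computes a plain bag-of-words cosine similarity with the standard library. The helper names `tokenize` and `cosine_similarity` and the shortened sample strings are hypothetical stand-ins for the data in the question.

```python
import math
from collections import Counter


def tokenize(keyword_string):
    """Split a comma-separated keyword string into lowercase tokens, dropping empties."""
    return [kw.strip().lower() for kw in keyword_string.split(",") if kw.strip()]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Only keywords present in both token lists contribute to the dot product
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Shortened, hypothetical stand-ins for documentArray and queryStr
documents = [
    "Assertiveness,Molecular Biology,molecular biology,Master,English",
    "assay design,positive attitude,phd degree,scientific papers",
]
query = "PCR,Molecular Biology,Cell biology,Master's Degree,,,"

query_tokens = tokenize(query)
scores = [cosine_similarity(tokenize(doc), query_tokens) for doc in documents]
print(scores)
```

With these sample strings, the first document shares the keyword "molecular biology" with the query and gets a non-zero score, while the second shares nothing and scores 0. If this tokenization matches the intended data, the same token lists could presumably be passed to corpora.Dictionary and doc2bow in place of the raw strings.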

