Using RAKE with Gensim

Posted 2024-04-24 20:44:38


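I am computing similarity scores. First, I use the RAKE library to extract keywords from crawled job postings. Then I put each job's keywords into a separate array and combine all of those arrays into documentArray.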

documentArray = [['Anger command,Assertiveness,Approachability,Adaptability,Authenticity,Aggressiveness,Analytical thinking,Molecular Biology,Molecular Biology,Molecular Biology,molecular biology,molecular biology,Master,English,Molecular Biology,,Islamabad,Islamabad District,Islamabad Capital Territory,Pakistan,,Rawalpindi,Rawalpindi,Punjab,Pakistan'], ['competitive compensation,assay design,positive attitude,regular basis,motivate others,meetings related,improve state,travel on,phd degree,meeting abstracts,benefits package,daily basis,scientific papers,application notes']]

queryStr = "In Vitro,Biochemistry,PCR,Western Blotting,Neuroscience,Molecular Biology,Cell biology,Immunohistochemistry,Microscopy,Animal Models,Presentations,Immunoprecipitation,Cell biology,Master's Degree,Bachelor's Degree,,,,,"

Then I wrote the following Gensim code:

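from gensim import corpora, models, similarities


class Gensim:

    def __init__(self):
        print("Init")

    def calculateGensimSimilarity(self, texts, query):
        # Map each distinct token to an integer id, then build bag-of-words vectors
        dictionary = corpora.Dictionary(texts)
        corpus = [dictionary.doc2bow(text) for text in texts]
        # Train 2-topic LSI and LDA models on the bag-of-words corpus
        lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)
        lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2)
        # Index the documents in each model's topic space for similarity queries
        index_lsi = similarities.MatrixSimilarity(lsi[corpus])
        index_lda = similarities.MatrixSimilarity(lda[corpus])
        # Convert the query to bag-of-words, then project it into each topic space
        vec_bow = dictionary.doc2bow(query.lower().split())
        vec_lsi = lsi[vec_bow]
        vec_lda = lda[vec_bow]
        print("LSI Model")
        sims_lsi = index_lsi[vec_lsi]
        print(sims_lsi)
        print("LDA Model")
        sims_lda = index_lda[vec_lda]
        print(sims_lda)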

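It prints an LSA score of 0 and an LDA score of a 90%+ match. Please let me know where I went wrong and how I can modify this to compute the correct cosine similarity.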

LSA Score [ 0. 0.]
LDA Score [ 0.94234258 0.9477495 ]
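
One possible cause, offered as an assumption rather than a confirmed diagnosis: if each entry of documentArray is a single comma-joined string, the dictionary sees each whole string as one token, while the query is lowercased and split on whitespace. In that case no query token would match any dictionary token, the query's bag-of-words would be empty, and the scores would not reflect real keyword overlap. The sketch below uses only the standard library and shows the general idea of splitting both sides on commas with the same normalization and then computing a plain bag-of-words cosine similarity. It is not the LSI/LDA pipeline from the question, the helper names tokenize_keywords and cosine_similarity are hypothetical, and the inputs are shortened for illustration.

```python
import math
from collections import Counter


def tokenize_keywords(text):
    """Split a comma-separated keyword string into lowercase tokens, dropping empties."""
    return [part.strip().lower() for part in text.split(",") if part.strip()]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty token list has no direction, so report no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


# Shortened, illustrative inputs (not the full data from the question).
documents = [
    "Assertiveness,Molecular Biology,molecular biology,English",
    "assay design,positive attitude,phd degree",
]
query = "PCR,Molecular Biology,Cell biology,,"

query_tokens = tokenize_keywords(query)
scores = [cosine_similarity(tokenize_keywords(doc), query_tokens) for doc in documents]
print(scores)
```

If this assumption holds, the same token lists could be passed to corpora.Dictionary and doc2bow in place of the original texts and query, so that the query and the documents share a vocabulary before the LSI and LDA models are applied.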
