Checking similarity or synonyms between words in Python

Posted 2024-06-11 12:17:25


I want to find synonyms for words.

If the word is "tall building", I want to find all of its synonyms, such as "long apartment", "large building", and so on.

I'm using spaCy:

import en_core_web_sm
nlp = en_core_web_sm.load()

for token in mytokens:
    nlp('tall building').similarity(nlp(token))

I can't use this because it takes too much time.

Nor can I use PhraseMatcher for this.

Please help.

Thanks in advance.


2 Answers

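It's hard to tell from your example, but it looks like you are creating a new spaCy doc for "tall building" on every iteration of the loop, which is slow. You should do this instead: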

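import spacy

# spacy.load('en') no longer works in spaCy v3; load a pipeline package by name.
# The md/lg packages include word vectors; the sm package does not.
nlp = spacy.load('en_core_web_md')

query = nlp('tall building')  # parse the query once, outside the loop
scores = []
for token in mytokens:
    scores.append((token, query.similarity(nlp(token))))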

That way spaCy only has to create the query doc once.
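Going one step further, the candidate phrases can also be processed as a stream with nlp.pipe instead of one nlp() call each. This is a minimal sketch, assuming nlp is an already loaded spaCy pipeline (ideally one with word vectors, such as en_core_web_md); rank_by_similarity and top_n are hypothetical names, not part of spaCy:

```python
from typing import Iterable, List, Tuple


def rank_by_similarity(nlp, query_text: str, candidates: Iterable[str],
                       top_n: int = 10) -> List[Tuple[str, float]]:
    """Score every candidate phrase against one query and return the best matches.

    Hypothetical helper: `nlp` is assumed to be a loaded spaCy pipeline.
    """
    query = nlp(query_text)  # parse the query a single time
    candidates = list(candidates)
    # nlp.pipe yields one Doc per text, in input order, processing them in batches
    docs = nlp.pipe(candidates)
    scored = [(text, query.similarity(doc)) for text, doc in zip(candidates, docs)]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Usage would look like nlp = spacy.load('en_core_web_md') followed by rank_by_similarity(nlp, 'tall building', mytokens).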

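If you are going to run repeated queries, you should put each document's vector into annoy (an approximate nearest-neighbor library) or something similar, so you can quickly retrieve the most similar documents.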
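annoy builds an approximate index; the underlying idea (compute each candidate's vector once, then answer every query with cheap vector math) can be sketched with an exact, brute-force numpy search instead. This is a stand-in, not annoy's API; VectorIndex is a hypothetical class, and in practice the vectors would be doc.vector values from spaCy:

```python
import numpy as np


class VectorIndex:
    """Exact cosine-similarity search over precomputed vectors.

    A brute-force stand-in for an approximate index such as annoy.
    """

    def __init__(self, labels, vectors):
        self.labels = list(labels)
        matrix = np.asarray(vectors, dtype=float)
        norms = np.linalg.norm(matrix, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # leave all-zero vectors as zeros instead of dividing by 0
        # Normalize once so each query only needs one matrix-vector product
        self.matrix = matrix / norms

    def most_similar(self, query_vector, n=5):
        query = np.asarray(query_vector, dtype=float)
        norm = np.linalg.norm(query)
        if norm == 0:
            return []  # a zero vector has no direction to compare against
        scores = self.matrix @ (query / norm)
        # Indices of the n highest cosine scores, best first
        best = np.argsort(-scores)[:n]
        return [(self.labels[i], float(scores[i])) for i in best]
```

For example: VectorIndex(mytokens, [doc.vector for doc in nlp.pipe(mytokens)]).most_similar(nlp('tall building').vector). A full scan like this is simple; an approximate index such as annoy becomes worthwhile mainly when the collection is too large to scan on every query.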

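Also, I generally wouldn't call this finding "synonyms", because every example you gave has multiple words. You are really looking for similar phrases. "Synonyms" usually refers to single words, like the ones you'd find in a thesaurus, but that wouldn't help you here.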

You could try parsing data from an online thesaurus with Beautiful Soup, or use a Python module such as py-thesaurus: https://pypi.org/project/py-thesaurus/

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
from urllib.error import HTTPError


def find_synonym(string):
    """Print synonyms for a word or phrase scraped from thesaurus.plus."""
    # Remove whitespace before and after the word and use underscores between words
    fixed_string = string.strip().replace(" ", "_")
    print(f"{fixed_string}:")

    # Set the URL using the amended string
    my_url = f'https://thesaurus.plus/thesaurus/{fixed_string}'

    try:
        # Open and read the HTML; the with block closes the connection
        with uReq(my_url) as uClient:
            page_html = uClient.read()
    except HTTPError:
        if "_" in fixed_string:
            print("Phrase not found! Please try a different phrase.")
        else:
            print("Word not found! Please try a different word.")
        return

    # Parse the HTML and locate the list of results
    page_soup = soup(page_html, "html.parser")
    word_boxes = page_soup.find("ul", {"class": "list paper"})
    if word_boxes is None:
        # No result list on the page (or the site layout has changed)
        print("No synonyms found on the page.")
        return
    results = word_boxes.find_all("div", "list_item")

    # Iterate over results and print
    for result in results:
        print(result.text)


if __name__ == "__main__":
    find_synonym("hello ")
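To check the parsing step without hitting the website, it can be split into a function that takes the HTML as a string and returns a list instead of printing. A minimal sketch, assuming the same thesaurus.plus page structure used in find_synonym (which may have changed since this answer was written); parse_synonyms is a hypothetical helper name:

```python
from bs4 import BeautifulSoup


def parse_synonyms(page_html):
    """Return the synonym strings found in a thesaurus.plus result page.

    Assumes a <ul class="list paper"> whose entries are
    <div class="list_item"> elements.
    """
    page_soup = BeautifulSoup(page_html, "html.parser")
    word_boxes = page_soup.find("ul", {"class": "list paper"})
    if word_boxes is None:
        return []  # no result list on this page
    # A string as the second argument to find_all matches on CSS class
    return [item.get_text(strip=True) for item in word_boxes.find_all("div", "list_item")]
```

find_synonym could then fetch page_html as before and print each entry of parse_synonyms(page_html).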
