keybert Python package - PyPI

KeyBERT performs keyword extraction with state-of-the-art transformer models.

Detailed description of the keybert Python project

KeyBERT

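KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.

The corresponding Medium post can be found here.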

Table of Contents
1. About the Project
2. Getting Started
2.1. Installation
2.2. Usage
2.3. Max Sum Similarity
2.4. Maximal Marginal Relevance

1. About the Project

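Although there are already many methods available for keyword generation (e.g., Rake, YAKE!, TF-IDF, etc.), I wanted to create a very basic but powerful method for extracting keywords and keyphrases. This is where KeyBERT comes in! It uses BERT embeddings and simple cosine similarity to find the sub-phrases in a document that are the most similar to the document itself.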

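First, document embeddings are extracted with BERT to get a document-level representation. Then, word embeddings are extracted for N-gram words/phrases. Finally, we use cosine similarity to find the words/phrases that are the most similar to the document. The most similar words can then be identified as the words that best describe the entire document.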
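The three steps above can be sketched in plain Python. This is only an illustration under loud assumptions, not KeyBERT's implementation: the `embed` function is a hypothetical stand-in that uses bag-of-words counts instead of BERT embeddings, candidates are limited to single words, and the names `cosine` and `rank_keywords` are made up for this sketch.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a BERT embedding: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_keywords(doc, top_n=3):
    """Rank single-word candidates by similarity to the whole document."""
    doc_vec = embed(doc)                    # step 1: document-level vector
    candidates = set(doc.lower().split())   # step 2: candidate words
    scored = [(cosine(embed(c), doc_vec), c) for c in candidates]
    # step 3: most similar first; ties are broken alphabetically
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top_n]]


doc = (
    "learning algorithms learn from training data "
    "and learning improves with more training data"
)
keywords = rank_keywords(doc, top_n=3)
```

With a real sentence-embedding model in place of `embed`, the ranking would reflect semantic similarity rather than word frequency.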

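KeyBERT is by no means unique; it was created as a quick and easy method for generating keywords and keyphrases. Although there are many great papers and solutions out there that use BERT embeddings (e.g., 1, 2, 3), I could not find a BERT-based solution that did not have to be trained from scratch and could be used by beginners (correct me if I'm wrong!). Thus, the goal was a pip install keybert and at most 3 lines of code in usage.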

2. Getting Started

2.1. Installation

PyTorch 1.2.0 or higher is recommended. If the installation below gives an error, please install PyTorch first here.

Installation can be done using PyPI:

pip install keybert

2.2. Usage

The most minimal example can be seen below for the extraction of keywords:

[code example missing in source]

You can set the length of the resulting keywords/keyphrases:

[code example missing in source]

To extract keyphrases, simply set the length to 2 or higher, depending on the number of words you would like in the resulting keyphrases:

[code example missing in source]
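To make the notion of keyphrase length concrete, the sketch below lists the candidate phrases of a given length as sliding windows over the words of a text. The helper `candidate_phrases` is hypothetical and only illustrates the idea; KeyBERT's actual candidate extraction may differ, for example in tokenization and stop-word handling.

```python
def candidate_phrases(text, length):
    """Return unique word n-grams of the given length, in order of appearance."""
    words = text.lower().split()
    phrases = []
    for start in range(len(words) - length + 1):
        phrase = " ".join(words[start:start + length])
        if phrase not in phrases:  # keep only the first occurrence
            phrases.append(phrase)
    return phrases


text = "supervised learning maps inputs to outputs"
single_words = candidate_phrases(text, 1)
two_word_phrases = candidate_phrases(text, 2)
```

A length of 1 yields individual keywords, while a length of 2 yields two-word keyphrases such as "supervised learning".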

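NOTE: For a full overview of all possible transformer models see sentence-transformer. I would advise either 'distilbert-base-nli-mean-tokens' or [model name missing in source], as they have shown great performance in semantic similarity and paraphrase identification respectively.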

2.3. Max Sum Similarity

To diversify the results, we take the 2 x top_n words/phrases most similar to the document. Then, we take all top_n combinations from those 2 x top_n words and extract the combination whose members are the least similar to each other by cosine similarity.

[code example missing in source]
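The selection step described in this section can be sketched as follows. This is a minimal illustration on toy 2-dimensional vectors, not KeyBERT's own code; the function name `max_sum_similarity`, its signature, and the example vectors are assumptions made for this sketch.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def max_sum_similarity(doc_vec, cand_vecs, candidates, top_n):
    """Pick top_n candidates from the 2 x top_n most document-like ones
    so that the chosen candidates are as dissimilar to each other as possible."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: cosine(doc_vec, cand_vecs[i]),
        reverse=True,
    )
    pool = ranked[: 2 * top_n]          # 2 x top_n most similar to the document
    top_n = min(top_n, len(pool))       # guard against very short inputs

    def total_similarity(combo):
        # sum of pairwise similarities inside one combination
        return sum(
            cosine(cand_vecs[i], cand_vecs[j]) for i, j in combinations(combo, 2)
        )

    best = min(combinations(pool, top_n), key=total_similarity)
    return [candidates[i] for i in best]


doc_vec = [1.0, 1.0]
candidates = ["a", "b", "c", "d", "e"]
cand_vecs = [[1.0, 0.9], [1.0, 0.8], [0.7, 1.0], [0.1, 1.0], [-1.0, 0.0]]
selected = max_sum_similarity(doc_vec, cand_vecs, candidates, top_n=2)
```

Candidate "e" is very different from every other candidate, yet it is never selected because it does not make it into the pool of the 2 x top_n candidates closest to the document. Note that the number of combinations grows quickly with top_n, so this exhaustive search is only practical for small values.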

2.4. Maximal Marginal Relevance

To diversify the results, we can use Maximal Marginal Relevance (MMR), which is also based on cosine similarity, to create keywords/keyphrases. The results with high diversity:

[code example missing in source]

The results with low diversity:

[code example missing in source]
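For readers who want to see the mechanics, here is a minimal sketch of one common MMR formulation on toy 2-dimensional vectors. It is not taken from KeyBERT's source: the function name `mmr`, the `diversity` weight, and the example vectors are assumptions for illustration. Each step picks the candidate that maximizes (1 - diversity) x similarity to the document minus diversity x its highest similarity to the already selected candidates.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr(doc_vec, cand_vecs, candidates, top_n, diversity):
    """Greedy Maximal Marginal Relevance selection (illustrative sketch)."""
    if not candidates:
        return []
    doc_sim = [cosine(doc_vec, v) for v in cand_vecs]
    # start with the candidate closest to the document
    selected = [max(range(len(candidates)), key=lambda i: doc_sim[i])]
    remaining = [i for i in range(len(candidates)) if i not in selected]

    while remaining and len(selected) < top_n:

        def score(i):
            # penalize candidates that resemble something already chosen
            redundancy = max(cosine(cand_vecs[i], cand_vecs[j]) for j in selected)
            return (1 - diversity) * doc_sim[i] - diversity * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return [candidates[i] for i in selected]


doc_vec = [1.0, 1.0]
candidates = ["a", "b", "c", "d"]
cand_vecs = [[1.0, 0.9], [1.0, 0.8], [0.7, 1.0], [0.1, 1.0]]
low_diversity = mmr(doc_vec, cand_vecs, candidates, top_n=2, diversity=0.1)
high_diversity = mmr(doc_vec, cand_vecs, candidates, top_n=2, diversity=0.9)
```

With a low diversity weight the second pick stays close to the first one; with a high weight the second pick moves to the candidate that differs most from what was already selected.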

References

Below, you can find several resources that were used for the creation of KeyBERT but most importantly, these are amazing resources for creating impressive keyword extraction models:

Papers:

Sharma, P., & Li, Y. (2019). Self-Supervised Contextual Keyword and Keyphrase Retrieval with Self-Labelling.

GitHub repos:

MMR:
The selection of keywords/keyphrases was modeled after:

https://github.com/swisscom/ai-research-keyphrase-extraction

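NOTE: If you find a paper or GitHub repo that has an easy-to-use implementation of BERT embeddings for keyword/keyphrase extraction, let me know! I'll make sure to add it to this repo.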

keybert 0.1.3
