spacymoji (Python package) - PyPI

spaCy pipeline component for adding emoji metadata to Doc, Token and Span objects.

Project description

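spaCy v2.0 extension and pipeline component for adding emoji metadata to Doc objects. It detects emoji consisting of one or more Unicode characters, and can optionally merge multi-character emoji (combined pictures, emoji with skin tone modifiers) into one token. Human-readable emoji descriptions are added as a custom attribute, and an optional lookup table can be provided for your own descriptions. The extension sets the custom Doc, Token and Span attributes ._.is_emoji, ._.emoji_desc, ._.has_emoji and ._.emoji. You can read more about custom pipeline components and extension attributes here.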
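Why merging is needed at all: an emoji that renders as a single icon can be made of several Unicode code points. The sketch below uses only Python's standard unicodedata module (nothing from spacymoji) to break down the two combined emoji used in this README's examples, one with a skin tone modifier and one joined with a zero width joiner (ZWJ). The names THUMBS_UP_DARK, MAN_SINGER and code_points() are illustrative.

```python
import unicodedata

# 👍🏿: THUMBS UP SIGN followed by a skin tone modifier (2 code points)
THUMBS_UP_DARK = "\U0001F44D\U0001F3FF"
# 👨‍🎤: MAN, ZERO WIDTH JOINER, MICROPHONE (3 code points)
MAN_SINGER = "\U0001F468\u200D\U0001F3A4"


def code_points(s):
    """Return (U+XXXX, Unicode name) pairs for each code point in s."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in s]


for e in (THUMBS_UP_DARK, MAN_SINGER):
    # Length in code points, followed by the breakdown of each one.
    print(len(e), code_points(e))
```

A tokenizer that treats each code point (or each symbol) separately could split such an emoji into several tokens, which is the situation the optional merging addresses.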

Emoji are matched using spaCy's PhraseMatcher, and looked up in the data table provided by the "emoji" package.
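As a rough illustration of the lookup step, here is a stdlib-only sketch that flags tokens found in a description table and attaches a description, letting a user-supplied lookup dict override the default text. DEFAULT_TABLE and annotate() are hypothetical names: the table is a two-entry stand-in for the emoji package's data, and exact string comparison on single tokens is a simplification of what a phrase matcher does on token sequences.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical two-entry stand-in for the emoji package's description table.
# Only the 👍🏿 description is taken from this README; the other is illustrative.
DEFAULT_TABLE: Dict[str, str] = {
    "\U0001F44D\U0001F3FF": "thumbs up dark skin tone",  # 👍🏿
    "\U0001F63B": "smiling cat with heart-eyes",          # 😻
}


def annotate(
    tokens: List[str],
    lookup: Optional[Dict[str, str]] = None,
    table: Dict[str, str] = DEFAULT_TABLE,
) -> List[Tuple[str, bool, Optional[str]]]:
    """Return (text, is_emoji, emoji_desc) for each token.

    A token is flagged as emoji only if its exact text is a key in `table`;
    a matching entry in `lookup` replaces the default description.
    """
    lookup = lookup or {}
    result = []
    for text in tokens:
        if text in table:
            result.append((text, True, lookup.get(text, table[text])))
        else:
            result.append((text, False, None))
    return result


tokens = ["This", "is", "a", "test", "\U0001F63B", "\U0001F44D\U0001F3FF"]
for text, is_emoji, desc in annotate(tokens):
    print(is_emoji, desc)
```

In this sketch, passing lookup={"\U0001F63B": "cat"} changes only the description of token 4; a lookup key that is not in the table does not turn an ordinary token into an emoji.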

Installation

spacymoji requires spaCy v2.0.0 or higher.

pip install spacymoji

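Usage

Import the component and initialise it with the shared nlp object (i.e. an instance of Language), which is used to initialise the PhraseMatcher with the shared vocab and to create the match patterns. Then add the component anywhere in your pipeline.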

    import spacy
    from spacymoji import Emoji

    nlp = spacy.load('en')
    emoji = Emoji(nlp)
    nlp.add_pipe(emoji, first=True)

    doc = nlp(u"This is a test 😻 👍🏿")
    assert doc._.has_emoji == True
    assert doc[2:5]._.has_emoji == True
    assert doc[0]._.is_emoji == False
    assert doc[4]._.is_emoji == True
    assert doc[5]._.emoji_desc == u'thumbs up dark skin tone'
    assert len(doc._.emoji) == 2
    assert doc._.emoji[1] == (u'👍🏿', 5, u'thumbs up dark skin tone')

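spacymoji only cares about the token text, so you can use it on a blank Language instance (it should work for all available languages!), or in a pipeline with a loaded model. If you're loading a model and your pipeline includes a tagger, parser and entity recognizer, make sure to add the emoji component with first=True, so the spans are merged right after tokenization and before the document is parsed. If your text contains a lot of emoji, this might even noticeably improve the parser's accuracy.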

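Available attributes

The extension sets attributes on the Doc, Span and Token. You can change the attribute names when initialising the extension. For more details on custom components and attributes, see the processing pipelines documentation.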

Attribute	Type	Description
Token._.is_emoji	bool	Whether the token is an emoji.
Token._.emoji_desc	unicode	A human-readable description of the emoji.
Doc._.has_emoji	bool	Whether the document contains emoji.
Doc._.emoji	list	(emoji, index, description) tuples of the document's emoji.
Span._.has_emoji	bool	Whether the span contains emoji.
Span._.emoji	list	(emoji, index, description) tuples of the span's emoji.
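The Doc- and Span-level attributes can be read as summaries of the Token-level ones. A minimal sketch of that relationship in plain Python, assuming a hypothetical Tok record in place of a spaCy Token: has_emoji is true when any token is an emoji, and the list attribute collects (emoji, index, description) tuples. The index here is simply the position within whatever token sequence is passed in, which is an assumption of this sketch.

```python
from typing import List, NamedTuple, Optional, Tuple


class Tok(NamedTuple):
    """Hypothetical stand-in for a token carrying the Token-level attributes."""
    text: str
    is_emoji: bool
    emoji_desc: Optional[str]


def has_emoji(tokens: List[Tok]) -> bool:
    # Doc/Span-level flag derived from the Token-level flag.
    return any(t.is_emoji for t in tokens)


def emoji_tuples(tokens: List[Tok]) -> List[Tuple[str, int, Optional[str]]]:
    # (emoji, index, description); index = position within `tokens`.
    return [(t.text, i, t.emoji_desc) for i, t in enumerate(tokens) if t.is_emoji]


# Tokens of "This is a test 😻 👍🏿"; the 😻 description is illustrative.
doc = [Tok(w, False, None) for w in ["This", "is", "a", "test"]] + [
    Tok("\U0001F63B", True, "smiling cat with heart-eyes"),
    Tok("\U0001F44D\U0001F3FF", True, "thumbs up dark skin tone"),
]
print(has_emoji(doc[2:5]))        # True: the slice includes token 4
print(has_emoji(doc[0:4]))        # False
print(emoji_tuples(doc)[1][1:])   # (5, 'thumbs up dark skin tone')
```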

Settings

The following settings can be defined when initialising Emoji:

Setting	Type	Description
nlp	Language	The shared nlp object. Used to initialise the matcher with the shared Vocab, and create Doc match patterns.
attrs	tuple	Attributes to set on the ._ property. Defaults to ('has_emoji', 'is_emoji', 'emoji_desc', 'emoji').
pattern_id	unicode	ID of match pattern, defaults to 'EMOJI'. Can be changed to avoid ID conflicts.
merge_spans	bool	Merge spans containing multi-character emoji, defaults to True. Will only merge combined emoji resulting in one icon, not sequences.
lookup	dict	Optional lookup table that maps emoji unicode strings to custom descriptions, e.g. translations or other annotations.

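    import spacy
    from spacymoji import Emoji

    # Use an nlp object that does not have an emoji component yet
    nlp = spacy.load('en')
    emoji = Emoji(nlp, attrs=('has_e', 'is_e', 'e_desc', 'e'),
                  lookup={u'👨‍🎤': u'David Bowie'})
    nlp.add_pipe(emoji)

    doc = nlp(u"We can be 👨‍🎤 heroes")
    assert doc[3]._.is_e
    assert doc[3]._.e_desc == u'David Bowie'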
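To make the merge_spans behaviour more concrete, the following hypothetical helper glues skin tone modifiers (U+1F3FB to U+1F3FF) and zero width joiner sequences onto the preceding token, while leaving sequences of separate emoji apart. It is a heuristic over plain strings, not spacymoji's implementation (which merges the spans its matcher finds), and the pre-split token lists are made up for the example.

```python
from typing import List

ZWJ = "\u200d"
# Fitzpatrick skin tone modifiers U+1F3FB..U+1F3FF
SKIN_TONES = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}


def merge_combined_emoji(tokens: List[str]) -> List[str]:
    """Glue skin tone modifiers and ZWJ-joined parts onto the preceding token.

    Separate emoji that merely follow each other (e.g. 😻 then 👍) stay
    separate tokens, mirroring "one icon, not sequences".
    """
    merged: List[str] = []
    join_next = False  # True when the previous token ended with a ZWJ
    for tok in tokens:
        if merged and (join_next or tok in SKIN_TONES or tok == ZWJ):
            merged[-1] += tok
        else:
            merged.append(tok)
        join_next = tok.endswith(ZWJ)
    return merged


# Hypothetical pre-split tokens for "We can be 👨‍🎤 heroes".
print(len(merge_combined_emoji(["We", "can", "be", "\U0001F468", ZWJ, "\U0001F3A4", "heroes"])))  # 5
```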

Roadmap

This extension is still experimental, but here are some features that might be added in the future:

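Add match patterns and attributes for emoji shortcodes, e.g. :+1:. The shortcodes could optionally be merged into one token and receive a NORM attribute containing the unicode emoji. Since NORM is used as a feature during training, :+1: and 👍 would automatically receive similar representations.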
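Add support for the Unicode emoji annotations project. The JavaScript package also ships with pre-compiled JSON data, including standardised and community-contributed annotations in English and German.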
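The shortcode idea in the first roadmap item amounts to giving :+1: the same NORM value as the emoji it stands for. A sketch of that mapping under stated assumptions: SHORTCODES and norms() are hypothetical, only the :+1: entry comes from the roadmap text, and lower-casing stands in for whatever norm other tokens would normally receive. This illustrates a proposed feature, not current spacymoji behaviour.

```python
from typing import Dict, List, Tuple

# Hypothetical shortcode table; only ":+1:" -> 👍 comes from the roadmap text.
SHORTCODES: Dict[str, str] = {":+1:": "\U0001F44D"}


def norms(tokens: List[str], shortcodes: Dict[str, str] = SHORTCODES) -> List[Tuple[str, str]]:
    """Return (text, norm) pairs; a known shortcode gets its unicode emoji as norm.

    Lower-casing stands in for the norm other tokens would otherwise get.
    """
    return [(t, shortcodes.get(t, t.lower())) for t in tokens]


pairs = norms(["Nice", ":+1:", "\U0001F44D"])
# The shortcode and the emoji end up with the same norm value.
print(pairs[1][1] == pairs[2][1])  # True
```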

License: BSD License (BSD 3-Clause)
Maintainer: inesmontani

Version: spacymoji 2.0.0
