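Python profanity-check package and module - PyPI

A fast, robust library to check for offensive language in strings.

Detailed description of the profanity-check Python project

profanity-check

A fast, robust Python library to check for profanity or offensive language in strings. Read more about how and why profanity-check was built in this blog post. You can also test out profanity-check in your browser.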

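How It Works

profanity-check uses a linear SVM model trained on 200k human-labeled samples of clean and profane text strings. The model is simple but surprisingly effective, which makes profanity-check both robust and fast.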

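Why Use profanity-check?

No Explicit Blacklist

Many profanity detection libraries use a hard-coded list of bad words to detect and filter profanity. For example, profanity uses this wordlist, and even better-profanity still uses a wordlist. This approach has glaring problems: these libraries may be fast, but they are not accurate at all.

A simple example where profanity-check does better is the phrase "you cocksucker": profanity thinks this is clean because "cocksucker" is not in its wordlist.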
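To make the weakness concrete, here is a minimal sketch of a wordlist-based checker. The `WORDLIST` contents and the `contains_listed_word` helper are hypothetical and are not taken from any of the libraries named above; the point is only that a word missing from the list is never flagged.

```python
import re

# Hypothetical, deliberately tiny wordlist; real libraries ship much longer ones.
WORDLIST = {"damn", "hell", "crap"}


def contains_listed_word(text, wordlist=WORDLIST):
    """Return True if any token of `text` appears in `wordlist`."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return any(token in wordlist for token in tokens)


# A listed word is caught.
print(contains_listed_word("go to hell"))
# An offensive word that is not on the list slips through.
print(contains_listed_word("you cocksucker"))
```

However long the list grows, anything its authors did not anticipate is treated as clean.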

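Performance

Other libraries like profanity-filter use more sophisticated methods that are much more accurate, but at the cost of performance. A benchmark using a Kaggle dataset of Wikipedia comments (performed in December 2018 on a new 2018 MacBook Pro) yielded roughly the following results: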

Package	1 Prediction (ms)	10 Predictions (ms)	100 Predictions (ms)
profanity-check	0.2	0.5	3.5
profanity-filter	60	1200	13000
profanity	0.3	1.2	24

profanity-check is roughly 300 to 4000 times faster than profanity-filter in this benchmark!
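The speedup range can be checked directly from the table. The short script below copies the reported timings and divides them; the `timings_ms` dictionary and the `speedup` helper are names introduced here for illustration only.

```python
# Timings in milliseconds, copied from the benchmark table above.
timings_ms = {
    "profanity-check": {1: 0.2, 10: 0.5, 100: 3.5},
    "profanity-filter": {1: 60, 10: 1200, 100: 13000},
    "profanity": {1: 0.3, 10: 1.2, 100: 24},
}


def speedup(baseline, other, batch_size):
    """How many times faster `baseline` is than `other` for a batch size."""
    return timings_ms[other][batch_size] / timings_ms[baseline][batch_size]


for n in (1, 10, 100):
    ratio = speedup("profanity-check", "profanity-filter", n)
    print(f"{n:>3} predictions: about {ratio:.0f}x faster")
```

The ratios come out at about 300 for a single prediction and in the low thousands for batches of 10 and 100.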

Accuracy

This table speaks for itself:

Package	Test Accuracy	Balanced Test Accuracy	Precision	Recall	F1 Score
profanity-check	95.0%	93.0%	86.1%	89.6%	0.88
profanity-filter	91.8%	83.6%	85.4%	70.2%	0.77
profanity	85.6%	65.1%	91.7%	30.8%	0.46

See the "How" section below for more details on the dataset used for these results.
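The F1 column follows from the precision and recall columns, since F1 is their harmonic mean. As a quick consistency check, the snippet below recomputes it from the reported percentages; the `f1_score` helper and the `reported` dictionary are written here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the accuracy table.
reported = {
    "profanity-check": (0.861, 0.896),
    "profanity-filter": (0.854, 0.702),
    "profanity": (0.917, 0.308),
}

for package, (p, r) in reported.items():
    print(f"{package}: F1 = {f1_score(p, r):.2f}")
```

The recomputed values match the table to two decimal places. The profanity row shows why recall matters: its precision is the highest of the three, but its low recall drags F1 down.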

Installation

$ pip install profanity-check

Usage

from profanity_check import predict, predict_prob

predict(['predict() takes an array and returns a 1 for each string if it is offensive, else 0.'])
# [0]

predict(['fuck you'])
# [1]

predict_prob(['predict_prob() takes an array and returns the probability each string is offensive'])
# [0.08686173]

predict_prob(['go to hell, you scum'])
# [0.7618861]

Note that both predict() and predict_prob() return numpy arrays.
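In practice you will often want to turn the probabilities into a decision with your own cutoff. The sketch below is one way to do that; `flag_texts` is a hypothetical helper, not part of profanity-check, and the default threshold of 0.5 is an arbitrary assumption you should tune for your data. The prediction function is passed in as an argument so the helper can be tried without the library installed.

```python
def flag_texts(texts, predict_prob_fn, threshold=0.5):
    """Pair each text with its probability and a flag set when it reaches `threshold`.

    `predict_prob_fn` is expected to behave like profanity_check.predict_prob:
    it takes a list of strings and returns one probability per string.
    """
    probabilities = predict_prob_fn(texts)
    return [
        (text, float(prob), float(prob) >= threshold)
        for text, prob in zip(texts, probabilities)
    ]


if __name__ == "__main__":
    try:
        from profanity_check import predict_prob
    except ImportError:
        print("profanity-check is not installed; skipping the demo")
    else:
        for text, prob, flagged in flag_texts(["go to hell, you scum"], predict_prob):
            print(f"{prob:.2f} flagged={flagged} {text}")
```

Converting each value with `float()` keeps the result free of numpy scalar types, which is convenient if you serialize it to JSON.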

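More on How and Why It Works

How

Special thanks to the authors of the datasets used in this project. profanity-check was trained on a combined dataset from two sources:

t-davidson/hate-speech-and-offensive-language, used in their paper Automated Hate Speech Detection and the Problem of Offensive Language
the Toxic Comment Classification Challenge on Kaggle.

profanity-check relies heavily on the excellent scikit-learn library. It is mostly powered by the scikit-learn classes CountVectorizer, LinearSVC, and CalibratedClassifierCV. It uses a bag-of-words model to vectorize input strings before feeding them to a linear classifier.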
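The following is a minimal sketch of that kind of pipeline: bag-of-words counts fed to a linear SVM, wrapped in a calibrator so it can output probabilities. It assumes scikit-learn's `CountVectorizer`, `LinearSVC`, and `CalibratedClassifierCV`. The twelve training sentences are invented toy data, and the hyperparameters are defaults chosen here, so this is not the library's actual training code and its outputs mean very little.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the ~200k labeled samples; 1 = offensive, 0 = clean.
texts = [
    "you are a stupid idiot", "shut up you moron", "what a dumb jerk",
    "you stupid jerk", "stupid moron go away", "idiot, you are so dumb",
    "have a nice day", "thanks for the help", "see you at lunch",
    "great work on the report", "the weather is nice today", "thanks, see you soon",
]
labels = [1] * 6 + [0] * 6

model = make_pipeline(
    CountVectorizer(),                          # bag-of-words token counts
    CalibratedClassifierCV(LinearSVC(), cv=3),  # linear SVM with calibrated probabilities
)
model.fit(texts, labels)

samples = ["you dumb idiot", "nice work today"]
print(model.predict(samples))              # hard 0/1 labels
print(model.predict_proba(samples)[:, 1])  # probability of the offensive class
```

`LinearSVC` has no `predict_proba` of its own, which is why a calibration wrapper is needed for a probability-style output such as `predict_prob()`.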

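Why

One simplified way to think about why profanity-check works is this: during training, the model learns which words are "bad" and how "bad" they are, because those words appear more often in offensive texts. It is as if the training process picks the "bad" words out of all possible words and uses them to make future predictions. This is better than relying on arbitrary word blacklists chosen by humans!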
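That intuition can be illustrated without any machine learning library. The sketch below scores each word by its smoothed log-odds of appearing in offensive rather than clean text. This is a swapped-in simplification for illustration, not the SVM that profanity-check actually trains, and the example sentences and function names are invented here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def word_scores(offensive_texts, clean_texts):
    """Smoothed log-odds of each word appearing in offensive vs. clean text."""
    bad = Counter(t for text in offensive_texts for t in tokenize(text))
    good = Counter(t for text in clean_texts for t in tokenize(text))
    vocab = set(bad) | set(good)
    bad_total, good_total = sum(bad.values()), sum(good.values())
    scores = {}
    for word in vocab:
        # Add-one smoothing keeps unseen words from producing log(0).
        p_bad = (bad[word] + 1) / (bad_total + len(vocab))
        p_good = (good[word] + 1) / (good_total + len(vocab))
        scores[word] = math.log(p_bad) - math.log(p_good)
    return scores


scores = word_scores(
    offensive_texts=["you stupid idiot", "what an idiot"],
    clean_texts=["you are kind", "what a nice day"],
)
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:>7}: {score:+.2f}")
```

Words seen mostly in offensive examples get clearly positive scores, words seen mostly in clean examples get negative ones, and words common to both, such as "you", land near zero.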

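Caveats

This library is far from perfect. For example, it has a hard time picking up on less common variants of swear words like "f4ck you" or "you b1tch", because they do not appear often enough in the training corpus. Never treat any prediction from this library as unquestionable truth, because it does and will make mistakes. Instead, use this library as a heuristic.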
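The reason such variants are missed can be shown with a tiny bag-of-words vectorizer: a token that never occurred in training has no slot in the vocabulary, so it contributes nothing to the features the classifier sees. This is a sketch under stated assumptions. The two-sentence training corpus and the helper names are invented, and the token pattern is chosen to mirror the default used by scikit-learn's `CountVectorizer`.

```python
import re
from collections import Counter


def tokenize(text):
    # Words of two or more characters, as in CountVectorizer's default pattern.
    return re.findall(r"\b\w\w+\b", text.lower())


def build_vocabulary(corpus):
    return {token for text in corpus for token in tokenize(text)}


def vectorize(text, vocabulary):
    """Return (counts of known tokens, list of tokens the vocabulary has never seen)."""
    tokens = tokenize(text)
    known = Counter(t for t in tokens if t in vocabulary)
    unseen = [t for t in tokens if t not in vocabulary]
    return known, unseen


vocabulary = build_vocabulary(["fuck you", "have a nice day"])
print(vectorize("fuck you", vocabulary))  # both tokens are known
print(vectorize("f4ck you", vocabulary))  # the obfuscated token is dropped
```

After vectorization, "f4ck you" looks the same to the model as the harmless text "you".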

License: BSD License (BSD 3-Clause)
Author: not provided
Maintainer: vzhou842

profanity-check 1.0.3
