Python DocumentFeatureSelection package - PyPI

Various methods of feature selection from text data

Detailed description of the DocumentFeatureSelection Python project

DocumentFeatureSelection

This is a set of feature selection code for text data.
(For background on feature selection, see [here](http://nlp.stanford.edu/ir-book/html/htmledition/feature-selection-1.html) or [here](http://stackoverflow.com/questions/13603882/feature-selection-and-reduction-for-text-classification).)

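Feature selection is very important when you apply machine learning methods to natural language data.
Natural language data usually contains a lot of noise, so machine learning methods perform poorly without feature selection.
(Algorithms such as decision trees and random forests are exceptions: they perform feature selection internally.)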

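Feature selection is also useful when you explore your text data.
With feature selection, you can find out which features really contribute to a specific label.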

Please visit the [project page on GitHub](https://github.com/Kensuke-Mitsuzawa/DocumentFeatureSelection).

This package supports the following feature selection methods

* TF-IDF
* Pointwise mutual information (PMI)
* Strength of association (SOA)
* Bi-normal separation (BNS)

... selection methods
* Fast computation thanks to sparse matrices and multiprocessing

...odules/generated/sklearn.feature_extraction.text.TfidfTransformer.html) for detailed information.
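As a rough illustration of the weighting that the linked `TfidfTransformer` page documents, here is a minimal pure-Python sketch. It assumes the smoothed formula `idf(t) = ln((1 + n) / (1 + df(t))) + 1` followed by L2 normalization, which is the default described in the scikit-learn documentation. The function name `tfidf` and the toy documents are hypothetical and are not part of this package.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: smoothed idf, ln((1 + n) / (1 + df)) + 1, then L2 normalization.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts inside this document
        raw = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        weighted.append({t: v / norm for t, v in raw.items()} if norm else {})
    return weighted


# Hypothetical toy corpus: each document is a list of tokens.
docs = [["aa", "aa", "bb"], ["aa", "cc"], ["cc", "cc", "dd"]]
weights = tfidf(docs)
```

In the first document, `aa` receives a larger weight than `bb` because it occurs twice, even though `bb` is the rarer term across the corpus.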

... obtains the joint probability and the marginal probabilities.

To learn more, see this [reference](https://www.eecis.udel.edu/~trnka/CISC889-11S/lectures/philip-pmi.pdf).

In the Python world, [NLTK](http://www.nltk.org/howto/collocations.html) and [other packages](https://github.com/bollegala/svdmi) also provide PMI.
Check them and choose based on your preference and use case.
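To make the joint and marginal probabilities concrete, the following is a minimal sketch of PMI between a feature and a label, `PMI(w, e) = log2(P(w, e) / (P(w) * P(e)))`. It assumes that each document counts a feature at most once and that the input maps each label to a list of tokenized documents. The function name `pmi_scores` and the toy data are hypothetical, and the package's own implementation may differ.

```python
import math
from collections import Counter


def pmi_scores(label_docs):
    """Return {(label, feature): PMI in bits} from document-level co-occurrence."""
    n_docs = sum(len(docs) for docs in label_docs.values())
    joint = Counter()    # documents with this label that contain the feature
    feature = Counter()  # documents of any label that contain the feature
    for label, docs in label_docs.items():
        for doc in docs:
            for token in set(doc):  # presence only, not repeated occurrences
                joint[(label, token)] += 1
                feature[token] += 1
    scores = {}
    for (label, token), n_joint in joint.items():
        p_joint = n_joint / n_docs                   # joint probability P(w, e)
        p_feature = feature[token] / n_docs          # marginal probability P(w)
        p_label = len(label_docs[label]) / n_docs    # marginal probability P(e)
        scores[(label, token)] = math.log2(p_joint / (p_feature * p_label))
    return scores


# Hypothetical toy data.
example = {
    "label_a": [["aa", "bb"], ["aa", "cc"]],
    "label_b": [["bb", "dd"], ["dd", "cc"]],
}
scores = pmi_scores(example)
```

Pairs that never co-occur are left out of the result, because the logarithm of zero is undefined without smoothing.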

...ure.
Moreover, you can get the anti-correlation between features and categories.

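In this package, the SOA formula comes from the following paper,

`Saif Mohammad and Svetlana Kiritchenko, "Using Hashtags to Capture Fine Emotion Categories from Tweets", Computational Intelligence, 01/2014; 31(2)`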

```
SOA(w, e) = \log_2\frac{freq(w, e) * freq(\neg{e})}{freq(e) * freq(w, \neg{e})}
```
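The formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a unit is counted once per feature regardless of how often the feature repeats inside it, and a score is left undefined (`None`) when a count in the ratio is zero. The names `soa`, `soa_scores`, and the toy data are hypothetical and are not the package's API.

```python
import math
from collections import Counter


def soa(freq_w_e, freq_w_not_e, freq_e, freq_not_e):
    """log2(freq(w, e) * freq(not e) / (freq(e) * freq(w, not e)))."""
    if freq_w_e == 0 or freq_w_not_e == 0:
        return None  # assumption: undefined without smoothing
    return math.log2((freq_w_e * freq_not_e) / (freq_e * freq_w_not_e))


def soa_scores(label_docs, target_label):
    """Score every feature against target_label; other labels count as 'not e'."""
    freq_e = len(label_docs[target_label])  # units with the label
    freq_not_e = sum(len(docs) for label, docs in label_docs.items()
                     if label != target_label)  # units without the label
    with_label, without_label = Counter(), Counter()
    for label, docs in label_docs.items():
        bucket = with_label if label == target_label else without_label
        for doc in docs:
            bucket.update(set(doc))  # count each unit once per feature
    return {token: soa(with_label[token], without_label[token], freq_e, freq_not_e)
            for token in set(with_label) | set(without_label)}


# Hypothetical toy data.
example = {
    "label_a": [["aa", "bb"], ["aa", "cc"]],
    "label_b": [["bb", "dd"], ["dd", "cc"], ["aa", "dd"]],
}
scores_a = soa_scores(example, "label_a")
scores_b = soa_scores(example, "label_b")
```

With this data, `aa` scores positive for `label_a` and negative for `label_b`; a negative score is the anti-correlation case.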

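In this formula:

* freq(w, e) is the number of times w occurs in a unit (sentence or document) with the label e
* freq(w, ¬e) is the number of times w occurs in units that do not have the label e
* freq(e) is the number of units that have the label e
* freq(¬e) is the number of units that do not have the label e

BNS is a feature selection method for binary-class data. Other methods for binary-class data include _information gain (IG)_, _chi-squared (CHI)_, and _odds ratio_.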

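The following papers show how BNS performs on skewed data.

`Lei Tang and Huan Liu, "Bias Analysis in Text Classification for Highly Skewed Data", 2005`

or

`George Forman, "An Extensive Empirical Study of Feature Selection Metrics for Text Classification", Journal of Machine Learning Research 3 (2003) 1289-1305`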
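For orientation, here is a minimal sketch of the BNS score as it is usually stated following Forman (2003): `BNS = |F^-1(tpr) - F^-1(fpr)|`, where `F^-1` is the inverse of the standard normal cumulative distribution function, `tpr` is the fraction of positive documents that contain the feature, and `fpr` is the fraction of negative documents that contain it. Both rates are clipped away from 0 and 1 to keep the inverse finite. The function name `bns`, its parameters, and the clipping value `eps=0.0005` are assumptions for this sketch and may not match what the package does. It relies on `statistics.NormalDist`, which requires Python 3.8 or later.

```python
from statistics import NormalDist

# Inverse CDF of the standard normal distribution.
_INV_CDF = NormalDist().inv_cdf


def bns(tp, fp, n_pos, n_neg, eps=0.0005):
    """Bi-normal separation of one feature for a binary labeling.

    tp: positive documents containing the feature
    fp: negative documents containing the feature
    n_pos, n_neg: number of positive and negative documents
    eps: clipping bound so that rates of 0 or 1 stay finite (assumed value)
    """
    tpr = min(max(tp / n_pos, eps), 1 - eps)
    fpr = min(max(fp / n_neg, eps), 1 - eps)
    return abs(_INV_CDF(tpr) - _INV_CDF(fpr))


# A feature spread evenly over both classes carries no signal.
balanced = bns(50, 50, 100, 100)
# A feature concentrated in the positive class scores high.
skewed = bns(90, 10, 100, 100)
```

Because the score depends only on the two rates, it behaves the same whether the positive class has ten documents or ten thousand, which is one reason it is discussed for skewed data.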

* Python 3.x (checked under Python 3.5)

Setting up

... Try installing numpy manually or try the anaconda distribution.

We need numpy and scipy before we install `scikit-learn`.

In this case, you can choose one of the following options

* Install `numpy` and `scipy` manually
* Use the `anaconda` Python distribution. Please visit [their site](https://www.continuum.io/downloads).

```python
{
    # each label maps to a list of tokenized documents
    "label_a": [
        ["I", "aa", "aa", ..., "bb"],
        ...
    ],
    ...
}
```
For developers

You can start the development environment with Docker Compose.

The following commands run the tests in a Docker container.

```bash
$ cd tests/
$ docker compose build
$ docker compose up
```

DocumentFeatureSelection 1.5
