ibex Python package - PyPI

Pandas adapters for scikit-learn

ibex project description

Ami Tavory, Shahar Azulay, Tali Raveh-Sadka

https://travis-ci.org/atavory/ibex.svg?branch=master

https://landscape.io/github/atavory/ibex/master/landscape.svg?style=flat

https://img.shields.io/codecov/c/github/atavory/ibex/master.svg

http://readthedocs.org/projects/ibex/badge/?version=latest

https://img.shields.io/badge/license-BSD--3--Clause-brightgreen.svg

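This library has two (somewhat independent) goals:

providing pandas adapters for estimators conforming to the scikit-learn protocol, in particular those of scikit-learn itself
providing easier and more succinct ways of combining estimators, features, and pipelines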

(You might also want to check out the excellent pandas-sklearn, which has the same aims but takes a very different approach.)

The full documentation defines these matters in detail, but the library has a very small interface.

tl;dr

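The following short example shows the main points of the library. It is an adaptation of the scikit-learn example Concatenating multiple feature extraction methods. In this example, we build a classifier for the iris dataset using a combination of PCA, univariate feature selection, and a support vector machine classifier.

We first load the iris dataset into a pandas DataFrame.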

>>> import numpy as np
>>> from sklearn import datasets
>>> import pandas as pd
>>>
>>> iris = datasets.load_iris()
>>> features, targets, iris = iris['feature_names'], iris['target_names'], pd.DataFrame(
...     np.c_[iris['data'], iris['target']],
...     columns=iris['feature_names']+['class'])
>>> iris['class'] = iris['class'].map(pd.Series(targets))
>>>
>>> iris.head()
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
<BLANKLINE>
    class
0  setosa
1  setosa
2  setosa
3  setosa
4  setosa
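As a side note, newer scikit-learn releases (0.23 and later) can return pandas objects directly. The following is a minimal sketch of an alternative way to build a similar frame, assuming such a version is installed; the variable name bunch is ours and does not appear in the original example.

```python
from sklearn import datasets

# as_frame=True (scikit-learn 0.23 or later) returns pandas objects directly.
bunch = datasets.load_iris(as_frame=True)
features = list(bunch.feature_names)

# bunch.frame holds the feature columns plus an integer 'target' column.
iris = bunch.frame.rename(columns={"target": "class"})

# Replace the integer codes with the class names.
iris["class"] = iris["class"].map(dict(enumerate(bunch.target_names)))

print(iris.head())
```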

Now, we import the relevant steps. Note that, in this example, we import them from ibex.sklearn rather than from sklearn.

>>> from ibex.sklearn.svm import SVC as PdSVC
>>> from ibex.sklearn.feature_selection import SelectKBest as PdSelectKBest
>>> from ibex.sklearn.decomposition import PCA as PdPCA

(Of course, it is also possible to import steps from sklearn and use them together with the steps of ibex.sklearn.)

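Finally, we construct a pipeline that, given a DataFrame of features:

horizontally concatenates a 2-component PCA DataFrame and the best-feature DataFrame into a resulting DataFrame

then passes the result to a support vector machine classifier that outputs a pandas Series: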

>>> clf = PdPCA(n_components=2) + PdSelectKBest(k=1) | PdSVC(kernel="linear")
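For comparison, here is a rough plain scikit-learn counterpart of the expression above, built with make_union and make_pipeline. It is only a sketch for orientation: it works on NumPy arrays, so the column names and index are not carried through, and the name sk_clf is ours.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import make_pipeline, make_union
from sklearn.svm import SVC

# A feature union (PCA next to univariate selection) followed by a classifier.
sk_clf = make_pipeline(
    make_union(PCA(n_components=2), SelectKBest(k=1)),
    SVC(kernel="linear"),
)

X, y = datasets.load_iris(return_X_y=True)
sk_clf.fit(X, y)

# Step names are derived from the lower-cased class names.
print([name for name, _ in sk_clf.steps])

# predict returns a plain NumPy array with no index attached.
print(type(sk_clf.predict(X)))
```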

clf is now a pandas-aware classifier, but otherwise it can be used like any other sklearn estimator. For example,

>>> param_grid = dict(
...     featureunion__pca__n_components=[1, 2, 3],
...     featureunion__selectkbest__k=[1, 2],
...     svc__C=[0.1, 1, 10])
>>> try:
...     from ibex.sklearn.model_selection import GridSearchCV as PdGridSearchCV
... except ImportError: # Accommodate older versions of sklearn
...     from ibex.sklearn.grid_search import GridSearchCV as PdGridSearchCV
>>> PdGridSearchCV(clf, param_grid=param_grid).fit(iris[features], iris['class']) # doctest: +SKIP
...
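The keys of param_grid follow scikit-learn's double-underscore convention for nested parameters. The sketch below shows the same grid on a plain scikit-learn pipeline, where the valid keys can be listed with get_params(). It assumes a scikit-learn version that provides sklearn.model_selection; the names sk_clf and search, and the choice of cv=3, are ours.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline, make_union
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

sk_clf = make_pipeline(
    make_union(PCA(), SelectKBest()),
    SVC(kernel="linear"),
)

# Keys have the form <step>__<substep>__<parameter>.
param_grid = {
    "featureunion__pca__n_components": [1, 2, 3],
    "featureunion__selectkbest__k": [1, 2],
    "svc__C": [0.1, 1, 10],
}

# Every key in the grid must be a parameter the pipeline exposes.
valid_keys = sk_clf.get_params()
print(all(key in valid_keys for key in param_grid))

search = GridSearchCV(sk_clf, param_grid=param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```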

So what does this add to the original version?

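The estimators perform verification and processing on their inputs and outputs. They verify column names following calls to fit, and index the results according to those of the inputs. This helps catch bugs.

The results are more interpretable: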
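To make the idea of column verification and index preservation concrete, here is a small hypothetical wrapper around a plain scikit-learn estimator. It is not ibex's implementation: the class name FrameChecked and the choice of KeyError are assumptions made purely for illustration.

```python
import pandas as pd
from sklearn import datasets
from sklearn.svm import SVC


class FrameChecked:
    """Hypothetical wrapper, not part of ibex: checks columns, keeps the index."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Remember the column names seen during fit.
        self.columns_ = list(X.columns)
        self.estimator.fit(X.to_numpy(), y.to_numpy())
        return self

    def predict(self, X):
        # Refuse inputs whose columns differ from those seen in fit.
        if list(X.columns) != self.columns_:
            raise KeyError(f"expected columns {self.columns_}, got {list(X.columns)}")
        # Index the predictions like the input.
        return pd.Series(self.estimator.predict(X.to_numpy()), index=X.index)


data = datasets.load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, index=X.index)

model = FrameChecked(SVC(kernel="linear")).fit(X, y)

# The predictions carry the index of the rows that were passed in.
preds = model.predict(X.iloc[::50])
print(preds.index.tolist())

# Reordered columns are rejected instead of being silently misread.
try:
    model.predict(X[X.columns[::-1]])
except KeyError as exc:
    print("rejected:", exc)
```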

>>> svc = PdSVC(kernel="linear", probability=True)

Find the coefficients of the boundaries between the different classes:

>>> svc.fit(iris[features], iris['class']).coef_
            sepal length (cm)  sepal width (cm)  petal length (cm)  \
setosa              -0.046259          0.521183          -1.003045
versicolor          -0.007223          0.178941          -0.538365
virginica            0.595498          0.973900          -2.031000
<BLANKLINE>
            petal width (cm)
setosa             -0.464130
versicolor         -0.292393
virginica          -2.006303

Predict belonging to classes:

>>> svc.fit(iris[features], iris['class']).predict_proba(iris[features])
    setosa  versicolor  virginica
0    0.97...    0.01...   0.00...
...
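For contrast, plain scikit-learn returns an unlabeled array from predict_proba, and the labels have to be attached by hand. The sketch below does this using the fitted estimator's classes_ attribute, which gives the column order; the names sk_svc and proba are ours.

```python
import pandas as pd
from sklearn import datasets
from sklearn.svm import SVC

data = datasets.load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
# Class names as a NumPy array of strings, one per row.
y = data.target_names[data.target]

sk_svc = SVC(kernel="linear", probability=True).fit(X.to_numpy(), y)

# predict_proba returns a bare array; its columns follow sk_svc.classes_.
proba = pd.DataFrame(
    sk_svc.predict_proba(X.to_numpy()),
    index=X.index,
    columns=sk_svc.classes_,
)
print(proba.head())
```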

Find the coefficients of the boundaries between the different classes in a pipeline:

>>> clf = PdPCA(n_components=2) + PdSelectKBest(k=1) | svc
>>> clf = clf.fit(iris[features], iris['class'])
>>> svc.coef_
                pca                 selectkbest
            comp_0    comp_1 petal length (cm)
setosa     -0.757016  ...0.376680         -0.575197
versicolor -0.351218  ...0.141699         -0.317562
virginica  -1.529320  ...1.472771         -1.509391

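It allows writing Pandas-munging estimators (see also Multiple-Row Features In The Movielens Dataset).
Using DataFrame metadata, it allows writing more complex meta-learning algorithms, such as stacking and nested labeled and stratified cross validation.
The pipeline syntax is succinct and clear (see Motivation For Shorter Combinations).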


ibex 0.1.3
