gaplearn: Python package - PyPI

Bridges the gaps between other machine learning and deep learning tools, enabling robust post-mortem analysis.

Project description

gaplearn

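gaplearn bridges the gaps between other machine learning and deep learning tools. Any model can be passed to the functions below, regardless of the framework it was built with (scikit-learn, TensorFlow, XGBoost, or even good ol' NumPy).

My first package aims to make the often black-box model training process transparent by enabling robust post-mortem analysis of hyperparameter and feature selection. The functions below can also automate parts of these processes while leaving the user in full control of the results.

Many features are under development. Unit tests and full documentation will be added soon.

The source code is available on GitHub: https://github.com/awhedon/gaplearn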

Installation

pip install gaplearn

See the latest version.

Submodules

CV

The cv submodule will have these classes (SFS has been released):

SFS

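Description:

This is a sequential feature selector that lets you perform backwards elimination with any model (not just linear regression).
At each step, the feature with the lowest permutation importance is selected for removal. By default, permutation importance is measured as the decrease in accuracy, but users can pass any custom scoring function.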
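To make the elimination loop concrete, here is a minimal pure-Python sketch of backwards elimination driven by permutation importance. It is not gaplearn's code (gaplearn currently relies on eli5 for permutation scoring); the helper names, the toy nearest-centroid model, and the summary-row layout are all hypothetical, chosen only so the example runs without dependencies.

```python
import random


def accuracy(y, preds):
    # Fraction of predictions that match the true labels (the documented default metric)
    return sum(int(a == b) for a, b in zip(y, preds)) / len(y)


def permutation_importance(predict, X, y, col, score_fn, rng):
    # Drop in score after shuffling one column; a small drop means the model barely uses it
    base = score_fn(y, predict(X))
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:col] + [value] + row[col + 1:] for row, value in zip(X, shuffled)]
    return base - score_fn(y, predict(X_perm))


def fit_nearest_centroid(X, y):
    # Hypothetical toy model: predict the class whose mean feature vector is closest
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [row for row, target in zip(X, y) if target == c]
        centroids[c] = [sum(column) / len(rows) for column in zip(*rows)]

    def predict(X_new):
        return [
            min(classes, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
            for row in X_new
        ]

    return predict


def backward_elimination(fit, X, y, cols, score_fn=accuracy, seed=0):
    # Repeatedly refit on the remaining features and drop the least important one
    rng = random.Random(seed)
    remaining = list(cols)
    steps = []
    while remaining:
        idx = [cols.index(name) for name in remaining]
        X_sub = [[row[i] for i in idx] for row in X]
        predict = fit(X_sub, y)
        score = score_fn(y, predict(X_sub))
        importances = [
            permutation_importance(predict, X_sub, y, j, score_fn, rng)
            for j in range(len(remaining))
        ]
        to_remove = remaining[importances.index(min(importances))]
        steps.append({'features': list(remaining), 'score': score, 'feature to remove': to_remove})
        remaining.remove(to_remove)
    return steps


# Usage: 'signal' determines the label, 'noise' is random
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[10.0 * target, data_rng.random()] for target in y]
steps = backward_elimination(fit_nearest_centroid, X, y, cols=['signal', 'noise'])
for step in steps:
    print(step)
```

Shuffling `noise` leaves accuracy unchanged, so it has the lowest importance and is removed first; each step's row records the feature set, its score, and the feature chosen for removal.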

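Improvements on the way:

- Add forward selection and all-subsets testing
- Add more built-in scoring functions for evaluating feature permutation importance
- Create a custom permutation scoring method to remove eli5 as a dependency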

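Methods

backwards_elimination(X, y, model, params={}, fit_function=None, predict_function=None, score_function=None, score_name='score', cols=[], verbose=0)

Runs a backwards elimination
Parameters:
- X: (DataFrame or matrix) the independent variables
- y: (iterable) the corresponding values of the dependent variable
- params: (dict) the parameter set for the model
- model: the model architecture to be trained and evaluated
- fit_function: (function) the function used to train the model; it must accept the parameters model, X, and y; if this is not set, backwards_elimination will try to use the model's fit method
- predict_function: (function) the function used to make predictions with the model; it must accept the parameters model and X; if this is not set, backwards_elimination will try to use the model's predict method
- score_function: (function) the function used to score the model and determine each feature's permutation importance; it must accept the parameters y and preds; if this is not set, accuracy will be used
- score_name: (str) the name of the score calculated by score_function; defaults to 'score'
- cols: (list) the names of the columns in the matrix or DataFrame
- verbose: (0, 1, or 2) determines how much is printed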
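The parameter list above describes a "custom callable, else fall back to the model's own method" rule. The sketch below illustrates that dispatch under stated assumptions: `evaluate`, the `_default_*` helpers, and `MajorityModel` are hypothetical names, not part of gaplearn, and it assumes the model is fitted in place (as scikit-learn estimators are).

```python
def _default_fit(model, X, y):
    # Fallback: call the model's own fit method
    model.fit(X, y)


def _default_predict(model, X):
    # Fallback: call the model's own predict method
    return model.predict(X)


def _default_score(y, preds):
    # Fallback: accuracy
    return sum(int(a == b) for a, b in zip(y, preds)) / len(y)


def evaluate(model, X, y, fit_function=None, predict_function=None, score_function=None):
    # Hypothetical helper: use each custom callable if given, else the documented fallback
    fit_fn = fit_function or _default_fit
    predict_fn = predict_function or _default_predict
    score_fn = score_function or _default_score
    fit_fn(model, X, y)  # assumption: the model is fitted in place
    return score_fn(y, predict_fn(model, X))


class MajorityModel:
    """Stand-in estimator with scikit-learn-style fit/predict methods."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_] * len(X)


X = [[0], [1], [2], [3]]
y = [1, 1, 1, 0]
print(evaluate(MajorityModel(), X, y))  # model's own predict + accuracy
print(evaluate(MajorityModel(), X, y, predict_function=lambda m, X: [0] * len(X)))
```

Custom callables only need to honor the argument lists documented above: `(model, X, y)`, `(model, X)`, and `(y, preds)`.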

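get_summary_be()

Get a step-by-step summary of the results; this can be used to determine more robustly and comprehensively which feature set is best for your problem

get_results_be()

Get the results for each observation at each step; this can be used to determine more robustly and comprehensively which feature set is best for your problem

Get features

Get all of the features examined during the sequential feature selection

get_set_by_score_be(min_score=None, num_steps='all')

Get n feature sets (n is determined by num_steps) whose scores are greater than or equal to min_score
Parameters:
- min_score: (float or int) the minimum score a feature set must achieve to be returned
- num_steps: (str or int) the number of feature sets to return; 'all' is the only valid str option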
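As an illustration of the filtering that get_set_by_score_be describes, here is a hypothetical stand-alone version. It assumes the summary is a list of rows with `features` and `score` keys and that smaller feature sets are preferred (as in the "smallest number of features" comments in the examples); gaplearn's actual summary format and ordering may differ.

```python
def get_set_by_score(summary, min_score=None, num_steps='all'):
    # Hypothetical re-implementation over a list of {'features': [...], 'score': float} rows
    if isinstance(num_steps, str) and num_steps != 'all':
        raise ValueError("'all' is the only valid string for num_steps")
    passing = [row for row in summary if min_score is None or row['score'] >= min_score]
    # Assumption: smaller feature sets are preferred
    passing.sort(key=lambda row: len(row['features']))
    return passing if num_steps == 'all' else passing[:num_steps]


summary = [
    {'features': ['a', 'b', 'c'], 'score': 0.90},
    {'features': ['a', 'b'], 'score': 0.88},
    {'features': ['a'], 'score': 0.70},
]
print(get_set_by_score(summary, min_score=0.85, num_steps=1))
```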

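get_set_by_features_be(num_features, max_features=None)

Get all feature sets with num_features features; if max_features is set, return the feature sets with between num_features and max_features features
Parameters:
- num_features: (int) the number of features the returned feature sets should have; if max_features is set, this is the minimum
- max_features: (int) the maximum number of features the returned feature sets should have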
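The feature-count filter can be sketched the same way. This hypothetical `get_set_by_features` assumes the same list-of-rows summary layout as above and treats the range as inclusive at both ends, which is how the description reads ("between num_features and max_features").

```python
def get_set_by_features(summary, num_features, max_features=None):
    # Hypothetical: keep rows whose feature count lies in [num_features, max_features]
    upper = num_features if max_features is None else max_features
    return [row for row in summary if num_features <= len(row['features']) <= upper]


summary = [
    {'features': ['a', 'b', 'c', 'd', 'e'], 'score': 0.91},
    {'features': ['a', 'b', 'c', 'd'], 'score': 0.90},
    {'features': ['a', 'b', 'c'], 'score': 0.86},
    {'features': ['a', 'b'], 'score': 0.80},
]
print([len(row['features']) for row in get_set_by_features(summary, 3, max_features=5)])
```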

Example 1:

    #### Perform a backwards elimination with scikit-learn's random forest model ####
    import pandas as pd
    from gaplearn.cv import SFS
    from sklearn.ensemble import RandomForestClassifier

    X = pd.read_csv('X_classification.csv')
    y = pd.read_csv('y_classification.csv')

    fs = SFS()
    print('The backwards elimination has been run: {}'.format(fs.be_complete))  # prints False

    rfc = RandomForestClassifier()

    # Run the backwards elimination
    fs.backwards_elimination(X, y, model=rfc, params={'n_jobs': -1})

    # Get the step-by-step summary
    summary = fs.get_summary_be()  # Alternatively, `summary = fs.summary_be`

    # Get the predictions and true values for each observation
    results = fs.get_results_be()  # Alternatively, `results = fs.results_be`

    # Get the features used in the analysis
    features = fs.features_be  # Alternatively, `sorted(list(results['feature to remove']))`

    # Identify which feature set can achieve at least 85% accuracy with the smallest number of features
    at_least_85 = fs.get_set_by_score_be(min_score=.85, num_steps=1)

    # Identify the best model with only 4 features
    features_4 = fs.get_set_by_features_be(num_features=4)

Example 2:

    #### Perform a more complex backwards elimination with scikit-learn's SGD classifier ####
    import numpy as np
    import pandas as pd
    from gaplearn.cv import SFS
    from sklearn.linear_model import SGDClassifier

    # loss='modified_huber' is a classification loss, and it enables predict_proba
    model_sgd = SGDClassifier(loss='modified_huber', penalty='elasticnet')

    X = pd.read_csv('X_regression.csv')
    y = pd.read_csv('y_regression.csv')

    fs = SFS()

    # Define a score_function (mean squared error)
    def mse(y, preds):
        y_arr = np.asarray(y).ravel()
        return float(np.mean((np.asarray(preds) - y_arr) ** 2))

    # Define a predict_function
    def arbitrary_prediction(model, X):
        # Arbitrarily add 1 to each prediction (realistically, this would be a wrapper for a model that doesn't have a `predict` method)
        preds = model.predict(X) + 1
        return preds

    # Define a predict_function that thresholds the positive-class probability at 0.6
    def predict_w_proba(model, X):
        preds = [1 if x[1] > 0.6 else 0 for x in model.predict_proba(X)]
        return preds

    # Run the backwards elimination
    fs.backwards_elimination(X, y, model=model_sgd, predict_function=predict_w_proba, score_function=mse)

    # Get the step-by-step summary
    summary = fs.get_summary_be()  # Alternatively, `summary = fs.summary_be`

    # Get the predictions and true values for each observation
    results = fs.get_results_be()  # Alternatively, `results = fs.results_be`

    # Get the features used in the analysis
    features = fs.features_be  # Alternatively, `sorted(list(results['feature to remove']))`

    # Identify which two feature sets reach a score of at least 0.85 with the smallest number of features
    at_least_85 = fs.get_set_by_score_be(min_score=.85, num_steps=2)

    # Identify the best models with 3-5 features
    features_3_5 = fs.get_set_by_features_be(num_features=3, max_features=5)

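SearchCluster

Description:

This is a hyperparameter grid/random search for clustering algorithms.
Unlike other grid/random search algorithms, it lets you retrieve the results observation by observation for each parameter set, enabling in-depth post-mortem analysis of the grid/random search.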
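The distinguishing feature described above, keeping every observation's label for every parameter set, can be sketched in plain Python. This is not SearchCluster's implementation: `ThresholdClusterer`, `balance_score`, and `search_with_labels` are hypothetical, and the long-format label table is one possible layout, chosen because it is easy to group by parameter set or by observation later.

```python
from itertools import product


class ThresholdClusterer:
    """Hypothetical toy 1-D clusterer: label 1 if a value exceeds threshold, else 0."""

    def __init__(self, threshold):
        self.threshold = threshold

    def fit(self, X):
        self.labels_ = [int(x > self.threshold) for x in X]
        return self


def balance_score(X, labels):
    # Hypothetical metric: 0 for equal-sized clusters, more negative when unbalanced
    ones = sum(labels)
    return -abs(ones - (len(labels) - ones))


def search_with_labels(model_cls, grid, X, score_function):
    names = sorted(grid)
    results, labels_long = [], []
    for set_id, values in enumerate(product(*(grid[name] for name in names))):
        params = dict(zip(names, values))
        fitted = model_cls(**params).fit(X)
        results.append({'set_id': set_id, 'params': params,
                        'score': score_function(X, fitted.labels_)})
        # Keep every observation's label so parameter sets can be compared row by row later
        for obs, label in enumerate(fitted.labels_):
            labels_long.append({'set_id': set_id, 'observation': obs, 'label': label})
    return results, labels_long


X = [1, 2, 3, 10, 11, 12]
results, labels_long = search_with_labels(ThresholdClusterer, {'threshold': [0, 5, 11]}, X, balance_score)
best = max(results, key=lambda row: row['score'])
print(best['params'])
```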

Documentation: coming soon. For details, see the documentation at https://github.com/awhedon/gaplearn

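Methods
search_cluster(model, params, X, cols=None, fit_function=None, label_function=None, centroid_function=None, score_function=None, metric_name='score', random=None, centroid=False, verbose=0)
Runs a grid/random hyperparameter search
Parameters:
- model: the model architecture to be trained and evaluated
- params: (dict) the hyperparameter grid (e.g., {'c': [0.1, 0.5, 0.9], ...})
- X: (DataFrame or matrix) the features used for clustering
- cols: (list) the names of the columns in the matrix or DataFrame
- fit_function: (function) the function used to train the model; it must accept the parameters model, params, and X; if this is not set, search_cluster will try to use the model's fit method
- label_function: (function) the function that uses the trained model to assign labels to the data; it must accept the parameter model; if this is not set, search_cluster will try to extract the value of the model's labels_ attribute
- centroid_function: (function) the function that extracts the cluster centroids from the trained model; it must accept the parameter model; if this is not set, search_cluster will try to extract the value of the model's cluster_centers_ attribute
- score_function: (function) the function that scores the results of the trained model; it must accept the parameters X and labels; if this is not set, search_cluster will compute the silhouette score
- metric_name: (str) the name of the evaluation metric; 'score' is the default
- random: (int or float) if an int, the maximum number of parameter sets to use; if a float, the fraction of parameter sets to use (.7 means 70% of the available parameter sets will be tested); the default is None, meaning all parameter sets will be used
- centroid: (bool) if the model is centroid-based, you can set this to True and define a centroid_function (or leave it as None if the model has a cluster_centers_ attribute) so that the cluster centers are extracted
- verbose: (0 or 1) determines how much is printed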
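The `params` and `random` semantics above can be illustrated with a small grid-expansion sketch. `expand_grid` and `select_param_sets` are hypothetical helpers, and the argument is named `random_` only to avoid shadowing Python's `random` module. How gaplearn rounds a fractional `random` value is not documented, so rounding down while keeping at least one set is an assumption.

```python
import math
import random
from itertools import product


def expand_grid(params):
    # Every combination of the listed hyperparameter values
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*(params[name] for name in names))]


def select_param_sets(params, random_=None, seed=None):
    sets = expand_grid(params)
    if random_ is None:
        return sets  # grid search: use every parameter set
    if isinstance(random_, float):
        # Fraction of the grid; assumption: round down but keep at least one set
        k = max(1, math.floor(random_ * len(sets)))
    else:
        k = min(random_, len(sets))  # int: maximum number of sets
    return random.Random(seed).sample(sets, k)


params = {'param1': ['value1', 'value2', 'value3'], 'param2': ['value1', 'value2']}
print(len(select_param_sets(params)))
print(len(select_param_sets(params, random_=0.5, seed=0)))
print(len(select_param_sets(params, random_=4, seed=0)))
```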
get_best_model(return_scores=False)
get_best_models(n=1, return_scores=False)
get_best_params()
get_labels()
get_param_results()
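A ranking helper in the spirit of get_best_models(n, return_scores) might look like the sketch below. It is hypothetical: it assumes results are stored as rows with `model` and `score` keys and that a higher score is better. That holds for the default silhouette score but not for every custom score_function.

```python
def get_best_models(results, n=1, return_scores=False):
    # Hypothetical: results is a list of {'model': ..., 'score': float}; higher score is better
    ranked = sorted(results, key=lambda row: row['score'], reverse=True)[:n]
    models = [row['model'] for row in ranked]
    if return_scores:
        return models, [row['score'] for row in ranked]
    return models


# Strings stand in for fitted model objects
results = [
    {'model': 'kmeans_k2', 'score': 0.41},
    {'model': 'kmeans_k3', 'score': 0.58},
    {'model': 'kmeans_k4', 'score': 0.52},
]
models, scores = get_best_models(results, n=2, return_scores=True)
print(models, scores)
```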
Example 1:

    from gaplearn.cv import SearchCluster
    import pandas as pd
    from sklearn.cluster import KMeans

    sc = SearchCluster()
    X = pd.read_csv('X_clustering.csv')

    # Hyperparameter grid (illustrative values)
    params = {'n_clusters': [2, 3, 4, 5], 'init': ['k-means++', 'random'], 'n_init': [10]}

    # Run the hyperparameter grid search
    sc.search_cluster(model=KMeans, params=params, X=X, centroid=True)

    # Get the best model
    best_model = sc.get_best_model()  # Alternatively, `best_model = sc.best_model`

    # Get the five best models and their performance
    best_models_5, best_models_5_scores = sc.get_best_models(n=5, return_scores=True)

    # Get the model params that performed best
    best_params = sc.get_best_params()  # Alternatively, `best_params = sc.best_params`

    # Get a summary of the results for each parameter set
    param_results = sc.get_param_results()  # Alternatively, `param_results = sc.param_results`

    # Get the labels for each observation for each parameter set's k-fold validation to robustly analyze the differences in performance
    labels = sc.get_labels()  # Alternatively, `labels = sc.labels_df`
Example 2:

    from gaplearn.cv import SearchCluster
    import pandas as pd
    import random
    from my_module import ModelClass

    sc = SearchCluster()
    X = pd.read_csv('X_clustering.csv')
    params = {'param1': ['value1', 'value2', 'value3'],
              'param2': ['value1', 'value2', 'value3'],
              'param3': ['value1', 'value2', 'value3']}

    def fit_function(model, params, X):
        # Instantiate the model with this parameter set, fit it, and return the fitted instance
        fitted = model(**params)
        fitted.fit_my_model(X)
        return fitted

    def score_function(X, labels):
        # Placeholder: a random score stands in for a real clustering metric
        score = random.random()
        return score

    # Run the hyperparameter random search, testing 50% of the parameter sets
    sc.search_cluster(model=ModelClass, params=params, X=X, fit_function=fit_function, score_function=score_function, random=0.5)

    # Get the best model and results
    best_model, best_model_scores = sc.get_best_model(return_scores=True)

    # Get the five best models and their performance
    best_models_5, best_models_5_scores = sc.get_best_models(n=5, return_scores=True)

    # Get the model params that performed best
    best_params = sc.get_best_params()  # Alternatively, `best_params = sc.best_params`

    # Get a summary of the results for each parameter set
    param_results = sc.get_param_results()  # Alternatively, `param_results = sc.param_results`

    # Get the labels for each observation for each parameter set's k-fold validation to robustly analyze the differences in performance
    labels = sc.get_labels()  # Alternatively, `labels = sc.labels_df`
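Search (in development)
Description:
This is a hyperparameter grid/random search for regression and classification algorithms.
Unlike other grid/random search algorithms, it lets you retrieve the results observation by observation for each parameter set, enabling in-depth post-mortem analysis of the grid/random search.
Data engineering
The data engineering submodule will have the following classes:
Distributed SQL (in development)
Description:
This lets users split a multi-parameter SQL query into chunks and process them on multiple threads.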
License: BSD License (BSD 3-Clause)
Maintainer: awhedon
Version: gaplearn 0.11
