A machine-learning package for quick model generation and comparison

Detailed description of the Python project modelcreator

modelcreator - the AutoML package

This package contains a Machine that will do the learning for you. It automatically builds a suitable predictive model for the given data.
Sample output

```
Testing: Gradient Boosting Classifier
[########################################] | 100% Completed | 3.9s
Score: 0.9667
Testing: Ada Boost Classifier
[########################################] | 100% Completed | 1.3s
Score: 0.9600
Testing: Random Forest Classifier
[########################################] | 100% Completed | 5.0s
Score: 0.9600
Testing: Balanced Random Forest Classifier
[########################################] | 100% Completed | 3.5s
Score: 0.9600
Testing: SVC
[########################################] | 100% Completed | 1.2s
Score: 0.9667
Chosen model: Gradient Boosting Classifier 0.9667
Params:
min_samples_split: 2
n_estimators: 100
Results saved to output.csv
```
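The selection loop behind this log can be sketched with scikit-learn. This is a hypothetical re-implementation for illustration, not modelcreator's actual code; the candidate list and `cv=5` are assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Candidate estimators, mirroring the names in the sample log above
candidates = {
    "Gradient Boosting Classifier": GradientBoostingClassifier(),
    "Ada Boost Classifier": AdaBoostClassifier(),
    "Random Forest Classifier": RandomForestClassifier(),
    "SVC": SVC(),
}

scores = {}
for name, estimator in candidates.items():
    # Score each candidate by cross-validation and keep the mean score
    scores[name] = cross_val_score(estimator, X, y, cv=5).mean()
    print(f"Testing: {name}  Score: {scores[name]:.4f}")

# Pick the best-scoring model, as "Chosen model:" does in the log
best = max(scores, key=scores.get)
print("Chosen model:", best, round(scores[best], 4))
```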
Installation

To use the package, run:

```
pip install modelcreator
```

Usage
The input can be either a path to a csv file or a pandas DataFrame object.

CSV path input

The library assumes that the last column of the training dataset contains the expected results. The datasets (for both training and prediction) must be provided as csv files.

If the results column contains text, the Machine will do its best to learn how to classify the data correctly. If it contains numbers, regression will be performed.

If the file contains a header, add the header_in_csv=True parameter to the method.
```python
from modelcreator import Machine

# Create automl machine instance
machine = Machine()

# Train machine learning model
machine.learn('example-data/iris.csv')

# Predict the outcomes
machine.predict('example-data/iris-pred.csv', 'output.csv')
```
This example is also available in the example.py file. Consider trying it yourself.
Pandas input

But what if the results column is not the last column in the given csv? Rewriting the whole csv just to swap columns would be inconvenient. For this reason the Machine also has the learnFromDf and predictFromDf methods, which accept pandas DataFrames directly.
```python
from modelcreator import Machine
import pandas as pd

# Create DataFrame object from file
train = pd.read_csv("train.csv")

# Get features columns from DataFrame
X_train = train.drop(['Survived'], axis=1)

# And labels (results) column
y_train = train["Survived"].astype(str)

# Create the instance of Machine
machine = Machine()

# Train machine learning model
machine.learnFromDf(X_train, y_train, computation_level='advanced')

# Show parameters of the model
machine.showParams()

# Load test set from file
X_test = pd.read_csv("test.csv")

# Predict the labels
results = machine.predictFromDf(X_test)

# Save results to a new file
results.to_csv("results.csv")
```
Simple, right? Note that we used astype(str) to make the Machine treat the data as classes rather than numbers, because the Titanic dataset used in the example above has the values 0 and 1 in its "Survived" column.
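The class-versus-number rule can be illustrated with a simple dtype check. This is a minimal sketch; `task_type` is a hypothetical helper written for this example, not part of modelcreator:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def task_type(y: pd.Series) -> str:
    # Numeric labels suggest regression; anything else suggests classification
    return "regression" if is_numeric_dtype(y) else "classification"

survived = pd.Series([0, 1, 1, 0])

# The raw 0/1 column is numeric, so it would be treated as regression
print(task_type(survived))

# astype(str) turns the values into text, forcing classification
print(task_type(survived.astype(str)))
```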
Saving the model

If you want your model to avoid re-learning on the whole dataset and just make predictions, you can save the Machine's state to a file.
```python
# Save Machine with a trained model to "machine.pkl"
machine.saveMachine('machine.pkl')

# Create a new machine based on a schema file
machine2 = Machine('machine.pkl')
```
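Persistence like this is typically plain pickling of the whole object. A minimal sketch of the idea, using a hypothetical `TrainedModel` stand-in rather than modelcreator's internals:

```python
import os
import pickle
import tempfile

class TrainedModel:
    # Stand-in for a Machine holding a fitted estimator and its params
    def __init__(self, params):
        self.params = params

model = TrainedModel({"n_estimators": 100, "min_samples_split": 2})

path = os.path.join(tempfile.gettempdir(), "machine.pkl")

# saveMachine-style: dump the whole object to a file
with open(path, "wb") as f:
    pickle.dump(model, f)

# Machine('machine.pkl')-style: restore the object from the file
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.params)
```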
Parameters

The Machine can be customized to the use case. Check the parameter tables:

Machine

Param | Type | Default | Description |
---|---|---|---|
schema | None or str | None | A Machine may be created based on a saved, pre-trained machine instance. You may specify the path to the saved instance in this param to recreate it. |
learn

learn takes the path to the training csv (whose last column holds the expected results); if the file has a header row, pass header_in_csv=True as described above.
learnFromDf

Param | Type | Default | Description |
---|---|---|---|
X | pandas.DataFrame | | DataFrame containing the feature columns. |
y | pandas.Series | | Label (results) column of the training data. |
metrics | None, str or Callable | None | Metrics used for scoring estimators. Accepts many popular scoring function names (such as f1, roc_auc, neg_mean_gamma_deviance) as well as custom scoring functions. |
verbose | bool | | Whether to print learning logs. |
cv | int | | A number of cross-validation subsets. Higher values may increase computation time. |
computation_level | str | | Computation effort level; the Pandas example above passes 'advanced'. |
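For custom scoring, scikit-learn's `make_scorer` wraps a plain function into a scorer object usable for cross-validation; whether modelcreator's metrics parameter accepts exactly this object is an assumption:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

def fraction_correct(y_true, y_pred):
    # Plain accuracy, written by hand purely for illustration
    return (y_true == y_pred).mean()

# Wrap the function so it can be passed wherever a scorer is expected
scorer = make_scorer(fraction_correct)

X, y = load_iris(return_X_y=True)
score = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring=scorer
).mean()
print(round(score, 4))
```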
predict

Param | Type | Default | Description |
---|---|---|---|
features_file | str | | Path to the features csv of the data to generate predictions on. |
header_in_csv | bool | False | Whether the csv file contains headers in the first row. |
output_file | str | | Path to the output csv file in which the predictions will be saved. |
verbose | bool | | Whether to print logs. |
predictFromDf

Param | Type | Default | Description |
---|---|---|---|
X_predictions | pandas.DataFrame | | Feature columns to generate predictions on. |
output_file | None or str | None | The predict method returns a pandas.Series of the results. If the path given here is other than None, the results are additionally saved to that csv file. |
verbose | bool | | Whether to print logs. |
saveMachine

Param | Type | Default | Description |
---|---|---|---|
output_file_name | str | | Path where the Machine instance shall be saved. |
Development

Have a feature idea or just want to help? Take a look at the issues tab!