Machine learning package for fast model generation and comparison



modelcreator - AutoML package

This package contains a Machine which will do the learning for you. It can automatically build a suitable prediction model for the given data.

Sample output
Testing:  Gradient Boosting Classifier
[########################################] | 100% Completed |  3.9s
Score: 0.9667

Testing:  Ada Boost Classifier
[########################################] | 100% Completed |  1.3s
Score: 0.9600

Testing:  Random Forest Classifier
[########################################] | 100% Completed |  5.0s
Score: 0.9600

Testing:  Balanced Random Forest Classifier
[########################################] | 100% Completed |  3.5s
Score: 0.9600

Testing:  SVC
[########################################] | 100% Completed |  1.2s
Score: 0.9667

Chosen model:  Gradient Boosting Classifier 0.9667

Params:
        min_samples_split: 2
        n_estimators: 100

Results saved to  output.csv
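The log above shows each candidate model being tested and scored, and the best one being kept. That selection step can be sketched as follows. `choose_best` is a hypothetical helper, not part of modelcreator. Gradient Boosting and SVC tie at 0.9667 here, and the sketch assumes ties go to the model tested first; that matches the chosen model in this log but is not documented behavior.

```python
def choose_best(scores):
    """Return the (model_name, score) pair with the highest score.

    `scores` maps model names to scores in the order the models were tested.
    Python's max() returns the first maximal item it sees, so a tie is
    resolved in favor of the model that was tested earlier.
    """
    if not scores:
        raise ValueError("no candidate models were scored")
    return max(scores.items(), key=lambda item: item[1])


# Scores copied from the sample output above.
sample_scores = {
    "Gradient Boosting Classifier": 0.9667,
    "Ada Boost Classifier": 0.9600,
    "Random Forest Classifier": 0.9600,
    "Balanced Random Forest Classifier": 0.9600,
    "SVC": 0.9667,
}

name, score = choose_best(sample_scores)
print("Chosen model: ", name, score)
```

With these scores, the print call reproduces the "Chosen model" line from the log.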

Table of contents

  1. Installation
  2. Usage
  3. Saving model
  4. Parameters
  5. Development

Installation

To use the package, run:

    pip install modelcreator

Usage

The input can be either a path to a csv file or a pandas DataFrame object.

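CSV path input

The library assumes that the last column of the training dataset contains the expected results. Both datasets (training and prediction) must be provided as csv files.

If the results column contains text, the Machine will do its best to learn how to classify the data correctly. If it contains numbers, regression will be performed.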
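The classification-versus-regression rule can be illustrated with a small heuristic over the raw string values of the results column. This is a hypothetical sketch: `infer_task` is not a modelcreator function, and the library's real check may differ (for example, it may rely on the dtype that pandas infers when reading the csv).

```python
def infer_task(labels):
    """Guess the task type from the raw string values of the results column.

    Hypothetical heuristic, not modelcreator's code: if every value parses
    as a number, treat the problem as regression; otherwise, classification.
    """

    def is_number(value):
        try:
            float(value)
        except (TypeError, ValueError):
            return False
        return True

    return "regression" if all(is_number(v) for v in labels) else "classification"


print(infer_task(["setosa", "versicolor", "virginica"]))  # classification
print(infer_task(["1.5", "2.0", "3.25"]))                 # regression
```

Under this rule a column of "0" and "1" values counts as numeric, which is exactly the situation the Titanic example works around.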

If the file contains a header row, add the header_in_csv=True argument to the method call.
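To make the "last column holds the results" convention and the header option concrete, here is a standard-library sketch that splits csv rows into features and results. `split_last_column` is a hypothetical helper, not the library's loader, and the two rows are made-up data in the Iris format.

```python
import csv
import io


def split_last_column(csv_file, header_in_csv=False):
    """Split csv rows into features and results, taking the results
    from the last column.

    Hypothetical helper: `csv_file` is an open file object or any iterable
    of csv lines. When `header_in_csv` is True, the first row is skipped
    instead of being treated as data.
    """
    rows = [row for row in csv.reader(csv_file) if row]  # drop blank lines
    if header_in_csv:
        rows = rows[1:]
    features = [row[:-1] for row in rows]
    results = [row[-1] for row in rows]
    return features, results


data = io.StringIO(
    "sepal_length,sepal_width,species\n"
    "5.1,3.5,setosa\n"
    "7.0,3.2,versicolor\n"
)
X, y = split_last_column(data, header_in_csv=True)
print(X)  # [['5.1', '3.5'], ['7.0', '3.2']]
print(y)  # ['setosa', 'versicolor']
```

Without `header_in_csv=True`, the header row would be read as an ordinary training example.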

Example 1: Iris
    from modelcreator import Machine

    # Create automl machine instance
    machine = Machine()

    # Train machine learning model
    machine.learn('example-data/iris.csv')

    # Predict the outcomes
    machine.predict('example-data/iris-pred.csv', 'output.csv')

This example can also be found in the example.py file. Try it yourself.

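Pandas input

But what if the results column is not the last column in the given csv? Rewriting the whole csv just to swap columns may be inconvenient. For this reason, the Machine has the learnFromDf and predictFromDf methods. The Df in the method names stands for DataFrame from the pandas module. This lets you read the file yourself and choose which column holds the results.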

Example 2: Titanic
    from modelcreator import Machine
    import pandas as pd

    # Create DataFrame object from file
    train = pd.read_csv("train.csv")

    # Get features columns from DataFrame
    X_train = train.drop(['Survived'], axis=1)

    # And labels (results) column
    y_train = train["Survived"].astype(str)

    # Create the instance of Machine
    machine = Machine()

    # Train machine learning model
    machine.learnFromDf(X_train, y_train, computation_level='advanced')

    # Show parameters of the model
    machine.showParams()

    # Load test set from file
    X_test = pd.read_csv("test.csv")

    # Predict the labels
    results = machine.predictFromDf(X_test)

    # Save results to a new file
    results.to_csv("results.csv")

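Simple? That's right! Note that we use astype(str) to treat the data as text (class labels) rather than numbers, because the "Survived" column of the Titanic dataset used in the example above contains the values 0 and 1. Without the conversion, the Machine would perform regression instead of classification.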
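To see why the conversion matters, compare how pandas classifies the column before and after astype(str). This sketch assumes the Machine chooses between classification and regression from the label dtype; `task_for` is a hypothetical helper illustrating that rule, not the library's code. `is_numeric_dtype` is part of pandas' public `pandas.api.types` module.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def task_for(labels: pd.Series) -> str:
    """Hypothetical rule: numeric label dtype -> regression, else classification."""
    return "regression" if is_numeric_dtype(labels) else "classification"


# A small made-up stand-in for the Titanic "Survived" column.
survived = pd.Series([0, 1, 1, 0], name="Survived")

print(task_for(survived))              # regression
print(task_for(survived.astype(str)))  # classification
```

The values themselves do not change; only the dtype does, and that is what flips the task type under this assumed rule.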

Saving model

If you want to make predictions without retraining the model on the whole dataset, you can save the state of the Machine to a file.

    # Save Machine with a trained model to "machine.pkl"
    machine.saveMachine('machine.pkl')

    # Create a new machine based on a schema file
    machine2 = Machine('machine.pkl')
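Saving state means persisting the trained object once and restoring it later instead of retraining. The `.pkl` extension suggests Python's pickle format, but that is an assumption; the sketch below only shows the general save/restore idea with the standard `pickle` module and a hypothetical `TinyMachine` class, not modelcreator's own `saveMachine` implementation.

```python
import os
import pickle
import tempfile


class TinyMachine:
    """Hypothetical stand-in for a trained model: it only remembers
    the most frequent label seen during training."""

    def __init__(self):
        self.majority_label = None

    def learn(self, labels):
        self.majority_label = max(set(labels), key=labels.count)

    def predict(self, n_rows):
        return [self.majority_label] * n_rows


def save_state(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_state(path):
    # Only unpickle files you trust: loading a pickle can run arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


machine = TinyMachine()
machine.learn(["1", "0", "0"])

path = os.path.join(tempfile.mkdtemp(), "machine.pkl")
save_state(machine, path)

machine2 = load_state(path)  # restored without retraining
print(machine2.predict(2))   # ['0', '0']
```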

Parameters

The Machine can be customized depending on your use case. Check the parameter tables:

Machine

Param | Type | Default | Description
schema | None or str | … | A Machine may be created based on a saved, pre-trained machine instance. To recreate it, specify the path to the saved instance in this param.

learn

…
learnFromDf

Param | Type | Default | Description
X | pandas.DataFrame | required | DataFrame containing the feature columns.
y | pandas.Series | required | Label column of the training data.
metrics | None, str or Callable | … or … | Metrics used for scoring estimators. Many popular scoring functions are supported (such as f1, roc_auc, neg_mean_gamma_deviance), and custom scoring functions can be passed as well.
verbose | bool | … | Whether to print learning logs.
cv | int | … | Number of cross-validation subsets. Higher values may increase computation time.
computation_level | str | … | One of three levels (for example 'advanced', as used in Example 2). With a higher computation level, more models and parameters are tested.
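The `cv` parameter sets how many subsets (folds) the training data is split into; each candidate model is trained and scored once per fold, which is why higher values cost more time. The sketch below shows plain, unshuffled k-fold splitting with the standard library. It assumes modelcreator uses ordinary k-fold cross-validation, which the table does not state, and `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples, cv):
    """Yield (train_indices, test_indices) for `cv` contiguous folds.

    Every sample lands in exactly one test fold, so scoring one candidate
    model costs `cv` training runs. The first n_samples % cv folds get one
    extra sample.
    """
    if not 2 <= cv <= n_samples:
        raise ValueError("cv must be between 2 and the number of samples")
    fold_sizes = [n_samples // cv + (1 if i < n_samples % cv else 0) for i in range(cv)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train, test in k_fold_indices(10, cv=3):
    print(len(train), len(test))
# 6 4
# 7 3
# 7 3
```

Going from cv=3 to cv=10 on the same data would mean ten training runs per candidate instead of three.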
predict

Param | Type | Default | Description
features_file | str | required | Path to the features csv of the data to generate predictions on.
header_in_csv | bool | False | Whether the csv file contains headers in the first row.
output_file | str | … | Path to the output csv file in which the predictions will be saved.
verbose | bool | … | Whether to print logs.
predictFromDf

Param | Type | Default | Description
X_predictions | pandas.DataFrame | required | Feature columns to generate predictions on.
output_file | str | … | The method returns a pandas.Series with the results and can additionally save them to a csv file. If this value is other than …, it is interpreted as the path to the output file.
verbose | bool | … | Whether to print logs.
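The "always return, optionally save" behavior described for predictFromDf's `output_file` can be sketched as follows. `return_and_maybe_save` is a hypothetical helper, and using `None` as the "do not save" value is an assumption, not the library's documented default.

```python
import os
import tempfile

import pandas as pd


def return_and_maybe_save(predictions, output_file=None):
    """Always return the predictions as a pandas.Series; when a path is
    given (assumed: anything other than None), also write them to csv."""
    results = pd.Series(predictions, name="prediction")
    if output_file is not None:
        results.to_csv(output_file, index=False)
    return results


out_path = os.path.join(tempfile.mkdtemp(), "results.csv")
results = return_and_maybe_save(["1", "0", "1"], output_file=out_path)
print(results.tolist())                                          # ['1', '0', '1']
print(pd.read_csv(out_path, dtype=str)["prediction"].tolist())  # ['1', '0', '1']
```

Returning the Series in both cases lets callers keep working with the predictions in memory whether or not a file was written.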
saveMachine

Param | Type | Default | Description
output_file_name | str | … | Path where the Machine instance will be saved.

Development

Have any feature ideas, or just want to help? Check out the issues tab!
