A machine-learning package for quick model generation and comparison

Detailed description of the Python project modelcreator

modelcreator - the AutoML package

This package contains a Machine that will do the learning for you. It automatically builds a suitable predictive model for the given data.
Sample output

```
Testing: Gradient Boosting Classifier
[########################################] | 100% Completed | 3.9s
Score: 0.9667
Testing: Ada Boost Classifier
[########################################] | 100% Completed | 1.3s
Score: 0.9600
Testing: Random Forest Classifier
[########################################] | 100% Completed | 5.0s
Score: 0.9600
Testing: Balanced Random Forest Classifier
[########################################] | 100% Completed | 3.5s
Score: 0.9600
Testing: SVC
[########################################] | 100% Completed | 1.2s
Score: 0.9667
Chosen model: Gradient Boosting Classifier 0.9667
Params:
min_samples_split: 2
n_estimators: 100
Results saved to output.csv
```
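The selection loop behind this log can be sketched with scikit-learn. This is a hypothetical re-implementation for illustration, not modelcreator's actual code; the candidate list and `cv=5` are assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Candidate estimators, mirroring the names in the sample log above
candidates = {
    "Gradient Boosting Classifier": GradientBoostingClassifier(),
    "Ada Boost Classifier": AdaBoostClassifier(),
    "Random Forest Classifier": RandomForestClassifier(),
    "SVC": SVC(),
}

scores = {}
for name, estimator in candidates.items():
    # Score each candidate by cross-validation and keep the mean score
    scores[name] = cross_val_score(estimator, X, y, cv=5).mean()
    print(f"Testing: {name}  Score: {scores[name]:.4f}")

# Pick the best-scoring model, as "Chosen model:" does in the log
best = max(scores, key=scores.get)
print("Chosen model:", best, round(scores[best], 4))
```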
Installation

To use the package, run:

```
pip install modelcreator
```

Usage
The input can be either a path to a csv file or a pandas DataFrame object.

CSV path input

The library assumes that the last column of the training dataset contains the expected results. The datasets (for both training and prediction) must be provided as csv files.

If the results column contains text, the Machine will do its best to learn how to classify the data correctly. If it contains numbers, regression will be performed.

If the file contains a header, add the header_in_csv=True parameter to the method.
```python
from modelcreator import Machine

# Create automl machine instance
machine = Machine()

# Train machine learning model
machine.learn('example-data/iris.csv')

# Predict the outcomes
machine.predict('example-data/iris-pred.csv', 'output.csv')
```
This example is also available in the example.py file. Consider trying it yourself.
Pandas input

But what if the results column is not the last column in the given csv? Rewriting the whole csv just to swap columns would be inconvenient. For this reason the Machine also has the learnFromDf and predictFromDf methods, which accept pandas DataFrames directly.
```python
from modelcreator import Machine
import pandas as pd

# Create DataFrame object from file
train = pd.read_csv("train.csv")

# Get features columns from DataFrame
X_train = train.drop(['Survived'], axis=1)

# And labels (results) column
y_train = train["Survived"].astype(str)

# Create the instance of Machine
machine = Machine()

# Train machine learning model
machine.learnFromDf(X_train, y_train, computation_level='advanced')

# Show parameters of the model
machine.showParams()

# Load test set from file
X_test = pd.read_csv("test.csv")

# Predict the labels
results = machine.predictFromDf(X_test)

# Save results to a new file
results.to_csv("results.csv")
```
Simple, right? Note that we used astype(str) to make the Machine treat the data as classes rather than numbers, because the Titanic dataset used in the example above has the values 0 and 1 in its "Survived" column.
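The class-versus-number rule can be illustrated with a simple dtype check. This is a minimal sketch; `task_type` is a hypothetical helper written for this example, not part of modelcreator:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def task_type(y: pd.Series) -> str:
    # Numeric labels suggest regression; anything else suggests classification
    return "regression" if is_numeric_dtype(y) else "classification"

survived = pd.Series([0, 1, 1, 0])

# The raw 0/1 column is numeric, so it would be treated as regression
print(task_type(survived))

# astype(str) turns the values into text, forcing classification
print(task_type(survived.astype(str)))
```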
Saving the model

If you want your model to avoid re-learning on the whole dataset and just make predictions, you can save the Machine's state to a file.
```python
# Save Machine with a trained model to "machine.pkl"
machine.saveMachine('machine.pkl')

# Create a new machine based on a schema file
machine2 = Machine('machine.pkl')
```
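Persistence like this is typically plain pickling of the whole object. A minimal sketch of the idea, using a hypothetical `TrainedModel` stand-in rather than modelcreator's internals:

```python
import os
import pickle
import tempfile

class TrainedModel:
    # Stand-in for a Machine holding a fitted estimator and its params
    def __init__(self, params):
        self.params = params

model = TrainedModel({"n_estimators": 100, "min_samples_split": 2})

path = os.path.join(tempfile.gettempdir(), "machine.pkl")

# saveMachine-style: dump the whole object to a file
with open(path, "wb") as f:
    pickle.dump(model, f)

# Machine('machine.pkl')-style: restore the object from the file
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.params)
```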
Parameters

The Machine can be customized to the use case. Check the parameter tables:

Machine

Param | Type | Default | Description |
---|---|---|---|
schema | None or str | None | A Machine may be created based on a saved, pre-trained machine instance. You may specify the path to the saved instance in this param to recreate it. |
learn

learn takes the path to the training csv (whose last column holds the expected results); if the file has a header row, pass header_in_csv=True as described above.
learnFromDf

Param | Type | Default | Description |
---|---|---|---|
X | pandas.DataFrame | | DataFrame containing the feature columns. |
y | pandas.Series | | Label (results) column of the training data. |
metrics | None, str or Callable | None | Metrics used for scoring estimators. Accepts many popular scoring function names (such as f1, roc_auc, neg_mean_gamma_deviance) as well as custom scoring functions. |
verbose | bool | | Whether to print learning logs. |
cv | int | | A number of cross-validation subsets. Higher values may increase computation time. |
computation_level | str | | Computation effort level; the Pandas example above passes 'advanced'. |
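For custom scoring, scikit-learn's `make_scorer` wraps a plain function into a scorer object usable for cross-validation; whether modelcreator's metrics parameter accepts exactly this object is an assumption:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

def fraction_correct(y_true, y_pred):
    # Plain accuracy, written by hand purely for illustration
    return (y_true == y_pred).mean()

# Wrap the function so it can be passed wherever a scorer is expected
scorer = make_scorer(fraction_correct)

X, y = load_iris(return_X_y=True)
score = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring=scorer
).mean()
print(round(score, 4))
```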
predict

Param | Type | Default | Description |
---|---|---|---|
features_file | str | | Path to the features csv of the data to generate predictions on. |
header_in_csv | bool | False | Whether the csv file contains headers in the first row. |
output_file | str | | Path to the output csv file in which the predictions will be saved. |
verbose | bool | | Whether to print logs. |
predictFromDf

Param | Type | Default | Description |
---|---|---|---|
X_predictions | pandas.DataFrame | | Feature columns to generate predictions on. |
output_file | None or str | None | The predict method returns a pandas.Series of the results. If the path given here is other than None, the results are additionally saved to that csv file. |
verbose | bool | | Whether to print logs. |
saveMachine

Param | Type | Default | Description |
---|---|---|---|
output_file_name | str | | Path where the Machine instance shall be saved. |
Development

Have a feature idea or just want to help? Take a look at the issues tab!