Plan, design, and build train and test matrices
Detailed description of the matrix-architect Python project
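Copyright © 2017. The University of Chicago ("Chicago"). All rights reserved.
Permission to use, copy, modify, and distribute this software, including all object code and source code, and any accompanying documentation (together the "Program") for educational and not-for-profit research purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice, this paragraph, and the following three paragraphs appear in all copies, modifications, and distributions. For the avoidance of doubt, educational and not-for-profit research purposes exclude any service, or part of selling a service, that uses the Program. To obtain a commercial license for the Program, contact Technology Commercialization and Licensing, Polsky Center for Entrepreneurship and Innovation, University of Chicago, 1452 East 53rd Street, 2nd floor, Chicago, IL 60615.
Created by Data Science and Public Policy at the University of Chicago
The Program is copyrighted by Chicago. The Program is supplied "as is", without any accompanying services from Chicago. Chicago does not warrant that the operation of the Program will be uninterrupted or error-free. The end user understands that the Program was developed for research purposes and is advised not to rely exclusively on the Program for any reason.
In no event shall Chicago be liable to any party for direct, indirect, special, incidental, or consequential damages, including losses arising out of the use of the Program, even if Chicago has been advised of the possibility of such damage. Chicago specifically disclaims any warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The Program provided hereunder is provided "as is". Chicago has no obligation to provide maintenance, support, updates, enhancements, or modifications.
Description: Architect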
Plan, design, and build train and test matrices
[![Build Status](https://travis-ci.org/dssg/architect.svg?branch=master)](https://travis-ci.org/dssg/architect) [![codecov](https://codecov.io/gh/dssg/architect/branch/master/graph/badge.svg)](https://codecov.io/gh/dssg/architect) [![codeclimate](https://codeclimate.com/github/dssg/architect.png)](https://codeclimate.com/github/dssg/architect)
To run classification algorithms on source data, the data must first be organized into design matrices. Converting cleaned data into these matrices is not a trivial task: creating the features and labels an experiment needs from source data can be complicated, building the matrices themselves out of those features and labels can be inefficient, and each step offers an opportunity to leak data backwards in time, giving a model trained on a matrix an unfair advantage.
The Architect addresses these issues with functionality aimed at all tasks between cleaned source data (in a PostgreSQL database) and design matrices.
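To make the leakage concern concrete, here is a minimal sketch in plain Python. It does not use the Architect's API. The event records, the `build_row` helper, and the dates are all hypothetical. The idea it illustrates is that features for a given as-of date should draw only on events before that date, while the label draws only on outcome events on or after it.

```python
from datetime import date

# Hypothetical event records: (entity_id, event_date, is_outcome).
events = [
    (1, date(2015, 3, 1), False),
    (1, date(2016, 2, 1), True),
    (2, date(2015, 6, 15), False),
    (2, date(2015, 11, 20), False),
    (2, date(2017, 5, 1), True),
]


def build_row(entity_id, as_of_date, label_window_end, events):
    """Return (feature, label) for one entity at one as-of date.

    Hypothetical helper, not part of the Architect.
    """
    # Feature: number of events strictly before the as-of date.
    prior = [
        e for e in events
        if e[0] == entity_id and e[1] < as_of_date
    ]
    # Label: 1 if an outcome event falls in [as_of_date, label_window_end).
    future_outcomes = [
        e for e in events
        if e[0] == entity_id and e[2] and as_of_date <= e[1] < label_window_end
    ]
    return len(prior), int(bool(future_outcomes))


as_of = date(2016, 1, 1)
window_end = date(2017, 1, 1)
matrix = {
    entity_id: build_row(entity_id, as_of, window_end, events)
    for entity_id in (1, 2)
}
print(matrix)
```

If the feature filter used `<=` against `label_window_end` instead of `<` against `as_of_date`, the outcome event itself would be counted as a feature, which is the kind of backwards-in-time leak described above.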
## Components
- [LabelGenerator](architect/label_generators.py): Create binary labels suitable for a design matrix by querying a database table containing outcome events.
- [FeatureGenerator](architect/feature_generators.py): Create aggregate features suitable for a design matrix from a set of database tables containing events. Uses [collate](https://github.com/dssg/collate/) to build aggregation SQL queries.
- [FeatureGroupCreator](architect/feature_group_creator.py), [FeatureGroupMixer](architect/feature_group_mixer.py): Create groupings of features, and mix them using different strategies (like ‘leave one out’) to test their effectiveness.
- [Planner](architect/planner.py), [Builder](architect/builders.py): Build all design matrices needed for an experiment, taking into account different labels, state configurations, and feature groups.
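
As an illustration of the 'leave one out' mixing strategy mentioned above, the following sketch shows the general idea in plain Python. It is not the FeatureGroupMixer implementation. The `leave_one_out` function, the group names, and the feature names are assumptions made up for this example.

```python
def leave_one_out(groups):
    """Yield (left_out_group, remaining_features) for each feature group.

    `groups` maps a group name to a list of feature names.
    Hypothetical helper, not part of the Architect.
    """
    names = sorted(groups)
    for left_out in names:
        remaining = [
            feature
            for name in names
            if name != left_out
            for feature in groups[name]
        ]
        yield left_out, remaining


# Hypothetical feature groups.
feature_groups = {
    "arrests": ["arrests_count_1y", "arrests_count_5y"],
    "demographics": ["age"],
    "bookings": ["bookings_count_1y"],
}

subsets = dict(leave_one_out(feature_groups))
for left_out, features in subsets.items():
    print(left_out, features)
```

Training one model per subset and comparing the results against a model that uses every group gives a rough sense of how much each group contributes.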
In addition to being usable individually to assist in different aspects of building matrices in your project, the Architect components are integrated in [triage](https://github.com/dssg/triage) as a part of an entire modeling experiment that incorporates later tasks like model training and testing.
## Distributing, Building & Testing
The Architect is a Python package distributable via setuptools. It may be installed directly using easy_install or pip, or listed as a dependency of another package (namely triage), under the package name matrix-architect.
To build this package for development, its dependencies may be installed using pip:

    pip install -r requirements_dev.txt

(Alternatively, to install without the test and development dependencies, use requirements.txt.)

Then, once built for development, run the tests:

    pytest
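Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.4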