=======
collate
=======

.. image:: https://img.shields.io/pypi/v/collate.svg
        :target: https://pypi.python.org/pypi/collate

.. image:: https://travis-ci.org/dssg/collate.svg?branch=master
        :target: https://travis-ci.org/dssg/collate

.. image:: https://readthedocs.org/projects/collate/badge/?version=latest
        :target: https://collate.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status

.. image:: https://pyup.io/repos/github/dssg/collate/shield.svg
        :target: https://pyup.io/repos/github/dssg/collate/
        :alt: Updates

.. image:: https://codecov.io/gh/dssg/collate/branch/master/graph/badge.svg
        :target: https://codecov.io/gh/dssg/collate
        :alt: Code Coverage

Aggregated feature generation made easy.

* Free software for noncommercial use: `UChicago open source license <https://github.com/dssg/collate/blob/master/LICENSE>`_.
* Documentation: https://collate.readthedocs.io.

Overview
========

Inputs
======

Take for example `food inspections data from the City of Chicago <https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5>`_. The table looks like this:

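=============  ==========  =====  ===============  ==============  ===============  ===
inspection_id  license_no  zip    inspection_date  results         violations       ...
=============  ==========  =====  ===============  ==============  ===============  ===
1966765        80273       60636  2016-10-18       No Entry                         ...
1966314        2092894     60640  2016-10-11       Pass            ...CORRECTED...  ...
1966286        2215628     60661  2016-10-11       Pass w/ C...    ...HAZARDOUS...  ...
1966220        2424039     60620  2016-10-07       Pass                             ...
=============  ==========  =====  ===============  ==============  ===============  ===

There are two spatial levels in the data: the specific restaurant (identified by its license number) and the zip code. There is also a date.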

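In raw SQL, the number of failed inspections could be calculated for each restaurant with a grouped query. In collate, the aggregated column is defined as::

    Aggregate({"failed": "(results = 'Fail')::int"}, "sum", {'coltype': 'aggregate', 'all': {'type': 'mean'}})

The first argument to ``Aggregate`` is the computation to perform, named by its dictionary key, and the second argument is the reduction function to perform. The third argument provides a set of rules for how to handle imputation of null values in the resulting fields.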
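To make the raw SQL side of this comparison concrete, here is a minimal sketch of the grouped query. It uses Python's built-in ``sqlite3`` as a stand-in database, so the PostgreSQL ``::int`` cast is replaced by ``CAST(... AS INTEGER)``. The rows are toy values invented for illustration and do not come from the Chicago dataset.

```python
import sqlite3

# In-memory stand-in for the food_inspections table (toy rows, invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE food_inspections "
    "(inspection_id INTEGER, license_no INTEGER, zip TEXT, "
    "inspection_date TEXT, results TEXT)"
)
conn.executemany(
    "INSERT INTO food_inspections VALUES (?, ?, ?, ?, ?)",
    [
        (1, 80273, "60636", "2016-10-18", "Fail"),
        (2, 80273, "60636", "2016-05-02", "Pass"),
        (3, 80273, "60636", "2015-11-20", "Fail"),
        (4, 2092894, "60640", "2016-10-11", "Pass"),
    ],
)

# The quantity is the 0/1 indicator (results = 'Fail'); the reduction function is SUM.
failed_sum = dict(
    conn.execute(
        "SELECT license_no, SUM(CAST(results = 'Fail' AS INTEGER)) AS failed_sum "
        "FROM food_inspections GROUP BY license_no"
    ).fetchall()
)
print(failed_sum)
```

The indicator expression and the reduction function are the same two pieces that ``Aggregate`` takes as its first and second arguments.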

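For example, in addition to the total number, you may also be interested in the proportion of inspections that resulted in a failure. This is easy to specify with the average value of the ``failed`` computation::

    Aggregate({"failed": "(results = 'Fail')::int"}, ["sum", "avg"], {'coltype': 'aggregate', 'all': {'type': 'mean'}})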
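One way to picture how a dictionary of quantities and a list of functions combine is as an outer product of the two. The sketch below is not collate's implementation: ``aggregate_columns`` is a hypothetical helper, and the ``<name>_<function>`` column naming is an assumption made for illustration.

```python
from itertools import product


def aggregate_columns(quantities, functions):
    """Return one SQL select expression per (quantity, function) pair.

    Hypothetical helper; the column naming scheme is an assumption.
    """
    if isinstance(functions, str):
        functions = [functions]
    return [
        f"{func}({expr}) AS {name}_{func}"
        for (name, expr), func in product(quantities.items(), functions)
    ]


columns = aggregate_columns({"failed": "(results = 'Fail')::int"}, ["sum", "avg"])
for column in columns:
    print(column)
```

With two quantities and three functions, the same helper would yield six expressions, which is why splitting the quantity from the reduction function scales well.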

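Aggregations in collate make it easy to aggregate this single feature across different spatiotemporal groups, e.g.::

    # fail holds the Aggregate(...) object shown above
    st = SpacetimeAggregation([fail],
                              from_obj='food_inspections',
                              groups=['license_no', 'zip'],
                              intervals={'license_no': ['2 year', '3 year'], 'zip': ['1 year']},
                              dates=['2016-01-01', '2015-01-01'],
                              date_column='inspection_date',
                              state_table='all_restaurants',
                              state_group='license_no',
                              schema='test_collate')

The ``SpacetimeAggregation`` object encapsulates the ``FROM`` section of the query (in this case simply the inspections table) as well as the ``GROUP BY`` columns. Not only does this create information about the individual restaurants (grouping by ``license_no``), it also creates "neighborhood" columns that add information about the region in which the restaurant operates (grouping by ``zip``). The ``state_table`` specified here should contain the comprehensive set of ``state_group`` entities and the dates for which output should be generated for them, regardless of whether they exist in the ``from_obj``.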
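The role of the state table can be sketched with a left join: every entity and date in the state table gets an output row, even when no matching events exist. This is an illustration only, using ``sqlite3`` and toy rows. The ``as_of_date`` column name and the half-open one-year window are assumptions, not collate's actual schema or SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE all_restaurants (license_no INTEGER, as_of_date TEXT);
    CREATE TABLE food_inspections (license_no INTEGER, inspection_date TEXT, results TEXT);
    INSERT INTO all_restaurants VALUES (1, '2016-01-01'), (2, '2016-01-01');
    INSERT INTO food_inspections VALUES
        (1, '2015-06-01', 'Fail'),
        (1, '2015-09-01', 'Pass');
    """
)

# Restaurant 2 has no inspections, yet it still appears in the output.
rows = conn.execute(
    """
    SELECT s.license_no, s.as_of_date, SUM(i.results = 'Fail') AS failed_sum
    FROM all_restaurants AS s
    LEFT JOIN food_inspections AS i
      ON i.license_no = s.license_no
     AND i.inspection_date < s.as_of_date
     AND i.inspection_date >= date(s.as_of_date, '-1 year')
    GROUP BY s.license_no, s.as_of_date
    ORDER BY s.license_no
    """
).fetchall()
print(rows)
```

The row for restaurant 2 carries a null ``failed_sum``, which is the kind of gap the imputation rules are meant to fill.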

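The ``SpacetimeAggregation`` object also provides date range partitioning. It will create multiple queries in order to compute the summary statistics over the past 1, 2, or 3 years, looking back from either Jan 1, 2015 or Jan 1, 2016. Executing this set of queries with a SQLAlchemy engine object::

    st.execute(engine.connect())

will create four new tables in the ``test_collate`` schema. The table ``food_inspections_license_no`` will contain four feature columns for each license that describe the total number and proportion of failures over the past two or three years, with a date column that states whether it was looking back from 2016 or 2015. Similarly, a ``food_inspections_zip`` table will have two feature columns for every zip code in the database, looking at the total and average number of failures in that neighborhood over the year prior to the date in the date column. The ``food_inspections_aggregation`` table joins these results together to make it easier to look at both neighborhood and restaurant-level effects for any given restaurant. Finally, the ``food_inspections_aggregation_imputed`` table fills in null values using the imputation rules specified in the ``Aggregate`` constructor.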
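The column counts quoted above follow from simple combinatorics over the ``intervals``, ``dates``, and reduction functions. The sketch below only enumerates the combinations in plain Python; it does not reproduce the queries collate generates.

```python
intervals = {"license_no": ["2 year", "3 year"], "zip": ["1 year"]}
dates = ["2016-01-01", "2015-01-01"]
functions = ["sum", "avg"]

# One (group, interval, as-of date) window per combination.
windows = [
    (group, interval, date)
    for date in dates
    for group, group_intervals in intervals.items()
    for interval in group_intervals
]

# Feature columns in each per-group table: number of intervals times number of functions.
features_per_group = {
    group: len(group_intervals) * len(functions)
    for group, group_intervals in intervals.items()
}

print(len(windows))
print(features_per_group)
```

This gives four feature columns for ``license_no`` and two for ``zip``, matching the description of the per-group tables, with each as-of date contributing its own set of rows.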

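Imputation rules are specified as a dictionary, for example::

    {
        "coltype": "aggregate",
        "all": {"type": "mean"},
        "max": {"type": "constant", "value": 137}
    }

The ``coltype`` key determines which set of rules (``aggregate`` or ``categorical``) should be applied.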

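The other keys of the dictionary are either the reduction functions used by the aggregate (e.g., ``sum``, ``count``, ``avg``) or ``all`` as a catch-all. Function-specific rules take precedence over the catch-all rule. The value associated with each of these keys is a dictionary with a required ``type`` key specifying the type of rule, plus any other rule-specific keys.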
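The precedence described here amounts to a lookup with a fallback. The following sketch shows one way to express it. ``rule_for`` is a hypothetical helper written for this illustration and is not part of collate's API.

```python
def rule_for(rules, function):
    """Pick the rule for a reduction function, falling back to the 'all' catch-all.

    Hypothetical helper for illustration only.
    """
    if function != "coltype" and function in rules:
        return rules[function]
    if "all" in rules:
        return rules["all"]
    raise KeyError(f"no imputation rule for {function!r} and no 'all' catch-all")


rules = {
    "coltype": "aggregate",
    "all": {"type": "mean"},
    "max": {"type": "constant", "value": 137},
}

print(rule_for(rules, "max"))  # function-specific rule
print(rule_for(rules, "sum"))  # falls back to the catch-all
```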

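Currently available imputation rules:

* ``mean``: The average value of the feature (for ``SpacetimeAggregation``, the mean is taken within each date).
* ``constant``: Fill with a constant value taken from a required ``value`` parameter.
* ``zero``: Fill with zero.
* ``zero_noflag``: Fill with zero without generating an "imputed" flag. This option should be used only when null values are explicitly known to be zero, such as when the absence of an entity from an events table indicates that no such event has occurred.
* ``null_category``: Only available for categorical features. Just flag null values with the null category column.
* ``binary_mode``: Only available for aggregate column types. Takes the modal value of a binary feature.
* ``error``: Raise an exception if any null values are encountered for this feature.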
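The effect of a few of these rules can be sketched on a plain Python list, with ``None`` standing in for SQL nulls. This is an illustration under stated assumptions, not collate's implementation: ``impute`` is a hypothetical function, the flags are modeled as a parallel list of 0/1 values, and ``null_category`` and ``binary_mode`` are left out for brevity.

```python
from statistics import mean


def impute(values, rule):
    """Fill None entries according to one rule; return (filled, flags).

    Hypothetical sketch: flags mark imputed positions, or are None for zero_noflag.
    """
    kind = rule["type"]
    missing = [v is None for v in values]
    if kind == "error":
        if any(missing):
            raise ValueError("null values encountered")
        fill = None
    elif kind == "mean":
        present = [v for v in values if v is not None]
        fill = mean(present) if present else None
    elif kind == "constant":
        fill = rule["value"]
    elif kind in ("zero", "zero_noflag"):
        fill = 0
    else:
        raise ValueError(f"unsupported rule type: {kind}")
    filled = [fill if is_missing else v for v, is_missing in zip(values, missing)]
    flags = None if kind == "zero_noflag" else [int(m) for m in missing]
    return filled, flags


print(impute([2, None, 4], {"type": "mean"}))
print(impute([2, None, 4], {"type": "constant", "value": 137}))
print(impute([2, None, 4], {"type": "zero_noflag"}))
```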

Outputs
=======

The main output of a collate aggregation is a database table with all of the aggregated features joined to a list of entities.

Usage Examples
==============

Multiple quantities
~~~~~~~~~~~~~~~~~~~

Multiple functions
~~~~~~~~~~~~~~~~~~
TODO

Dates
~~~~~
TODO

=======
History
=======

0.1.0
-----

* Initial release.
