大熊猫的sqldf


pysqldf的Python项目详细描述

Build StatusPyPI VersionPyPI Monthly DownloadsPyPI License

pysqldf允许您使用sql语法查询pandas数据帧。 它的工作方式类似于r中的sqldfpysqldf试图提供 更熟悉的处理和清理数据的方法 python或pandas

安装

$ pip install pysqldf

基本知识

pysqldf中的主类是SQLDFSQLDF接受1个环境 构造函数中的变量集或多个参数。-一套 会话/环境变量(有效值字典,locals()globals())-临时文件类型-用户定义函数-用户 定义的聚合函数

pysqldf使用SQLite syntax。 任何可转换为pandasdataframes的数据都将自动 由pysqldf检测。您可以像查询任何常规sql一样查询它们 桌子。

$ python
>>> from pysqldf import SQLDF, load_meat, load_births
>>> sqldf = SQLDF(globals())
>>> meat = load_meat()
>>> births = load_births()
>>> print sqldf.execute("SELECT * FROM meat LIMIT 10;").head()
                  date  beef  veal  pork  lamb_and_mutton broilers other_chicken turkey
0  1944-01-01 00:00:00   751    85  1280               89     None          None   None
1  1944-02-01 00:00:00   713    77  1169               72     None          None   None
2  1944-03-01 00:00:00   741    90  1128               75     None          None   None
3  1944-04-01 00:00:00   650    89   978               66     None          None   None
4  1944-05-01 00:00:00   681   106  1029               78     None          None   None

>>> q = "SELECT m.date, m.beef, b.births FROM meat m INNER JOIN births b ON m.date = b.date;"
>>> print sqldf.execute(q).head()
                    date    beef  births
403  2012-07-01 00:00:00  2200.8  368450
404  2012-08-01 00:00:00  2367.5  359554
405  2012-09-01 00:00:00  2016.0  361922
406  2012-10-01 00:00:00  2343.7  347625
407  2012-11-01 00:00:00  2206.6  320195

>>> q = "SELECT strftime('%Y', date) AS year, SUM(beef) AS beef_total FROM meat GROUP BY year;"
>>> print sqldf.execute(q).head()
   year  beef_total
0  1944        8801
1  1945        9936
2  1946        9010
3  1947       10096
4  1948        8766

用户定义函数和用户定义聚合函数 支持。

$ python
>>> from pysqldf import SQLDF, load_iris
>>> import math
>>> import numpy
>>> ceil = lambda x: math.ceil(x)
>>> udfs = { "ceil": lambda x: math.ceil(x) }
>>> udafs = { "variance": lambda values: numpy.var(values) }
>>> # or you can also define aggregation function as class
>>> # class variance(object):
... #     def __init__(self):
... #         self.a = []
... #     def step(self, x):
... #         self.a.append(x)
... #     def finalize(self):
... #         return numpy.var(self.a)
...
>>> # udafs={ "variance": variance }
>>> iris = load_iris()
>>> sqldf = SQLDF(globals(), udfs=udfs, udafs=udafs)
>>> sqldf.execute("""
    SELECT
        ceil(sepal_length) AS sepal_length,
        ceil(sepal_width) AS sepal_width,
        ceil(petal_length) AS petal_length,
        ceil(petal_width) AS petal_width,
        species
    FROM iris;
    """).head()
   sepal_length  sepal_width  petal_length  petal_width      species
0             6            4             2            1  Iris-setosa
1             5            3             2            1  Iris-setosa
2             5            4             2            1  Iris-setosa
3             5            4             2            1  Iris-setosa
4             5            4             2            1  Iris-setosa
>>> sqldf.execute("SELECT species, variance(sepal_width) AS var FROM iris GROUP BY species;")
           species       var
0      Iris-setosa  0.142276
1  Iris-versicolor  0.096500
2   Iris-virginica  0.101924

文档

SQLDF(env, inmemory=True, udfs={}, udafs={})

env:sql执行环境的变量映射字典。关键是 SQL变量名称和值是您的程序变量。locals()globals()用于简单赋值。

inmemory:sqlite数据库选项。

udfs:用户定义函数的字典。字典键是 函数名,字典值为函数。见sqlite3 document

udafs:用户定义聚合函数的字典。词典 键是函数名,字典值是聚合函数或类。 如果value是function,则function获取一个列列表参数 值,它应该返回聚合的值。另一种情况(值是 类),请参见sqlite3 document

load_iris()load_meat()load_births()

加载示例数据帧数据。

推荐PyPI第三方库


热门话题
nuclio aqua 源语言 synchrotron menpo alllower enthought attackoftheorcs mtb didata cleverbot list1 基类 sudo sct httpapi pseudorandom m1 viff updates 未记录 屈服 pesq 会话式 voodoo multilanguagefields loaded bustime freckle startpoint accel fastscore nudenet 视觉效果 gdb 用户注册 daftdog 驾驶室 xmms conferenceng br flasky celerite optimisation 标记识别 globocom treep deluge radicale 变色龙