data-depgraph: Python package - PyPI

A minimal dependency-resolution library for scientific datasets

Project description for data-depgraph

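depgraph is a tiny (<500 lines of code) Python library for representing networks of datasets and their relationships. In this way it is superficially similar to Airflow and Luigi, although those tools contain much more functionality.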

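Networks are declared in terms of relationships (graph edges) between source and target datasets (graph nodes). A target dataset can then report the set of precursor datasets in the correct order. This makes it straightforward to write build scripts that build dependencies, either sequentially or in parallel.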
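
As an illustration of "precursor datasets in the correct order", here is a minimal sketch that groups a target's ancestors into stages with a Kahn-style topological sort. It assumes the graph is a plain dict mapping each dataset name to the names it depends on; build_stages is a hypothetical helper, not depgraph's implementation. It also differs from a reason-aware build loop because it orders every ancestor whether or not it is stale.

```python
def build_stages(deps, target):
    """Group `target` and its ancestors into stages (hypothetical helper).

    deps maps each dataset name to the list of names it depends on.
    Every dataset in a stage depends only on datasets in earlier stages,
    so the members of one stage could be built in parallel.
    """
    # Collect the target and all of its ancestors with a depth-first walk.
    needed = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(deps.get(name, []))

    # Track the unmet dependencies of every needed dataset.
    remaining = {name: set(deps.get(name, [])) for name in needed}
    stages = []
    while remaining:
        ready = sorted(name for name, parents in remaining.items() if not parents)
        if not ready:
            raise ValueError("dependency cycle among: %s" % sorted(remaining))
        stages.append(ready)
        for name in ready:
            del remaining[name]
        for parents in remaining.values():
            parents.difference_update(ready)
    return stages


# The graph from the Example section (each target lists its sources).
deps = {
    "DA0": ["R0", "R1"],
    "DA1": ["R2"],
    "DB0": ["DA0", "DA1"],
    "DB1": ["DA1", "R3"],
    "DC0": ["DB0", "DB1"],
    "DC1": ["DB1"],
    "DC2": ["DB1"],
}

print(build_stages(deps, "DC1"))  # [['R2', 'R3'], ['DA1'], ['DB1'], ['DC1']]
```

Sorting each stage only makes the output deterministic; any order within a stage is valid.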

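By design, each Dataset corresponds to a single file. A DatasetGroup class handles cases where multiple files should be treated as a single unit (e.g. a binary data file and its XML metadata).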
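
To make the idea of a multi-file unit concrete, the sketch below treats several paths as one dataset. FileGroup is a hypothetical stand-in, not depgraph's DatasetGroup, and the rules it applies are assumptions: the group exists only when every member exists, and its age is that of its oldest member.

```python
import os
import tempfile


class FileGroup(object):
    """Hypothetical stand-in for a multi-file dataset (e.g. data + XML metadata)."""

    def __init__(self, *paths):
        self.paths = list(paths)

    def exists(self):
        # Assumed rule: the group counts as present only if every file is present.
        return all(os.path.exists(p) for p in self.paths)

    def mtime(self):
        # Assumed rule: the group is as old as its oldest member, so a rebuild
        # is triggered if any member predates one of its dependencies.
        if not self.exists():
            return None
        return min(os.path.getmtime(p) for p in self.paths)


# Usage: a binary file and its XML metadata treated as one dataset.
tmpdir = tempfile.mkdtemp()
data_path = os.path.join(tmpdir, "scan.bin")
meta_path = os.path.join(tmpdir, "scan.xml")
group = FileGroup(data_path, meta_path)

open(data_path, "w").close()
print(group.exists())  # False: the metadata file has not been written yet
open(meta_path, "w").close()
print(group.exists())  # True
```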

Other kinds of resources, such as database tables, can be used as long as they can be queried to determine whether they exist (and how old they are, in order to take advantage of age-based incremental building).
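
The contract described above can be sketched as "answer whether you exist and when you were last modified". In this sketch, TableResource fakes a database table with an in-memory catalog. The class names, TABLE_CATALOG, and write_table are all hypothetical; a real implementation would query the database's own metadata instead.

```python
import time


class Resource(object):
    """Minimal contract assumed here: existence and last-modified time."""

    def exists(self):
        raise NotImplementedError

    def mtime(self):
        raise NotImplementedError


# Hypothetical stand-in for a database catalog: table name -> last write time.
TABLE_CATALOG = {}


class TableResource(Resource):
    def __init__(self, table):
        self.table = table

    def exists(self):
        return self.table in TABLE_CATALOG

    def mtime(self):
        # None signals "missing", mirroring a file that is not on disk.
        return TABLE_CATALOG.get(self.table)


def write_table(table, when=None):
    # Record a write; a real database would track this itself.
    TABLE_CATALOG[table] = time.time() if when is None else when


counts = TableResource("fish_counts")
print(counts.exists())  # False
write_table("fish_counts", when=1700000000.0)
print(counts.exists(), counts.mtime())  # True 1700000000.0
```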

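When a Dataset requires other datasets to be built in order to satisfy its dependencies, it provides a reason, for example:

the dataset is missing and must be built
the dataset is out of date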
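
A minimal sketch of how such a reason could be derived from file existence and modification times. The function name, the plain-string reasons, and the rule "out of date if older than any existing parent" are assumptions for illustration. depgraph itself reports reasons through its own Reason objects.

```python
import os
import tempfile


def reason_to_build(target, parents):
    """Return why `target` must be built, or None if it is up to date.

    Hypothetical helper; target and parents are file paths. Missing parents
    are ignored here, since they would have to be built first anyway.
    """
    if not os.path.exists(target):
        return "missing"
    target_time = os.path.getmtime(target)
    for parent in parents:
        if os.path.exists(parent) and os.path.getmtime(parent) > target_time:
            return "out of date"
    return None


# Usage with two temporary files standing in for R0 and DA0.
tmpdir = tempfile.mkdtemp()
raw = os.path.join(tmpdir, "raw0")
product = os.path.join(tmpdir, "da0")
open(raw, "w").close()
print(reason_to_build(product, [raw]))  # missing

open(product, "w").close()
os.utime(raw, (2000, 2000))
os.utime(product, (1000, 1000))
print(reason_to_build(product, [raw]))  # out of date

os.utime(product, (3000, 3000))
print(reason_to_build(product, [raw]))  # None
```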

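depgraph is intended as a reusable component for constructing scientific dataset build tools. Such a build tool should:

permit reproducible analysis
document the workflow so that it can be easily reported
perform fast rebuilds to enable experimentation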

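depgraph has no dependencies outside the standard library, so it is easy to use in projects that run on a laptop, on a large cluster, or in the cloud. depgraph supports modern Python implementations (Python 2, Python 3, PyPy) and works on Linux, OS X, and Windows.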

Key components

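Dataset defines a single data product, identified by its filename, name. Additional keyword arguments can be provided to help drive the build process.

The ancestors of a dataset can be retrieved with Dataset.parents(n), where n is the number of generations to include. n=0 includes only direct parents, while n=1 also includes grandparents. n=-1 includes every ancestor. Dataset.roots() returns the top-level ancestors, i.e. those with no parents of their own.

Similarly, Dataset.children(n) yields the descendants of a dataset, if any.

Relationships are defined with Dataset.dependson(obj), where obj is another Dataset instance (several can be passed at once). Because relationships are declared in ordinary Python code, large dependency graphs can be constructed programmatically.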
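
To illustrate the generation semantics described above (n=0 for direct relatives, n=1 for two generations, n=-1 for all), here is a hypothetical MiniDataset class. It is not depgraph's Dataset: return types and ordering in the real library may differ. Here results are unique datasets listed in breadth-first order.

```python
class MiniDataset(object):
    """Hypothetical stand-in that mirrors the generation semantics described above."""

    def __init__(self, name):
        self.name = name
        self._parents = []
        self._children = []

    def dependson(self, *others):
        # Record each edge in both directions so traversal works either way.
        for other in others:
            self._parents.append(other)
            other._children.append(self)

    def _walk(self, attr, n):
        # n=0: one generation; n=1: two generations; n=-1: unlimited.
        found, frontier, generation = [], [self], 0
        while frontier and (n < 0 or generation <= n):
            next_frontier = []
            for node in frontier:
                for rel in getattr(node, attr):
                    if rel not in found:
                        found.append(rel)
                        next_frontier.append(rel)
            frontier = next_frontier
            generation += 1
        return found

    def parents(self, n=0):
        return self._walk("_parents", n)

    def children(self, n=0):
        return self._walk("_children", n)

    def roots(self):
        # Ancestors that have no parents of their own.
        return [d for d in self.parents(-1) if not d._parents]


# Part of the graph from the Example section.
r0, r1, r2 = MiniDataset("R0"), MiniDataset("R1"), MiniDataset("R2")
da0, da1, db0 = MiniDataset("DA0"), MiniDataset("DA1"), MiniDataset("DB0")
da0.dependson(r0, r1)
da1.dependson(r2)
db0.dependson(da0, da1)

print([d.name for d in db0.parents(0)])   # ['DA0', 'DA1']
print([d.name for d in db0.parents(1)])   # ['DA0', 'DA1', 'R0', 'R1', 'R2']
print([d.name for d in db0.roots()])      # ['R0', 'R1', 'R2']
print([d.name for d in r0.children(-1)])  # ['DA0', 'DB0']

# Programmatic construction: one cleaning step per raw file, then a summary.
raws = [MiniDataset("raw%d" % i) for i in range(3)]
summary = MiniDataset("summary")
for raw in raws:
    step = MiniDataset("clean_" + raw.name)
    step.dependson(raw)
    summary.dependson(step)
print(len(summary.parents(-1)))  # 6
```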

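A user-defined build(dataset, reason) function (the name is unimportant) takes a dataset and builds it based on its ancestors and any other attributes of the Dataset. reason is an object that specifies why the build step is needed.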
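
One way to write such a function is to dispatch on the tool keyword used in the Example section. This sketch assumes each tool name maps to a Python callable and that 0 means success. The TOOLS registry and the two tool functions are hypothetical, and types.SimpleNamespace stands in for a Dataset.

```python
from types import SimpleNamespace


def read_csv(dataset):
    # Hypothetical tool; real code would load and write data here.
    return "read %s" % dataset.name


def merge_fish_counts(dataset):
    # Hypothetical tool.
    return "merged into %s" % dataset.name


# Hypothetical registry mapping the `tool` keyword to a callable.
TOOLS = {
    "read_csv": read_csv,
    "merge_fish_counts": merge_fish_counts,
}


def build(dataset, reason):
    """Build one dataset; return 0 on success, 1 on failure (assumed convention)."""
    tool = TOOLS.get(getattr(dataset, "tool", None))
    if tool is None:
        print("No tool registered for %s" % dataset.name)
        return 1
    print("Building %s because %s: %s" % (dataset.name, reason, tool(dataset)))
    return 0


da0 = SimpleNamespace(name="step1/da0", tool="merge_fish_counts")
build(da0, "missing")
```

Storing callables in the registry (or directly as the tool keyword, as the Example's comments suggest) keeps the build function itself generic.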

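The depgraph.buildall() function or the Dataset.buildnext() method can be used to obtain the datasets to pass to build(). Alternatively, build() can be decorated with the buildmanager decorator, which creates a function that builds a target automatically by building its dependencies in order (see the example below).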
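
A sketch of how a buildmanager-style decorator could wrap a per-dataset builder into a full build loop. It assumes a stage generator that yields lists of (dataset, reason) pairs, like the manual loop in the Example section. make_buildmanager and toy_stages are hypothetical. Taking the stage generator as a parameter keeps the sketch independent of depgraph, whose own buildmanager is applied directly. Stopping at the first non-zero exit code is an assumed policy.

```python
import functools


def make_buildmanager(stages):
    """Return a decorator that turns a per-dataset builder into a build loop.

    `stages(target)` must yield lists of (dataset, reason) pairs in build order.
    """
    def decorator(builder):
        @functools.wraps(builder)
        def build_target(target):
            for stage in stages(target):
                for dataset, reason in stage:
                    exitcode = builder(dataset, reason)
                    if exitcode != 0:
                        # Assumed policy: stop at the first failure.
                        return exitcode
            return 0
        return build_target
    return decorator


def toy_stages(target):
    # Hypothetical precomputed order for DB1, assuming R2 and R3 already exist.
    order = {"DB1": [["DA1"], ["DB1"]]}
    for stage in order[target]:
        yield [(name, "missing") for name in stage]


built = []


@make_buildmanager(toy_stages)
def batchbuilder(dataset, reason):
    built.append(dataset)
    return 0


print(batchbuilder("DB1"), built)  # 0 ['DA1', 'DB1']
```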

Complex dependency graphs can be visualized with depgraph's Graphviz export function, which returns a string in the DOT language that encodes the graph.
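
A sketch of producing DOT text for a dependency graph. It assumes the graph is stored as a dict mapping each target name to its source names. to_dot is a hypothetical helper, not depgraph's export function. Edges point from source to target, matching the direction of data flow in the Example diagram.

```python
def to_dot(deps, name="depgraph"):
    """Render {target: [sources, ...]} as a DOT digraph (hypothetical helper)."""
    lines = ["digraph %s {" % name]
    for target in sorted(deps):
        for source in deps[target]:
            # One directed edge per dependency: source -> target.
            lines.append('    "%s" -> "%s";' % (source, target))
    lines.append("}")
    return "\n".join(lines)


print(to_dot({"DA0": ["R0", "R1"], "DB0": ["DA0"]}))
```

Saved to a file such as graph.dot, the output can be rendered with Graphviz, e.g. `dot -Tpng graph.dot -o graph.png`.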

Example

To declare a set of dependencies like those in the following graph:

R0      R1      R2      R3         [raw data]
  \     /       |       |
    DA0         DA1    /
        \      /  \   /
           DB0     DB1
            \     / |  \
             \   /  |   \
              DC0  DC1  DC2        [products]

from depgraph import Dataset, buildmanager

# Define Datasets
# Use an optional keyword `tool` to provide a key instructing our build tool
# how to assemble this product. Here we've used strings, but another pattern
# would be to provide a callback function
R0 = Dataset("data/raw0", tool="read_csv")
R1 = Dataset("data/raw1", tool="read_csv")
R2 = Dataset("data/raw2", tool="database_query")
R3 = Dataset("data/raw3", tool="read_hdf")

DA0 = Dataset("step1/da0", tool="merge_fish_counts")
DA1 = Dataset("step1/da1", tool="process_filter")

DB0 = Dataset("step2/db0", tool="join_counts")
DB1 = Dataset("step2/db1", tool="join_by_date")

DC0 = Dataset("results/dc0", tool="merge_model_obs")
DC1 = Dataset("results/dc1", tool="compute_uncertainty")
DC2 = Dataset("results/dc2", tool="make_plots")

# Declare dependency relationships so that depgraph can determine the order of
# the build
DA0.dependson(R0, R1)
DA1.dependson(R2)
DB0.dependson(DA0, DA1)
DB1.dependson(DA1, R3)
DC0.dependson(DB0, DB1)
DC1.dependson(DB1)
DC2.dependson(DB1)

# Option 1:
# Define a function that builds individual dependencies. The *buildmanager*
# decorator transforms it into a loop that builds all dependencies above a
# target
@buildmanager
def batchbuilder(dependency, reason):
    # [....]
    exitcode = 0  # placeholder: set this from the real build step
    return exitcode

batchbuilder(DC1)

# Option 2:
# Implement the build loop manually
from depgraph import buildall

def build(dependency, reason):
    # This may have the same logic as `batchbuilder` above, but we
    # will call it directly rather than wrapping it in @buildmanager
    # [....]
    exitcode = 0  # placeholder: set this from the real build step
    return exitcode

for stage in buildall(DC1):
    # A build stage is a list of dependencies whose own dependencies are met
    # and that are independent, i.e. they can be built in parallel
    for dep, reason in stage:
        # Each target is a dataset with a 'name' attribute and whatever
        # additional keyword arguments were defined with it.
        # The 'reason' is a depgraph.Reason object that codifies why a
        # particular target is necessary (e.g. it's out of date, it's missing
        # and required by a subsequent target, etc.)
        print("Building {0} with {1} because {2}".format(dep.name, dep.tool, reason))

        # Call a function or start a subprocess that will result in the
        # target being built and saved to a file
        return_val = build(dep, reason)

        # Perform logging, clean-up, or error handling operations
        # [....]
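
The comment in the manual loop notes that the members of a stage are independent and can be built in parallel. A sketch of doing that with concurrent.futures.ThreadPoolExecutor, assuming stages are lists of (dataset, reason) pairs as above. run_stages and toy_build are hypothetical; for CPU-heavy builds a process pool or a cluster scheduler may be a better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stages(stages, build, max_workers=4):
    """Build each stage's datasets concurrently, finishing a stage before the next.

    Returns a dict mapping each dataset to the exit code its build returned.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in stages:
            # Submit the whole stage; its members do not depend on each other.
            futures = {dep: pool.submit(build, dep, reason) for dep, reason in stage}
            # Waiting on every future here acts as the barrier between stages.
            for dep, future in futures.items():
                results[dep] = future.result()
            if any(results[dep] != 0 for dep in futures):
                break  # Assumed policy: do not start later stages after a failure.
    return results


def toy_build(dep, reason):
    # Stand-in for real work such as launching a subprocess.
    return 0


stages = [
    [("DA0", "missing"), ("DA1", "missing")],
    [("DB0", "missing"), ("DB1", "out of date")],
]
print(run_stages(stages, toy_build))
```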

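Changelog

0.4

Performance improvements
buildall generator function, faster than repeatedly calling Dataset.buildnext()

0.3

Cyclic graph detection
Graphviz export

0.2

Rewrite: removed DependencyGraph and made Dataset the primary class

0.1

First version, copied from the depchain module of the asputil package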


data-depgraph 0.4.4
