Python cubicweb-dataio package - PyPI

Cube for data input/output, import and export

Detailed project description of cubicweb-dataio

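Massive Store

The Massive Store is a CubicWeb (CW) store used to push massive amounts of data using pure SQL logic, thus avoiding the CW checks. It is faster than the other CW stores (it does not check the eid at each step, and it uses the COPY FROM method), but it is less safe (there are no data integrity safeguards). It does not return an eid when the create_entity function is used.

Warning: this store can currently only be used with PostgreSQL, because it relies on the COPY FROM method and on specific PostgreSQL tables to retrieve all the indexes.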

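Workflow of the Massive Store

The workflow of the Massive Store is as follows:

Drop the indexes and constraints on the meta-data tables (entities, is_instance_of);
Insert the data:
- use the create_entity function for entities;
- use the relate function for relations;
- use the relate_by_iid function for relations based on external identifiers;
- each insertion of an rtype that has not been seen yet triggers the creation of a temporary table for this rtype, to store the results;
- each insertion of an etype that has not been seen yet removes all the indexes/constraints on the entity table.
At a given point, the flush method should be called:
- it flushes the entity data into the database using COPY FROM;
- it flushes the relation data into the database using COPY FROM;
- it flushes the relation-iid data into the database using COPY FROM;
- it creates the meta-data (entities, ...) for the inserted entities;
- it commits.
If some relations were created from external identifiers (relate_by_iid), the conversion should be done manually with the convert_relations method.
At the end of the insertion, the cleanup method should be called:
- it re-creates the indexes/constraints/primary keys of the entity/relation tables;
- it re-creates the indexes/constraints on the meta-data tables;
- it removes the temporary tables and the internal store tables.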

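Entities/relations in the Massive Store

Because of technical limitations on the database insertion, the following points should be noted:

A call to create_entity returns an entity with a specific eid. Eids are handled automatically by the Massive Store (it fetches a given range of eids for its internal use), but you can pass a specific eid in the kwargs of create_entity to bypass the automatic attribution of eids.
The relate method does not support inlined relations.

A buffer is created for the call to the PostgreSQL COPY FROM clause. If the separator used to build this tabular file is found in the data of the entities (or relations), it is replaced by the replace_sep of the store (default is "").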
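
The separator handling described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the store's actual implementation: the helper name build_copy_buffer, its parameters, and the choice of a tab separator are all hypothetical.

```python
import io


def build_copy_buffer(rows, sep="\t", replace_sep=""):
    """Serialize rows into a file-like buffer usable by a COPY FROM call.

    Hypothetical helper: any occurrence of `sep` inside a value is replaced
    by `replace_sep`, so a value can never be split into two columns.
    """
    buf = io.StringIO()
    for row in rows:
        cells = [str(value).replace(sep, replace_sep) for value in row]
        buf.write(sep.join(cells) + "\n")
    buf.seek(0)  # rewind so that the consumer reads from the start
    return buf


rows = [(1, "Paris"), (2, "New\tYork")]  # the second value contains the separator
buffer_text = build_copy_buffer(rows).getvalue()
```

A buffer of this kind could then be handed to a driver call such as psycopg2's cursor.copy_from. The sketch deliberately ignores NULL values and embedded newlines, which a real loader would also have to handle.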

Basic use of the Massive Store

A simple script using the Massive Store:

# Initialize the store
store = MassiveObjectStore(session)
# Initialize the Relation table
store.init_rtype_table('Person', 'lives', 'Location')

# Import logic
...
entity = store.create_entity('Person', ...)
entity = store.create_entity('Location', ...)

# Flush the data in memory to sql database
store.flush()

# Import logic
...
entity = store.create_entity('Person', ...)
entity = store.create_entity('Location', ...)
# person_iid and location_iid are unique, data-dependent iids (e.g. URIs)
store.relate_by_iid(person_iid, 'lives', location_iid)
...

# Flush the data in memory to sql database
store.flush()

# Convert the relation
store.convert_relations('Person', 'lives', 'Location')

# Clean the store / rebuild indexes
store.cleanup()

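In this case, iid_subj and iid_obj (person_iid and location_iid in the script above) are unique ids that can be used to create the relations after the entities have been imported.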
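
The idea behind converting iid-based relations can be illustrated with an in-memory sketch. It does not reflect how the store performs the conversion internally; the function convert_iid_relations, the iid-to-eid mapping and the example URIs are hypothetical and serve only to show why the conversion must happen after the entities exist.

```python
def convert_iid_relations(iid_relations, iid_to_eid):
    """Turn (subject_iid, object_iid) pairs into (subject_eid, object_eid) pairs.

    Hypothetical helper: pairs whose subject or object was never imported
    are skipped instead of raising an error.
    """
    converted = []
    for subj_iid, obj_iid in iid_relations:
        subj_eid = iid_to_eid.get(subj_iid)
        obj_eid = iid_to_eid.get(obj_iid)
        if subj_eid is not None and obj_eid is not None:
            converted.append((subj_eid, obj_eid))
    return converted


# iid -> eid mapping, known only once the entities are in the database
iid_to_eid = {"http://example.org/p/1": 101, "http://example.org/l/7": 202}
# relations recorded during the import, keyed by external identifiers
pending = [
    ("http://example.org/p/1", "http://example.org/l/7"),
    ("http://example.org/p/2", "http://example.org/l/7"),  # unknown subject
]
eid_relations = convert_iid_relations(pending, iid_to_eid)
```

Only the first pair can be resolved, because the second subject iid has no matching entity. This is why the relations are stored by iid first and converted in a separate, later step.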

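Advanced use of the Massive Store

The simple, default use of the Massive Store is conservative, in order to avoid problems in the meta-data management. It is however possible to increase the insertion speed:

Flushing the meta-data can be costly if done too often. A good practice is to do it only once, at the end of the import. To do so, set autoflush_metadata to False when creating the store, and call flush_meta_data at the end of the import (but before calling cleanup).
You can avoid committing at each flush by setting commit_at_flush to False when creating the store. In that case, you must explicitly call the commit method at least once before flushing the meta-data and cleaning up the store.
You can avoid dropping the indexes and constraints by using the drop_index attribute when creating the store.
You can set the starting point of the eid sequence by using the eids_seq_start attribute when creating the store.
Additional callbacks can be provided to handle commit and rollback (on_commit_callback and on_rollback_callback).

Example of advanced use of the Massive Store: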

store = MassiveObjectStore(session,
                           autoflush_metadata=False,
                           commit_at_flush=False)
store.init_rtype_table('Location', 'names', 'LocationName')
for ind, infos in enumerate(ucsvreader(open(dumpname))):
    entity = {'name': infos[1], ...}
    entity['my_inlined_relation'] = my_dict.get(infos[2])
    entity = store.create_entity('Location', **entity)
    store.relate_by_iid(entity.cwuri, 'my_external_relation', infos[3])
    if ind and ind % 200000 == 0:
        store.flush()
        store.commit()
store.flush()
store.commit()
store.flush_meta_data()
store.convert_relations('Location', 'my_external_relation', 'Location',
                        'cwuri', 'cwuri')
store.cleanup()

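Restoring a database after a Massive Store failure

The Massive Store removes some constraints and indexes, which are automatically rebuilt during the cleanup call. If an error occurs during the import, you can still call the cleanup method, or even create another store after the failure and call the cleanup method of that store.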
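
One way to make sure that cleanup always runs is to wrap the import in try/finally. The sketch below is a suggestion rather than a documented pattern of the library: run_import, import_steps and the RecordingStore stand-in are hypothetical, and only the cleanup method comes from the store API described here.

```python
def run_import(store, import_steps):
    """Run `import_steps(store)` and always call `store.cleanup()` afterwards.

    `run_import` and `import_steps` are hypothetical names; only `cleanup`
    belongs to the store API described in the text.
    """
    try:
        import_steps(store)
    finally:
        # Rebuild the dropped indexes/constraints even if the import failed.
        store.cleanup()


class RecordingStore:
    """Stand-in that makes the sketch runnable without a database."""

    def __init__(self):
        self.calls = []

    def cleanup(self):
        self.calls.append("cleanup")


def failing_steps(store):
    store.calls.append("import")
    raise ValueError("simulated import failure")


demo_store = RecordingStore()
try:
    run_import(demo_store, failing_steps)
except ValueError:
    pass  # the original error still propagates once cleanup has run
```

With a real store, import_steps would contain the create_entity, flush and (if needed) flush_meta_data calls, so that cleanup remains the last operation.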

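The Massive Store creates the following tables for its internal use:

dataio_initialized: information on the etype/rtype tables that have been initialized.
dataio_constraints: the queries that can be used to restore the constraints/indexes of the different etype/rtype tables.
dataio_metadata: the etypes whose meta-data have already been pushed.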

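Slave mode

A slave mode is available for using the Massive Store in parallel:

A Massive Store (the master) should be created.
For every etype/rtype that may be encountered during the import, the init_etype_table / init_rtype_table methods of the master store should be called.
Several slave stores can then be created by using the slave_mode attribute at store creation. Their autoflush_metadata attribute should be set to False.
Each slave store can be used in a different thread to create entities and relations, and should only call its flush and commit methods.
The master store should call its flush_meta_data and cleanup methods at the end of the import.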
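
The division of work between the master and the slaves can be sketched as follows. The FakeStore class is a hypothetical stand-in that only records method calls, so the example runs without CubicWeb or a database. In a real import each object would be a Massive Store, and how the slaves obtain their own session is not covered here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeStore:
    """Stand-in that records method calls; it performs no database work."""

    def __init__(self, name, log, lock):
        self.name, self._log, self._lock = name, log, lock

    def _record(self, method):
        with self._lock:  # the log is shared between threads
            self._log.append((self.name, method))

    def init_rtype_table(self, subj_etype, rtype, obj_etype):
        self._record("init_rtype_table")

    def create_entity(self, etype, **attrs):
        self._record("create_entity")

    def flush(self):
        self._record("flush")

    def commit(self):
        self._record("commit")

    def flush_meta_data(self):
        self._record("flush_meta_data")

    def cleanup(self):
        self._record("cleanup")


def slave_job(store, names):
    # A slave only creates entities/relations, then flushes and commits.
    for name in names:
        store.create_entity("Person", name=name)
    store.flush()
    store.commit()


log, lock = [], threading.Lock()
master = FakeStore("master", log, lock)
master.init_rtype_table("Person", "lives", "Location")  # before any slave runs

chunks = [["alice", "bob"], ["carol"]]
slaves = [FakeStore("slave-%d" % i, log, lock) for i in range(len(chunks))]
with ThreadPoolExecutor(max_workers=len(slaves)) as pool:
    # list() waits for every job and re-raises any worker exception
    list(pool.map(slave_job, slaves, chunks))

master.flush_meta_data()  # only the master finalizes the import
master.cleanup()
```

The table initialization happens once on the master before the threads start, and the meta-data flush and cleanup happen once after all of them have finished. The slaves never touch those steps.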

cubicweb-dataio 0.7.0
