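shipyard: Python package - PyPI

Handle data in a format inspired by email headers

Detailed description of the shipyard Python project

What is shipyard?

Shipyard is a module for handling data in a format inspired by email headers (RFC 2822).

The goal of shipyard is to offer a simple, human-readable and human-writable replacement for CSV that copes with long values and multiple lines and needs no difficult escaping rules for special characters.

It is called shipyard because the word contains py and did not seem to be taken yet.

File format

Character encoding

A character encoding can be specified, similar to PEP 0263, using: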

# -*- coding: <encoding name> -*-

in the first line. Replace # with the actual comment mark.

More precisely, the first line must match the regular expression:

^#.*coding[:=]\s*([-\w.]+)

Again, replace # with the actual comment mark. The first group of this expression is then interpreted as the encoding name.
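
As an illustration of the rule above, the following sketch applies the expression to a first line. It is not shipyard's own code; the function name detect_encoding and its comment_mark parameter are hypothetical.

```python
import re

def detect_encoding(first_line, comment_mark="#"):
    """Return the encoding named in a coding line, or None if there is none."""
    # Same expression as above, with '#' swapped for the escaped comment mark.
    pattern = "^" + re.escape(comment_mark) + r".*coding[:=]\s*([-\w.]+)"
    match = re.match(pattern, first_line)
    return match.group(1) if match else None

print(detect_encoding("# -*- coding: utf-8 -*-"))
print(detect_encoding("id: 0"))
```

The first call finds the encoding name in group 1; the second line is an ordinary field, so no encoding is reported.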

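Data set

A data set consists of zero or more records, separated by one or more empty lines.

Lines starting with the comment mark (default: #) are ignored. Comments can be used within records as well as between records.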
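
The two rules above can be sketched as a small generator. This is only an approximation under stated assumptions, not the library's parser: split_records is a hypothetical name, and the sketch treats only truly empty lines as separators (a line holding nothing but blanks is kept, since it could be a continuation).

```python
def split_records(lines, comment_mark="#"):
    """Yield each record as a list of its non-comment lines."""
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(comment_mark):
            continue              # comments are ignored everywhere
        if line == "":
            if record:            # one or more empty lines end a record
                yield record
                record = []
        else:
            record.append(line)
    if record:                    # the last record may lack a trailing empty line
        yield record

text = "# header comment\nid: 0\nname: A\n\n\n# between records\nid: 1\n# inside a record\nname: B\n"
for record in split_records(text.splitlines()):
    print(record)
```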

Record

A record consists of one or more fields.

Field

A field is a line of the following format:

key: value

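key is a string that

does not contain a colon
does not start with the comment mark
does not start with the continuation mark

value is an arbitrary string. It can span multiple lines by using the continuation mark.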
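
Because the key cannot contain a colon, splitting at the first colon is enough to separate key and value, even when the value itself contains colons. A minimal sketch follows; parse_field is a hypothetical helper, and stripping whitespace around the value is an assumption rather than documented behaviour.

```python
def parse_field(line):
    """Split a field line into (key, value) at the first colon."""
    key, sep, value = line.partition(":")
    if not sep:
        raise ValueError("not a field line: %r" % line)
    # Assumption: whitespace around the value is not significant.
    return key, value.strip()

print(parse_field("key: value"))
print(parse_field("url: http://example.org"))
```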

Continuation

If a line starts with the continuation mark (default: " " [one blank]), it is appended to the previous line, with the continuation mark removed.
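
The rule can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: join_continuations is a hypothetical name, and the sketch joins by plain concatenation without inserting any separator.

```python
def join_continuations(lines, continuation_mark=" "):
    """Merge continuation lines into the line they continue."""
    joined = []
    for line in lines:
        if line.startswith(continuation_mark) and joined:
            # Drop the mark and append the rest to the previous line.
            joined[-1] += line[len(continuation_mark):]
        else:
            joined.append(line)
    return joined

print(join_continuations(["rationale: first part", "  second part", "year: 2008"]))
```

With plain concatenation, a word break at the end of a line survives only if the continuation line carries a second blank after the mark, as in the example. This would be consistent with joined words such as brokensymmetry in the output shown under Usage, though whether the library behaves exactly this way is not confirmed here.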

Usage

>>> import shipyard

First we open a file:

>>> input = open('nobel.sy')

Then we create a parser object:

>>> reader = shipyard.Parser(keep_linebreaks=False,
...                          keys=['id', 'discipline', 'year',
...                                'name', 'country', 'rationale'])

For each record, the given keys are initialized with None.
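
In plain Python terms, this initialization can be pictured as below. The sketch does not use shipyard itself, and the partially filled record is a made-up illustration.

```python
keys = ["id", "discipline", "year", "name", "country", "rationale"]
parsed = {"id": "0", "name": "Martin Chalfie"}  # hypothetical: only two fields present

record = dict.fromkeys(keys)   # every given key starts out as None
record.update(parsed)          # fields that are present overwrite the None

print(record["name"])
print(record["country"])
```

A key that has no field in the record simply stays None, so later code can look up any of the given keys without a KeyError.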

Now we can iterate over the records:

>>> for record in reader.parse(input):    # doctest:+ELLIPSIS
...     print record['country']
United States
Japan
United States
...

Instead of iterating, we may want to get a list of dicts:

>>> input.seek(0)
>>> lod = reader.get_list(input)
>>> print lod     # doctest:+ELLIPSIS
[{u'discipline': u'Chemistry', u'name': u'Martin Chalfie', ...}, {u'discipline': u'Chemistry', u'name': u'Osamu Shimomura', ...}, ...]

Sometimes we need a dict of dicts (using the 'id' field as key):

>>> input.seek(0)
>>> dod = reader.get_dict(input, key='id')
>>> print dod.keys()
[u'11', u'10', u'1', u'0', u'3', u'2', u'5', u'4', u'7', u'6', u'9', u'8']
>>> print dod[u'5'][u'rationale']
for the discovery of the mechanism of spontaneous brokensymmetry in subatomic physics

If dicts are not wanted, the factory argument can be used:

>>> input.seek(0)
>>> los = reader.get_list(input, factory = lambda **keys: ', '.join(keys.values()))
>>> print los[0]
Chemistry, Martin Chalfie, United States, for the discovery and development of the green fluorescentprotein, GFP, 2008, 0

Of course, a class can serve as a factory as well:

>>> input.seek(0)
>>> class Laureate(object):
...     def __init__(self, id, discipline, year, name, country, rationale):
...         self.name = name
>>> doo = reader.get_dict(input, key='id', factory = Laureate)
>>> print doo[u'2']      # doctest:+ELLIPSIS
<Laureate object at ...>
>>> print doo[u'2'].name
Roger Y. Tsien
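
The two factory examples suggest that each record is handed to the factory as keyword arguments, which is why Laureate.__init__ names every key. The following sketch shows that idea in isolation. It is not the library's code: build_list is a hypothetical helper, and fields other than id and name are left as None placeholders.

```python
def build_list(records, factory=dict):
    """Turn each record (a dict) into an object by calling factory(**record)."""
    return [factory(**record) for record in records]

class Laureate(object):
    def __init__(self, id, discipline, year, name, country, rationale):
        # Every key of the record must be accepted as a parameter.
        self.name = name

records = [{"id": "2", "discipline": None, "year": None,
            "name": "Roger Y. Tsien", "country": None, "rationale": None}]

laureates = build_list(records, factory=Laureate)
print(laureates[0].name)
```

With the default factory, dict(**record) simply rebuilds the dict, which matches the list-of-dicts behaviour shown earlier.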

Now let us write a shipyard file.

First we create a StringIO (any other file-like object would do as well):

>>> import StringIO
>>> output = StringIO.StringIO()

Next we need a writer object:

>>> writer = shipyard.Writer(keys=('foo', 'bar'), coding='utf-8')

Now we can write a single record using write():

>>> writer.write(output, {'foo': 1, 'bar': 2})
>>> print output.getvalue()
foo: 1
bar: 2
<BLANKLINE>
<BLANKLINE>

Using write_many() we can write a list of records:

>>> output = StringIO.StringIO()
>>> d = [dict((('foo', i), ('bar', 2*i))) for i in range(3)]
>>> writer.write_many(output, d)
>>> print output.getvalue()
foo: 0
bar: 0
<BLANKLINE>
foo: 1
bar: 2
<BLANKLINE>
foo: 2
bar: 4
<BLANKLINE>
<BLANKLINE>

To get the coding line, we use write_coding():

>>> output = StringIO.StringIO()
>>> writer.write_coding(output)
>>> print output.getvalue()
#-*- coding: utf-8 -*-
<BLANKLINE>
<BLANKLINE>

Now let us do everything at once using write_full():

>>> output = StringIO.StringIO()
>>> writer.write_full(output, d)
>>> print output.getvalue()
#-*- coding: utf-8 -*-
<BLANKLINE>
foo: 0
bar: 0
<BLANKLINE>
foo: 1
bar: 2
<BLANKLINE>
foo: 2
bar: 4
<BLANKLINE>
<BLANKLINE>
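
The output format shown above is simple enough to reproduce by hand. The sketch below is a stand-alone approximation written for Python 3, not the Writer class itself: write_records is a hypothetical function, and it does not handle multi-line values, which would need continuation marks.

```python
import io

def write_records(stream, records, keys, coding=None, comment_mark="#"):
    """Write records as 'key: value' lines, one empty line after each record."""
    if coding:
        # Coding line followed by an empty line, as in the write_full() output.
        stream.write("%s-*- coding: %s -*-\n\n" % (comment_mark, coding))
    for record in records:
        for key in keys:
            stream.write("%s: %s\n" % (key, record[key]))
        stream.write("\n")

output = io.StringIO()
data = [{"foo": i, "bar": 2 * i} for i in range(3)]
write_records(output, data, keys=("foo", "bar"), coding="utf-8")
print(output.getvalue())
```

Passing keys explicitly fixes the order of the fields in every record, which a plain dict would not guarantee in older Python versions.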

shipyard 0.02
