Python textdata package - PyPI

Easily get clean data, directly from text or Python source

textdata project description

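One often needs to state data in program source. Python, however, needs its lines indented just so. Multi-line strings therefore often carry extra spaces and newline characters you didn't really want. Many developers "fix" this by using Python list literals, but that is tedious, verbose, and often less legible.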

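The textdata package makes it easy to keep clean, nicely laid out data in your program source and to retrieve it without extra syntax cluttering things up. It lets you use whatever layout makes the Python code look and work right, without that layout showing up in the resulting data.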
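To make the problem concrete, here is a small sketch that uses only plain Python (not textdata) to show what an indented triple-quoted string really contains, next to the list-literal workaround. The function name `get_rhyme` is made up for this illustration.

```python
def get_rhyme():
    # The string body is indented to match the surrounding code.
    rhyme = """
        Hey diddle diddle,
        The cat and the fiddle,
    """
    return rhyme

raw = get_rhyme()
# repr() exposes the leading newline and the indentation that came along for the ride.
print(repr(raw))

# The list-literal workaround yields clean data,
# but every line needs its own quotes and trailing comma.
clean = [
    "Hey diddle diddle,",
    "The cat and the fiddle,",
]

# Stripping by hand recovers the same data from the raw string.
print([line.strip() for line in raw.strip().splitlines()] == clean)
```

The raw string starts with a newline and each line keeps its eight leading spaces, which is exactly the clutter the package is meant to remove.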

Text (strings and lists)

>>> lines("""
...     There was an old woman who lived in a shoe.
...     She had so many children, she didn't know what to do;
...     She gave them some broth without any bread;
...     Then whipped them all soundly and put them to bed.
... """)
['There was an old woman who lived in a shoe.',
 "She had so many children, she didn't know what to do;",
 'She gave them some broth without any bread;',
 'Then whipped them all soundly and put them to bed.']

Note that the "extra" newlines and leading spaces have been taken care of and discarded. Or do you want just one string? Okay:

>>> text("""
...     There was an old woman who lived in a shoe.
...     She had so many children, she didn't know what to do;
...     She gave them some broth without any bread;
...     Then whipped them all soundly and put them to bed.
... """)
"There was an old woman who lived in a shoe.\nShe ...put them to bed."

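Here text() does the same stripping of pointless whitespace at the beginning and end of lines, returning the data as one clean, convenient string. Or, if you don't want most of the line endings, try textline on the same input to get a single line with no breaks.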
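For readers who want to see roughly what this cleanup involves, the following is a standard-library approximation, not textdata's implementation. The variable names are illustrative, and joining with a single space for the one-line form is an assumption about how textline behaves.

```python
import textwrap

source = """
    There was an old woman who lived in a shoe.
    She had so many children, she didn't know what to do;
"""

# Remove the common indentation, then the blank first and last lines.
dedented = textwrap.dedent(source).strip("\n")

as_lines = dedented.splitlines()     # roughly the shape lines() returns
as_text = "\n".join(as_lines)        # roughly the shape text() returns
as_one_line = " ".join(as_lines)     # assumed shape of the textline result

print(as_lines)
print(as_one_line)
```

The point of the package is that none of this dedent-and-strip bookkeeping has to appear in your own code.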

Words and phrases

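Other times, the data you need is almost, but not quite, a series of words: a list of names, a list of colors. Most entries are single words, but some have an embedded space. textdata has you covered:

>>> words(' Billy Bobby "Mr. Smith" "Mrs. Jones"  ')
['Billy', 'Bobby', 'Mr. Smith', 'Mrs. Jones']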

Embedded quotes (either single or double) can be used to construct "words" (or phrases) that contain whitespace, including tabs and newlines.

words, like the other textdata facilities, lets you comment individual lines, something that would otherwise muck up a string literal:

exclude = words("""
    __pycache__ *.pyc *.pyo     # compilation artifacts
    .hg* .git*                  # repository artifacts
    .coverage                   # code tool artifacts
    .DS_Store                   # platform artifacts
""")

Yields:

['__pycache__', '*.pyc', '*.pyo', '.hg*', '.git*',
 '.coverage', '.DS_Store']
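For comparison, the standard library's shlex.split offers similar quote-aware and comment-aware splitting. The sketch below is only meant to illustrate the idea of quoted phrases and per-line comments; it is not how textdata is implemented, and its rules differ from those of words in the details.

```python
import shlex

# Quoted phrases survive as single items.
names = shlex.split(' Billy Bobby "Mr. Smith" "Mrs. Jones"  ')
print(names)

# comments=True drops everything from '#' to the end of each line.
exclude = shlex.split("""
    __pycache__ *.pyc *.pyo     # compilation artifacts
    .hg* .git*                  # repository artifacts
""", comments=True)
print(exclude)
```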

Paragraphs

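You might want to collect "paragraphs" rather than words: contiguous runs of text lines delineated by blank lines. The Markdown and RST document formats, for example, use this convention.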

>>> rhyme="""
    Hey diddle diddle,

    The cat and the fiddle,
    The cow jumped over the moon.
    The little dog laughed,
    To see such sport,

    And the dish ran away with the spoon.
"""
>>> paras(rhyme)
[['Hey diddle diddle,'],
 ['The cat and the fiddle,',
  'The cow jumped over the moon.',
  'The little dog laughed,',
  'To see such sport,'],
 ['And the dish ran away with the spoon.']]

Or if you'd like paragraphs, but with each paragraph as a single string:

>>> paras(rhyme, join="\n")
['Hey diddle diddle,',
 'The cat and the fiddle,\nThe cow jumped over the moon.\nThe little dog laughed,\nTo see such sport,',
 'And the dish ran away with the spoon.']
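The grouping rule described above (runs of non-blank lines separated by blank lines) can be sketched in a few lines of plain Python. `split_paragraphs` is a hypothetical helper written for this illustration; it is not textdata's code, and it only imitates the `join` option shown above.

```python
def split_paragraphs(source, join=None):
    """Group runs of non-blank lines; blank lines act as separators."""
    paragraphs, current = [], []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the paragraph collected so far.
            paragraphs.append(current)
            current = []
    if current:
        # Flush the final paragraph if the text did not end with a blank line.
        paragraphs.append(current)
    if join is None:
        return paragraphs
    return [join.join(p) for p in paragraphs]

rhyme = """
    Hey diddle diddle,

    The cat and the fiddle,
    The cow jumped over the moon.

    And the dish ran away with the spoon.
"""
print(split_paragraphs(rhyme))
print(split_paragraphs(rhyme, join="\n"))
```

Note that this sketch strips each line completely, so any relative indentation inside a paragraph is lost.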

Dictionaries

Or maybe you want a dict. The attrs function makes grabbing one easy:

>>> attrs("a=1 b=2 c='something more'")
{'a': 1, 'b': 2, 'c': 'something more'}
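As a rough illustration of what turning such a string into a dict involves, here is a minimal sketch using re and ast.literal_eval. `parse_attrs` and the `PAIR` pattern are hypothetical names for this example only; the sketch handles just the `key=value` form and makes no claim to match how attrs actually parses its input. It treats bare values as Python literals and leaves quoted values as strings.

```python
import ast
import re

# A key, an equals sign, then either a quoted string or a bare token.
PAIR = re.compile(r"""(\w+)\s*=\s*("[^"]*"|'[^']*'|\S+)""")

def parse_attrs(source):
    result = {}
    for key, raw in PAIR.findall(source):
        if raw[0] in "\"'":
            # Quoted values are kept as strings, minus the quotes.
            value = raw[1:-1]
        else:
            try:
                # Bare tokens such as 1 or 2.5 become Python values.
                value = ast.literal_eval(raw)
            except (ValueError, SyntaxError):
                value = raw  # not a literal; keep the text as-is
        result[key] = value
    return result

print(parse_attrs("a=1 b=2 c='something more'"))
```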

If you want to cut and paste data directly from JavaScript, JSON, HTML, CSS, or XML, that is easy too. No text editing is required.

>>> # JavaScript
>>> attrs("a: 1, b: 2, c: 'something more'")
{'a': 1, 'b': 2, 'c': 'something more'}

>>> # JSON
>>> attrs('"a": 1, "b": 2, "c": "something more"')
{'a': 1, 'b': 2, 'c': 'something more'}

>>> # HTML or XML
>>> attrs('a="1" b="2" c="something more"')
{'a': '1', 'b': '2', 'c': 'something more'}

>>> # above returns strings, because values quoted, which denotes strings
>>> # 'full' evaluation needed to transform strings into values
>>> attrs('a="1" b="2" c="something more"', evaluate='full')
{'a': 1, 'b': 2, 'c': 'something more'}

>>> # CSS
>>> attrs("a: 1; b: 2; c: 'something more'")
{'a': 1, 'b': 2, 'c': 'something more'}

Tables

Or maybe you have tabular data.

>>> tabledata="""
...     name  age  strengths
...     ----  ---  ---------------
...     Joe   12   woodworking
...     Jill  12   slingshot
...     Meg   13   snark, snapchat
... """
>>> table(tabledata)
[['name', 'age', 'strengths'],
 ['Joe', 12, 'woodworking'],
 ['Jill', 12, 'slingshot'],
 ['Meg', 13, 'snark, snapchat']]

>>> records(tabledata)
[{'name': 'Joe', 'age': 12, 'strengths': 'woodworking'},
 {'name': 'Jill', 'age': 12, 'strengths': 'slingshot'},
 {'name': 'Meg', 'age': 13, 'strengths': 'snark, snapchat'}]
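The two result shapes above are closely related: each record pairs the header row with one data row. The snippet below shows that relationship in plain Python, starting from the list-of-lists output printed above. It does not call textdata, and the variable names are illustrative.

```python
rows = [['name', 'age', 'strengths'],
        ['Joe', 12, 'woodworking'],
        ['Jill', 12, 'slingshot'],
        ['Meg', 13, 'snark, snapchat']]

# The first row holds the column names; the rest are data rows.
header, *body = rows

# Pair each data row with the header to get one dict per row.
recs = [dict(zip(header, row)) for row in body]
print(recs)
```

The list-of-lists form suits positional processing, while the list-of-dicts form is handier when fields are looked up by name.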

It works even if your table has a lot of extra fluff:

>>> fancy="""
... +------+-----+-----------------+
... | name | age | strengths       |
... +------+-----+-----------------+
... | Joe  |  12 | woodworking     |
... | Jill |  12 | slingshot       |
... | Meg  |  13 | snark, snapchat |
... +------+-----+-----------------+
... """
>>> assert table(tabledata) == table(fancy)
>>> assert records(tabledata) == records(fancy)

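It can handle tables formatted in a variety of ways, including Markdown, RST, ANSI/Unicode line-drawing characters, plain text columns, and borders. You might think table parsing is a dodgy proposition, prone to failure, but textdata has dozens of tests, including fairly complex cases, showing that this is a reliable, high-probability heuristic.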

In summary

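textdata is all about conveniently grabbing the data you want from text files and program source, and doing so in a highly functional, convenient, well-tested way. Take it for a spin today!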

See the full documentation at Read the Docs.

textdata 2.4.1
