Is there a YAML library for Python that supports dumping long strings as block literals or folded blocks?

32 votes
Asked 2025-04-16 20:02

I'd like to dump a dictionary that contains long strings in block style, so that the output is easier to read. For example:

foo: |
  this is a
  block literal
bar: >
  this is a
  folded block

PyYAML supports loading documents written in this style, but I can't find a way to dump documents this way. Am I missing something?

3 Answers

3

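This is relatively easy to do with ruamel.yaml. The only "hurdle" is how to indicate which of the spaces in the string that is to be represented as a folded scalar should become a fold. A literal scalar carries that information in its explicit newlines, but that cannot be used for folded scalars: they can contain explicit newlines themselves (e.g. when there is leading whitespace), and they also need a newline at the end so that they are not represented with the stripping chomping indicator (>-).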

import sys
import ruamel.yaml

folded = ruamel.yaml.scalarstring.FoldedScalarString
literal = ruamel.yaml.scalarstring.LiteralScalarString

yaml = ruamel.yaml.YAML()

data = dict(
    foo=literal('this is a\nblock literal\n'), 
    bar=folded('this is a folded block\n'),
)

data['bar'].fold_pos = [data['bar'].index(' folded')]

yaml.dump(data, sys.stdout)

This gives:

foo: |
  this is a
  block literal
bar: >
  this is a
  folded block

The fold_pos attribute expects a reversible iterable containing the positions of the spaces at which to fold.

If your strings never contain a pipe character ('|'), you could mark the fold points with it and do something like this:

import re

s = 'this is a|folded block\n'
sf = folded(s.replace('|', ' '))  # need to have a space!
sf.fold_pos = [x.start() for x in re.finditer(r'\|', s)]  # | is special in re, needs escaping


data = dict(
    foo=literal('this is a\nblock literal\n'), 
    bar=sf,  # need to have a space
)

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)

This also gives exactly the output you expect.
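
If the fold points should come from a target line width rather than from a marker character, the positions could also be computed with the standard library alone. A minimal sketch: `fold_positions` is a hypothetical helper (not part of ruamel.yaml) that returns the indices of the spaces where a greedy wrap would break. The assumption is that the result is then assigned to `fold_pos` as in the examples above.

```python
def fold_positions(text, width=40):
    """Return the indices of spaces where a greedy wrap at `width` would break.

    Hypothetical helper for illustration; not part of ruamel.yaml.
    """
    positions = []
    line_start = 0      # index where the current output line begins
    last_space = None   # most recent space seen on the current line
    for i, ch in enumerate(text):
        if ch == '\n':
            # An explicit newline starts a fresh line; nothing to fold here.
            line_start = i + 1
            last_space = None
        elif ch == ' ':
            last_space = i
        if i - line_start >= width and last_space is not None:
            # The line got too long: fold at the last space we passed.
            positions.append(last_space)
            line_start = last_space + 1
            last_space = None
    return positions


print(fold_positions('this is a folded block\n', width=12))
```

For the string used above and `width=12` this yields `[9]`, the same index as `data['bar'].index(' folded')`.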

27

pyyaml does support dumping literal or folded blocks.

Using Representer.add_representer

Define the types:

class folded_str(str): pass

class literal_str(str): pass

class folded_unicode(unicode): pass

class literal_unicode(unicode): pass

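Then you can define representers for those types. Note that while Gary's solution works great for unicode, you may need some more work to get str to behave correctly (see the implementation of represent_str).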

def change_style(style, representer):
    def new_representer(dumper, data):
        scalar = representer(dumper, data)
        scalar.style = style
        return scalar
    return new_representer

import yaml
from yaml.representer import SafeRepresenter

# represent_str does handle some corner cases, so use that
# instead of calling represent_scalar directly
represent_folded_str = change_style('>', SafeRepresenter.represent_str)
represent_literal_str = change_style('|', SafeRepresenter.represent_str)
represent_folded_unicode = change_style('>', SafeRepresenter.represent_unicode)
represent_literal_unicode = change_style('|', SafeRepresenter.represent_unicode)

Then you can add those representers to the default dumper:

yaml.add_representer(folded_str, represent_folded_str)
yaml.add_representer(literal_str, represent_literal_str)
yaml.add_representer(folded_unicode, represent_folded_unicode)
yaml.add_representer(literal_unicode, represent_literal_unicode)

... and test it:

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_unicode('this is a folded block'),
}

print yaml.dump(data)

Result:

bar: >-
  this is a folded block
foo: |-
  this is a
  block literal
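
The snippets above are written for Python 2 (`unicode`, `print` as a statement). On Python 3, `str` is already Unicode, so only the `str` subclasses are needed. A minimal sketch of the same approach, assuming PyYAML on Python 3; the strings end in a newline here so that the chomping indicator (`-`) is not emitted:

```python
import yaml
from yaml.representer import SafeRepresenter


class folded_str(str):
    pass


class literal_str(str):
    pass


def change_style(style, representer):
    """Wrap an existing representer and force the given scalar style."""
    def new_representer(dumper, data):
        scalar = representer(dumper, data)
        scalar.style = style
        return scalar
    return new_representer


# Reuse represent_str rather than calling represent_scalar directly.
yaml.add_representer(folded_str, change_style('>', SafeRepresenter.represent_str))
yaml.add_representer(literal_str, change_style('|', SafeRepresenter.represent_str))

data = {
    'foo': literal_str('this is a\nblock literal\n'),
    'bar': folded_str('this is a folded block\n'),
}
text = yaml.dump(data)
print(text)
```

Loading `text` back with `yaml.safe_load` should return the original strings, since the custom classes only change the presentation, not the tag.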

Using default_style

If you want all your strings to follow a default style, you can also use the default_style keyword argument, e.g.:

>>> data = { 'foo': 'line1\nline2\nline3' }
>>> print yaml.dump(data, default_style='|')
"foo": |-
  line1
  line2
  line3

or for the folded style:

>>> print yaml.dump(data, default_style='>')
"foo": >-
  line1

  line2

  line3

or for the double-quoted style:

>>> print yaml.dump(data, default_style='"')
"foo": "line1\nline2\nline3"

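Note that default_style applies to every scalar, keys included (hence the quoted "foo" above). If only multi-line values should use block style, one option is to register a representer for `str` itself on a dedicated dumper. A minimal sketch, assuming PyYAML on Python 3; `BlockDumper` and `represent_multiline_str` are hypothetical names chosen for this example:

```python
import yaml


class BlockDumper(yaml.SafeDumper):
    """Dumper subclass so the global default dumper stays untouched."""


def represent_multiline_str(dumper, data):
    # Use the literal style only when the string spans several lines.
    if '\n' in data:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)


BlockDumper.add_representer(str, represent_multiline_str)

data = {'title': 'short', 'body': 'line1\nline2\n'}
text = yaml.dump(data, Dumper=BlockDumper)
print(text)
```

Short strings and keys stay plain, while `body` is written as a `|` block. The caveats below still apply: PyYAML may ignore the requested style for some strings.
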
Caveats:

Here is an example of something you may not expect:

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_unicode('this is a folded block'),
    'non-printable': literal_unicode('this has a \t tab in it'),
    'leading': literal_unicode('   with leading white spaces'),
    'trailing': literal_unicode('with trailing white spaces  '),
}
print yaml.dump(data)

which results in:

bar: >-
  this is a folded block
foo: |-
  this is a
  block literal
leading: |2-
     with leading white spaces
non-printable: "this has a \t tab in it"
trailing: "with trailing white spaces  "

1) non-printable characters

See the YAML spec on escaped characters (Section 5.7):

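Note that escape sequences are only interpreted in double-quoted scalars. In all other scalar styles, the "\" character has no special meaning and non-printable characters are not available.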

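If you want to preserve non-printable characters (e.g. TAB), you need to use double-quoted scalars. If you are able to dump a scalar in literal style and it contains a non-printable character (e.g. TAB), your YAML dumper is non-compliant.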

For example, pyyaml detects the non-printable character \t and uses the double-quoted style even when a default style is specified:

>>> data = { 'foo': 'line1\nline2\n\tline3' }
>>> print yaml.dump(data, default_style='"')
"foo": "line1\nline2\n\tline3"

>>> print yaml.dump(data, default_style='>')
"foo": "line1\nline2\n\tline3"

>>> print yaml.dump(data, default_style='|')
"foo": "line1\nline2\n\tline3"

2) leading and trailing white spaces

Another useful bit of information in the spec is:

All leading and trailing white space characters are excluded from the content.

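This means that if your string has leading or trailing white space, it would not be preserved in scalar styles other than double-quoted. As a consequence, pyyaml tries to detect what is in your scalar and may force the double-quoted style.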
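
Both caveats can be checked directly by dumping a value and looking at which style PyYAML actually chose. A small sketch, assuming PyYAML on Python 3; `uses_block_style` is a hypothetical helper written for this example, not a PyYAML function:

```python
import yaml


def uses_block_style(value, style='|'):
    """Return True if PyYAML honoured the requested block style for `value`."""
    text = yaml.safe_dump({'k': value}, default_style=style)
    # The block indicator, if used, follows the key on the first line.
    return (': ' + style) in text.splitlines()[0]


samples = {
    'multi-line': 'line1\nline2',
    'tab': 'has a \t tab',
    'trailing': 'trailing spaces  ',
}
for name, value in samples.items():
    verdict = 'block style' if uses_block_style(value) else 'fell back to quotes'
    print(name, '->', verdict)
```

A check like this can be handy in a test suite when the exact layout of the generated YAML matters.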

40

import yaml

class folded_unicode(unicode): pass
class literal_unicode(unicode): pass

def folded_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
def literal_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')

yaml.add_representer(folded_unicode, folded_unicode_representer)
yaml.add_representer(literal_unicode, literal_unicode_representer)

data = {
    'literal':literal_unicode(
        u'by hjw              ___\n'
         '   __              /.-.\\\n'
         '  /  )_____________\\\\  Y\n'
         ' /_ /=== == === === =\\ _\\_\n'
         '( /)=== == === === == Y   \\\n'
         ' `-------------------(  o  )\n'
         '                      \\___/\n'),
    'folded': folded_unicode(
        u'It removes all ordinary curses from all equipped items. '
        'Heavy or permanent curses are unaffected.\n')}

print yaml.dump(data)

Result:

folded: >
  It removes all ordinary curses from all equipped items. Heavy or permanent curses
  are unaffected.
literal: |
  by hjw              ___
     __              /.-.\
    /  )_____________\\  Y
   /_ /=== == === === =\ _\_
  ( /)=== == === === == Y   \
   `-------------------(  o  )
                        \___/

For completeness, there should also be str implementations, but I'm going to be lazy :-)
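
A `str` variant of the code above could look roughly like this. It is a sketch assuming PyYAML on Python 3, where `str` is already Unicode; the `width` argument of `yaml.dump` is used to show where the folded scalar gets wrapped:

```python
import yaml


class folded_str(str):
    pass


class literal_str(str):
    pass


def folded_str_representer(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='>')


def literal_str_representer(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')


yaml.add_representer(folded_str, folded_str_representer)
yaml.add_representer(literal_str, literal_str_representer)

data = {
    'literal': literal_str('first line\nsecond line\n'),
    'folded': folded_str(
        'It removes all ordinary curses from all equipped items. '
        'Heavy or permanent curses are unaffected.\n'),
}
# A smaller width makes the emitter fold the long string earlier.
text = yaml.dump(data, width=40)
print(text)
```

Because `represent_scalar` is called directly, the corner cases handled by `represent_str` are skipped, which is the trade-off the other answer points out.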
