How can I control which scalar form PyYAML uses for my data?

64 votes
7 answers
31313 views
Asked 2025-04-17 09:03

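I have an object with a short string attribute and a long multi-line string attribute. I want to write the short string as a YAML quoted scalar and the multi-line string as a literal scalar: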

my_obj.short = "Hello"
my_obj.long = "Line1\nLine2\nLine3"

I'd like the YAML to look like this:

short: "Hello"
long: |
  Line1
  Line2
  Line3

How can I instruct PyYAML to do this? If I call yaml.dump(my_obj), it produces dict-like output:

{long: 'line1

    line2

    line3

    ', short: Hello}

(I'm not sure why the long string is double-spaced like that...)

Can I tell PyYAML how to handle my attributes? I'd like to control both their order and their style.

7 Answers

15

I wanted any input containing a \n to be emitted as a block literal. Using the code in yaml/representer.py as a base, I ended up with the following:

# -*- coding: utf-8 -*-
import yaml

def should_use_block(value):
    for c in u"\u000a\u000d\u001c\u001d\u001e\u0085\u2028\u2029":
        if c in value:
            return True
    return False

def my_represent_scalar(self, tag, value, style=None):
    # Modeled on BaseRepresenter.represent_scalar; the only change is
    # choosing the literal block style when the value contains a line break.
    if style is None:
        if should_use_block(value):
            style = '|'
        else:
            style = self.default_style

    node = yaml.representer.ScalarNode(tag, value, style=style)
    if self.alias_key is not None:
        self.represented_objects[self.alias_key] = node
    return node


a={'short': "Hello", 'multiline': """Line1
Line2
Line3
""", 'multiline-unicode': u"""Lêne1
Lêne2
Lêne3
"""}

print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))
yaml.representer.BaseRepresenter.represent_scalar = my_represent_scalar
print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))

Output

{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n", short: Hello}

{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: 'Lêne1

    Lêne2

    Lêne3

    ', short: Hello}

After override

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n"
short: Hello

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: |
  Lêne1
  Lêne2
  Lêne3
short: Hello
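
A side note on the double-spaced output in the first two dumps: as far as I understand the YAML rules, a single line break inside a single-quoted scalar is folded into a space, so a real newline has to be written as a blank line. A minimal sketch to check this, assuming PyYAML is installed (the YAML text below is hand-written for illustration, not captured PyYAML output):

```python
import yaml

# Hand-written YAML: each blank line inside the quotes stands for one newline.
double_spaced = "long: 'Line1\n\n    Line2\n\n    Line3'\n"
# Without the blank line, the line break is folded into a single space.
single_spaced = "long: 'Line1\n    Line2'\n"

loaded_double = yaml.safe_load(double_spaced)["long"]
loaded_single = yaml.safe_load(single_spaced)["long"]

print(repr(loaded_double))  # 'Line1\nLine2\nLine3'
print(repr(loaded_single))  # 'Line1 Line2'
```

So the doubled spacing is just how the quoted form spells a newline; the loaded string is the same either way.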

79

I really liked @lbt's approach, so I wrote this code:

import yaml

def str_presenter(dumper, data):
  if len(data.splitlines()) > 1:  # check for multiline string
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
  return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, str_presenter)

# to use with safe_dump:
yaml.representer.SafeRepresenter.add_representer(str, str_presenter)

This turns every multi-line string into a block literal.

I tried to avoid the monkey-patching part. Many thanks to @lbt and @J.F.Sebastian.
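
If registering the representer globally is a concern, one option is to attach it to a Dumper subclass instead. A minimal sketch, assuming PyYAML is installed; `BlockStyleDumper` is a name made up for illustration, and the check here tests for "\n" directly rather than using splitlines():

```python
import yaml

class BlockStyleDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass, so the representer stays local to it."""

def str_presenter(dumper, data):
    # Use the literal block style only when the string contains a newline.
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)

# add_representer on the subclass registers the representer for that class only.
BlockStyleDumper.add_representer(str, str_presenter)

doc = {"short": "Hello", "long": "Line1\nLine2\nLine3\n"}

text = yaml.dump(doc, Dumper=BlockStyleDumper, default_flow_style=False)
plain = yaml.safe_dump(doc)  # the stock SafeDumper is left untouched

print(text)
print(plain)
```

Only calls that pass Dumper=BlockStyleDumper get the block style; everything else keeps the default behavior.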

38

Based on "Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?":

import yaml
from collections import OrderedDict

class quoted(str):
    pass

def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')
yaml.add_representer(quoted, quoted_presenter)

class literal(str):
    pass

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
yaml.add_representer(literal, literal_presenter)

def ordered_dict_presenter(dumper, data):
    return dumper.represent_dict(data.items())
yaml.add_representer(OrderedDict, ordered_dict_presenter)

d = OrderedDict(short=quoted("Hello"), long=literal("Line1\nLine2\nLine3\n"))

print(yaml.dump(d))

Output

short: "Hello"
long: |
  Line1
  Line2
  Line3
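
On newer PyYAML versions (5.1 and later, as far as I know) yaml.dump accepts sort_keys=False, and plain dicts keep insertion order on Python 3.7+, so the OrderedDict representer may not be needed. A sketch under those assumptions; `StyledDumper` is a hypothetical name, and the marker classes mirror the ones above:

```python
import yaml

class quoted(str):
    """Marker type: dump as a double-quoted scalar."""

class literal(str):
    """Marker type: dump as a literal block scalar."""

class StyledDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass; keeps the representers local."""

def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')

StyledDumper.add_representer(quoted, quoted_presenter)
StyledDumper.add_representer(literal, literal_presenter)

# A plain dict: insertion order is "short" first, then "long".
d = {"short": quoted("Hello"), "long": literal("Line1\nLine2\nLine3\n")}

# sort_keys=False asks PyYAML to keep the dict's own order instead of sorting.
text = yaml.dump(d, Dumper=StyledDumper, sort_keys=False, default_flow_style=False)
print(text)
```

With sort_keys left at its default, the keys would come out alphabetically ("long" before "short").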
