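How can I control which scalar form PyYAML uses for my data?
I have an object with a short string attribute and a long multi-line string attribute. I want to write the short string as a YAML quoted scalar and the multi-line string as a literal scalar: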
my_obj.short = "Hello"
my_obj.long = "Line1\nLine2\nLine3"
I'd like the YAML to look like this:
short: "Hello"
long: |
  Line1
  Line2
  Line3
How do I tell PyYAML to do this? If I call yaml.dump(my_obj), it produces dict-like output:
{long: 'line1

    line2

    line3

    ', short: Hello}
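(I'm not sure why the long string is double-spaced like that...)
Can I tell PyYAML how to handle my attributes? I'd like to control both their order and their style.
7 Answers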
15
I wanted any input containing a \n to be treated as a block literal. Using the code in yaml/representer.py as a base, I came up with the following:
# -*- coding: utf-8 -*-
import yaml

def should_use_block(value):
    # True if the value contains any character YAML treats as a line break
    for c in u"\u000a\u000d\u001c\u001d\u001e\u0085\u2028\u2029":
        if c in value:
            return True
    return False

def my_represent_scalar(self, tag, value, style=None):
    # Same as BaseRepresenter.represent_scalar, except that a scalar with
    # no explicit style gets literal block style when it spans several lines
    if style is None:
        if should_use_block(value):
            style = '|'
        else:
            style = self.default_style

    node = yaml.representer.ScalarNode(tag, value, style=style)
    if self.alias_key is not None:
        self.represented_objects[self.alias_key] = node
    return node

a = {'short': "Hello", 'multiline': """Line1
Line2
Line3
""", 'multiline-unicode': u"""Lêne1
Lêne2
Lêne3
"""}

# Default behaviour
print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))

# Monkey-patch the representer, then dump again
yaml.representer.BaseRepresenter.represent_scalar = my_represent_scalar
print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))
Output:
{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n", short: Hello}

{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: 'Lêne1

    Lêne2

    Lêne3

    ', short: Hello}

After override

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n"
short: Hello

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: |
  Lêne1
  Lêne2
  Lêne3
short: Hello
79
I really liked @lbt's approach, so I ended up with this code:
import yaml

def str_presenter(dumper, data):
    if len(data.splitlines()) > 1:  # check for multiline string
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, str_presenter)

# to use with safe_dump:
yaml.representer.SafeRepresenter.add_representer(str, str_presenter)
It turns every multi-line string into a block literal.
I was trying to avoid the monkey-patching part. Full credit to @lbt and @J.F.Sebastian.
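If registering the representer globally is still more than you want, the same idea can probably be scoped to a dumper subclass. The following is only a sketch: `BlockDumper` is a made-up name, and the multi-line check uses `"\n" in data` rather than `splitlines()`, which is an assumption about what counts as multi-line.

```python
import yaml


class BlockDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass, so SafeDumper itself stays untouched."""


def str_presenter(dumper, data):
    # Use literal block style only for strings that contain a newline
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


# Registering on the subclass affects only dumps that pass Dumper=BlockDumper
BlockDumper.add_representer(str, str_presenter)

doc = {"short": "Hello", "long": "Line1\nLine2\nLine3\n"}
text = yaml.dump(doc, Dumper=BlockDumper, default_flow_style=False)
print(text)
```

Note that the emitter can still fall back to a double-quoted scalar when block style is not allowed for the value, for example non-ASCII text without allow_unicode=True (as in @lbt's output) or lines that end in trailing spaces.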
38
Based on "Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?":
import yaml
from collections import OrderedDict

class quoted(str):
    pass

def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')

yaml.add_representer(quoted, quoted_presenter)

class literal(str):
    pass

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')

yaml.add_representer(literal, literal_presenter)

def ordered_dict_presenter(dumper, data):
    return dumper.represent_dict(data.items())

yaml.add_representer(OrderedDict, ordered_dict_presenter)

d = OrderedDict(short=quoted("Hello"), long=literal("Line1\nLine2\nLine3\n"))

print(yaml.dump(d))
Output:
short: "Hello"
long: |
  Line1
  Line2
  Line3
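
To apply this to an object with attributes, as in the question, one option is a representer for the class itself that emits the attributes in a fixed order with the style wanted for each. This is a rough sketch, not a tested recipe: `MyObj`, `StyledDumper`, and `my_obj_presenter` are hypothetical names, and it assumes you want a plain mapping with no Python object tag.

```python
import yaml


class quoted(str):
    """Marker type: dump as a double-quoted scalar."""


class literal(str):
    """Marker type: dump as a literal block scalar."""


class MyObj:
    # Hypothetical stand-in for the object in the question
    def __init__(self, short, long):
        self.short = short
        self.long = long


class StyledDumper(yaml.Dumper):
    """Hypothetical Dumper subclass that holds the custom representers."""


def quoted_presenter(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style='"')


def literal_presenter(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


def my_obj_presenter(dumper, obj):
    # A list of (key, value) pairs is emitted in the order given
    pairs = [("short", quoted(obj.short)), ("long", literal(obj.long))]
    return dumper.represent_mapping("tag:yaml.org,2002:map", pairs)


StyledDumper.add_representer(quoted, quoted_presenter)
StyledDumper.add_representer(literal, literal_presenter)
StyledDumper.add_representer(MyObj, my_obj_presenter)

my_obj = MyObj("Hello", "Line1\nLine2\nLine3")
text = yaml.dump(my_obj, Dumper=StyledDumper)
print(text)
```

Because the question's long string has no trailing newline, the block header comes out as `|-` (strip chomping) rather than `|`; append "\n" to the value if you want a plain `|`.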