In my Python Utilities GitHub repo, I have a function that strips non-printable characters and invalid Unicode bytes from strings, mappings, and sequences:
import unicodedata

def filterCharacters(s):
    """
    Strip non printable characters
    @type s dict|list|tuple|bytes|string
    @param s Object to remove non-printable characters from
    @rtype dict|list|tuple|bytes|string
    @return An object that corresponds with the original object, nonprintable characters removed.
    """
    # Unicode general categories that count as printable
    validCategories = (
        'Lu', 'Ll', 'Lt', 'LC', 'Lm', 'Lo', 'L', 'Mn', 'Mc', 'Me', 'M', 'Nd', 'Nl', 'No', 'N', 'Pc',
        'Pd', 'Ps', 'Pe', 'Pi', 'Pf', 'Po', 'P', 'Sm', 'Sc', 'Sk', 'So', 'S', 'Zs', 'Zl', 'Zp', 'Z'
    )
    convertToBytes = False
    # Containers are rebuilt recursively; the input object itself is not modified
    if isinstance(s, dict):
        new = {}
        for k,v in s.items(): # This is the offending line
            new[k] = filterCharacters(v)
        return new
    if isinstance(s, list):
        new = []
        for item in s:
            new.append(filterCharacters(item))
        return new
    if isinstance(s, tuple):
        new = []
        for item in s:
            new.append(filterCharacters(item))
        return tuple(new)
    # Bytes are decoded, filtered as text, then encoded again
    if isinstance(s, bytes):
        s = s.decode('utf-8')
        convertToBytes = True
    if isinstance(s, str):
        s = ''.join(c for c in s if unicodedata.category(c) in validCategories)
        if convertToBytes:
            s = s.encode('utf-8')
        return s
    else:
        return None
Sometimes this function raises an exception:
Traceback (most recent call last):
  File "./util.py", line 56, in filterCharacters
    for k,v in s.items():
RuntimeError: dictionary changed size during iteration
I can't see where I am changing the dictionary that is passed in as an argument. So why is this exception being thrown?
Thanks!
In Python 3, dict.items() returns a dict_view object (not a list, as in Python 2). Looking through the CPython source, I noticed relevant comments in Objects/dictobject.c.
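So it is not only explicit deletions and insertions on the dict that can trigger this error: any assignment that ends up changing the number of entries will do it too!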
The resizing process is also interesting to look at.
But that is all internals.
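To make the view behaviour concrete, here is a small sketch (the helper names are made up for illustration and are not part of the question's code). It shows that the object returned by dict.items() tracks the dict live, that adding a key during iteration raises the error from the traceback, and that in CPython merely overwriting the value of an existing key does not:

```python
d = {"a": 1, "b": 2}
view = d.items()                 # a live view, not a copy
d["c"] = 3                       # the existing view now sees the new entry
view_sees_new_key = ("c", 3) in view

def grow_while_iterating(mapping):
    """Add a key while looping over mapping.items(); return the error text."""
    try:
        for key, value in mapping.items():
            mapping[key + "_copy"] = value   # changes the dict's size mid-loop
    except RuntimeError as exc:
        return str(exc)
    return None

def overwrite_while_iterating(mapping):
    """Replace values of existing keys only; the dict's size never changes."""
    for key, value in mapping.items():
        mapping[key] = value * 2
    return mapping

error_message = grow_while_iterating({"x": 1})
doubled = overwrite_while_iterating({"x": 1, "y": 2})
```

Here error_message holds the same "dictionary changed size during iteration" text as in the question, which suggests that something is adding or removing keys in the dict while filterCharacters is looping over it.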
Solution
Try converting the dict_view to a list before iterating over it.
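A minimal sketch of that idea, assuming the only change needed in the question's function is to the loop header (snapshot_items is a hypothetical helper used here for illustration). In filterCharacters itself the loop would become: for k,v in list(s.items()):

```python
def snapshot_items(mapping):
    """Copy the dict's items into a list so later size changes don't affect the loop."""
    return list(mapping.items())

data = {"a": 1, "b": 2}
seen = []
for key, value in snapshot_items(data):
    # Iterating data.items() directly would raise RuntimeError on this insert.
    data[key + "_seen"] = value
    seen.append(key)
```

The loop now walks a fixed list of two pairs, so seen ends up as ['a', 'b'] even though data grows to four entries. Note that this only protects the iteration; it does not explain what was modifying the dict in the first place.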