Replacing Unicode characters in strings inside a nested list in Python

2 votes

2 answers

1739 views

Asked 2025-04-17 21:08

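I want to replace or strip some substrings from the strings in this list so that I can insert them into a database that does not accept those characters.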

info=[[u'\xa0Buffalo\u2019s League of legends ...', '2012-09-05'], [u' \xa0RCKIN 0 - 1 WITHACK.nq\xa0  ', u'\xa0Buffalo\u2019s League of legends ...', '2012-09-05']]

I used this code:

info = [[x.replace(u'\xa0', u'') for x in l] for l in info]
info = [[y.replace('\u2019s', '') for y in o] for o in info]

The first line works, but the second one does not. Any suggestions?

Tags: string replacement, nested lists, database insertion, Unicode characters

2 Answers

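Because in the second line, '\u2019s' is not treated as a Unicode string. Without the u prefix, Python 2 does not interpret the \u escape, so the pattern is a literal backslash followed by u2019s and never matches. You only need to add a u in front of the string you are replacing, like this: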

print [[y.replace(u'\u2019s', '') for y in o] for o in info]

Output:

[[u'Buffalo League of legends ...', u'2012-09-05'],
 [u' RCKIN 0 - 1 WITHACK.nq  ',
  u'Buffalo League of legends ...',
  u'2012-09-05']]
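To make the difference between the two literals visible, here is a small sketch. It uses a raw string to reproduce what Python 2 sees for the unprefixed '\u2019s', so the same code also runs on Python 3 (where plain literals are already Unicode). The variable names are only for illustration.

```python
# A raw string reproduces what Python 2 sees for the unprefixed '\u2019s':
# seven ordinary characters (backslash, u, 2, 0, 1, 9, s), no escape processing.
unprefixed = r'\u2019s'

# With the u prefix, \u2019 is a single character
# (RIGHT SINGLE QUOTATION MARK), followed by 's'.
prefixed = u'\u2019s'

text = u'Buffalo\u2019s League of legends ...'

unchanged = text.replace(unprefixed, u'')  # pattern not found, text untouched
fixed = text.replace(prefixed, u'')        # pattern found and removed

print(len(unprefixed))    # 7
print(len(prefixed))      # 2
print(unchanged == text)  # True
print(repr(fixed))
```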

You can also chain several replacements together, like this:

[[x.replace(u'\xa0', '').replace(u'\u2019s', '') for x in l] for l in info]
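If the list of unwanted substrings grows, chaining gets hard to read. One possible variation, sketched below, keeps the replacements in a table and applies them in order. `REPLACEMENTS`, `clean_text` and `clean_rows` are hypothetical names introduced here, not part of the question.

```python
# Ordered (old, new) pairs; every literal is Unicode so \u escapes are honoured.
REPLACEMENTS = [
    (u'\xa0', u''),     # NO-BREAK SPACE
    (u'\u2019s', u''),  # RIGHT SINGLE QUOTATION MARK followed by 's'
]


def clean_text(text, replacements=REPLACEMENTS):
    """Apply each (old, new) replacement to text, in order."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text


def clean_rows(rows, replacements=REPLACEMENTS):
    """Clean every string in a list of lists of strings."""
    return [[clean_text(x, replacements) for x in row] for row in rows]


info = [[u'\xa0Buffalo\u2019s League of legends ...', '2012-09-05'],
        [u' \xa0RCKIN 0 - 1 WITHACK.nq\xa0  ',
         u'\xa0Buffalo\u2019s League of legends ...', '2012-09-05']]

cleaned = clean_rows(info)
print(cleaned)
```

The result matches the chained version: the same strings with the no-break spaces and the ’s removed.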

Answered 2025-04-17 by Python大师


Drop the second line and run this instead:

info = [[x.encode('ascii', 'ignore')  for x in l] for l in info]

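See whether the result is acceptable. This encodes every string to ASCII and silently drops any character that is not ASCII. Just make sure that losing a significant Unicode character will not cause problems.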

>>> info=[[u'\xa0Buffalo\u2019s League of legends ...', '2012-09-05'], [u' \xa0RCKIN 0 - 1 WITHACK.nq\xa0  ', u'\xa0Buffalo\u2019s League of legends ...', '2012-09-05']]
>>> info = [[x.encode('ascii', 'ignore')  for x in l] for l in info]
>>> info
[['Buffalos League of legends ...', '2012-09-05'], [' RCKIN 0 - 1 WITHACK.nq  ', 'Buffalos League of legends ...', '2012-09-05']]
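Before accepting that loss, it may help to list exactly which characters would be thrown away. The sketch below is one way to do that; `non_ascii_chars` is a hypothetical helper name, and it assumes every item in the nested list is a string.

```python
def non_ascii_chars(rows):
    """Return the set of characters that encode('ascii', 'ignore') would drop."""
    return {ch for row in rows for text in row for ch in text if ord(ch) > 127}


info = [[u'\xa0Buffalo\u2019s League of legends ...', '2012-09-05'],
        [u' \xa0RCKIN 0 - 1 WITHACK.nq\xa0  ',
         u'\xa0Buffalo\u2019s League of legends ...', '2012-09-05']]

dropped = non_ascii_chars(info)
# Show the code points in a stable order, e.g. for a log message.
print(sorted(hex(ord(ch)) for ch in dropped))
```

For this data only two characters are affected: the no-break space (0xa0) and the right single quotation mark (0x2019). Note that on Python 3 `encode` returns `bytes`, so you would need `.decode('ascii')` afterwards to get text back.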

What is going on:

Some of the data in your Python program is Unicode (which is good).

>>> u = u'\u2019'

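For different systems to exchange text reliably, the best practice is to encode Unicode strings as UTF-8 when writing them out. These are the bytes you should store in the database: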

>>> u.encode('utf-8')
'\xe2\x80\x99'
>>> utf8 = u.encode('utf-8')
>>> print utf8
’

Then, when you read those bytes back into your program, decode them:

>>> utf8.decode('utf8')
u'\u2019'
>>> print utf8.decode('utf8')
’
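Applied to the whole nested list from the question, the same encode-on-write, decode-on-read round trip might look like the sketch below. The actual database calls are left out, since they depend on your driver; `to_utf8` and `from_utf8` are hypothetical helper names.

```python
def to_utf8(rows):
    """Encode every string to UTF-8 bytes, e.g. just before writing."""
    return [[x.encode('utf-8') for x in row] for row in rows]


def from_utf8(rows):
    """Decode UTF-8 bytes back to Unicode strings, e.g. just after reading."""
    return [[b.decode('utf-8') for b in row] for row in rows]


info = [[u'\xa0Buffalo\u2019s League of legends ...', '2012-09-05'],
        [u' \xa0RCKIN 0 - 1 WITHACK.nq\xa0  ',
         u'\xa0Buffalo\u2019s League of legends ...', '2012-09-05']]

stored = to_utf8(info)        # bytes only, nothing dropped
restored = from_utf8(stored)  # same text as the original

print(restored == info)  # True
```

Nothing is lost in this version: the right single quotation mark survives as the three bytes shown above and comes back unchanged.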

If your database cannot handle UTF-8, I would suggest considering a different database.

Answered 2025-04-17 by Python大师

