How to do re.findall('\w+', fin.read()) for Unicode? python - Q&A - Python中文网


Posted 2024-04-19 15:45:20



For a text file containing Unicode characters (e.g. Chinese or Japanese), is there a way to do the following:

import io
import re
from collections import Counter

with io.open(infile, 'r', encoding='utf8') as fin:
    words = re.findall(r'\w+', fin.read())
    x = Counter(zip(words, words[1:]))
print x

I tried this, but x comes back empty:

[]
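The empty result is consistent with how Python 2 defines \w: without re.UNICODE (or re.LOCALE), \w is equivalent to [a-zA-Z0-9_], so no CJK character can match. Python 3 matches Unicode word characters by default for str patterns, but its re.ASCII flag restores the ASCII-only meaning. The sketch below uses that flag to reproduce the problem on a made-up sample string (the text is an assumption, not the asker's file):

```python
import re

# Hypothetical sample: Chinese, Japanese katakana, then an ASCII token.
text = u'單語 テスト abc_1'

# re.ASCII limits \w to [a-zA-Z0-9_], like Python 2 without re.UNICODE.
ascii_words = re.findall(r'\w+', text, flags=re.ASCII)
# The same result spelled out as an explicit character class.
explicit = re.findall(r'[a-zA-Z0-9_]+', text)
# Python 3 default for str patterns: Unicode-aware \w.
unicode_words = re.findall(r'\w+', text)

print(ascii_words, explicit, unicode_words)
```

Only the ASCII token survives in the first two lists, which is why a purely Chinese or Japanese file yields no words at all.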


2 answers

Anonymous user

Answer #1 · edited 2024-04-19 15:45:20

As @Ashwini suggested, this works:

# re.U (short for re.UNICODE) makes \w match Unicode word characters
words = re.findall(r'\w+', trgfin.read(), flags=re.U)
x = Counter(zip(words, words[1:]))
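For readers on Python 3, here is a minimal, self-contained sketch of the same idea wrapped in a function. The name count_word_bigrams is hypothetical, and io.StringIO stands in for the file opened with io.open(..., encoding='utf8'); in Python 3 the re.UNICODE flag is already the default for str patterns and is kept only to mirror the answer.

```python
import io
import re
from collections import Counter


def count_word_bigrams(fin):
    """Count adjacent word pairs in a text stream (hypothetical helper)."""
    # re.UNICODE is redundant for str patterns in Python 3 but harmless.
    words = re.findall(r'\w+', fin.read(), flags=re.UNICODE)
    # zip(words, words[1:]) pairs each word with the word that follows it.
    return Counter(zip(words, words[1:]))


# In-memory stand-in for io.open(infile, 'r', encoding='utf8').
sample = io.StringIO(u'單語 テスト 單語 テスト')
print(count_word_bigrams(sample))
```

Note that \w+ splits only on non-word characters such as spaces and punctuation, so an unsegmented run of Chinese or Japanese characters comes back as a single "word".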

Anonymous user

Answer #2 · edited 2024-04-19 15:45:20

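As Ashwini Chaudhary commented, you need to specify the re.UNICODE (or re.U) flag so that \w depends on the Unicode character properties database. Without it, Python 2 treats \w as [a-zA-Z0-9_]: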

>>> re.findall('\w+', u'單語')
[]
>>> re.findall('\w+', u'單語', flags=re.UNICODE)
[u'\u55ae\u8a9e']
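The session above is Python 2. A rough Python 3 counterpart follows as a sketch: str patterns are Unicode-aware without any flag, while bytes patterns (for example, data read from a file opened in 'rb' mode) keep \w ASCII-only and reject re.UNICODE, so decoding to str first is the way to get Unicode matching there.

```python
import re

text = u'單語'

# str pattern: Unicode word characters match by default in Python 3.
str_words = re.findall(r'\w+', text)
# re.UNICODE (alias re.U) is still accepted for str patterns, but redundant.
flagged_words = re.findall(r'\w+', text, flags=re.UNICODE)

# bytes pattern: \w is ASCII-only, so the UTF-8 bytes of the text never match.
raw = text.encode('utf8')
bytes_words = re.findall(rb'\w+', raw)

# Combining re.UNICODE with a bytes pattern raises ValueError.
try:
    re.compile(rb'\w+', re.UNICODE)
    unicode_on_bytes_ok = True
except ValueError:
    unicode_on_bytes_ok = False

# Decoding the bytes to str first restores Unicode-aware matching.
decoded_words = re.findall(r'\w+', raw.decode('utf8'))

print(str_words, flagged_words, bytes_words, unicode_on_bytes_ok, decoded_words)
```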
