Identifying whether a character in a word is a digit or a Unicode character in Python

for word in f.read().strip().split():
    for word1, word2, word3 in zip(word, word[1:], word[2:]):
        if word1 == "ர" and word2 == "ூ " and word3.isdigit():
            print word1
            print word2
        if word1.decode('utf-8') == unichr(0xbb0) and word2.decode('utf-8') == unichr(0xbc2):
            print word1
            print word2

2 Answers

User

Answer 1 · edited 2024-05-14 21:52:15

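Use Unicode properties:

\pL matches a letter in any language
\pN matches a number in any language

In your case, the pattern could be the one below. Note that Python's built-in re module does not support \p classes; the third-party regex module supports Unicode property classes such as \p{L}.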

(\pL+\.?)(\pN+)
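
As an alternative that needs only the standard library, the same letter/number distinction can be read from each character's Unicode general category. The following is a minimal sketch, assuming Python 3; the helper name `classify` is hypothetical and not part of the original answer. It also shows that the Tamil vowel sign ூ is a combining mark rather than a letter, which matters when writing \pL-style patterns for text such as "ரூ.100".

```python
import unicodedata


def classify(ch):
    """Return a coarse class for one character, based on the first
    letter of its Unicode general category (L, M, N, ...)."""
    major = unicodedata.category(ch)[0]
    if major == "L":
        return "letter"
    if major == "M":
        return "mark"    # combining marks, e.g. Tamil vowel signs
    if major == "N":
        return "number"  # decimal digits and other numeric characters
    return "other"       # punctuation, symbols, spaces, ...


# Classify every character of the sample string from the question.
for ch in "ரூ.100":
    print(repr(ch), classify(ch))
```

Grouping consecutive characters by the returned class would give the "currency prefix" and "amount" parts without any regular expression.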

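User

Answer 2 · edited 2024-05-14 21:52:15

You can use the regular expression (.*?)(\d+)(.*), which captures three groups: everything before the digits, the digits themselves, and everything after the digits: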

>>> import re
>>> pattern = ur'(.*?)(\d+)(.*)'
>>> s = u"ரூ.100"
>>> match = re.match(pattern, s, re.UNICODE)
>>> print match.group(1)
ரூ.
>>> print match.group(2)
100

Alternatively, you can unpack the matched groups into variables, like this:

>>> s = u"100ஆம்"
>>> match = re.match(pattern, s, re.UNICODE)
>>> before, digits, after = match.groups()
>>> print before

>>> print digits
100
>>> print after
ஆம்
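
The session above is Python 2 (the ur'' prefix and the print statement). A rough Python 3 equivalent might look like the sketch below; the function name `split_around_digits` is an assumption made for illustration. In Python 3, str patterns are Unicode-aware by default, so re.UNICODE is not needed and \d also matches decimal digits from other scripts.

```python
import re

# Same pattern as in the answer: lazy prefix, a run of digits, then the rest.
PATTERN = re.compile(r"(.*?)(\d+)(.*)")


def split_around_digits(text):
    """Return (before, digits, after) for the first run of digits,
    or None when the text contains no digit at all."""
    match = PATTERN.match(text)
    if match is None:
        return None
    return match.groups()


print(split_around_digits("ரூ.100"))   # ('ரூ.', '100', '')
print(split_around_digits("100ஆம்"))   # ('', '100', 'ஆம்')
print(split_around_digits("ரூ"))       # None
```

Returning None for input without digits avoids the AttributeError that calling match.groups() on a failed match would raise.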

Hope that helps.
