Python正则表达式中的匹配Unicode字符

>>> m = re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', '/by_tag/påske/øyfjell.jpg') >>> print m.groupdict() Traceback (most recent call last): File "<interactive input>", line 1, in <module> AttributeError: 'NoneType' object has no attribute 'groupdict'

3条回答

网友

1楼 · 编辑于 2024-05-29 08:08:50

您需要UNICODE标志：

m = re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', '/by_tag/påske/øyfjell.jpg', re.UNICODE)

网友

2楼 · 编辑于 2024-05-29 08:08:50

您需要指定re.UNICODE标志，和使用u前缀将字符串输入为Unicode字符串：

>>> re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', u'/by_tag/påske/øyfjell.jpg', re.UNICODE).groupdict()
{'tag': u'p\xe5ske', 'filename': u'\xf8yfjell.jpg'}

这是在Python 2中；在Python 3中，必须省略u，因为所有字符串都是Unicode。

网友

3楼 · 编辑于 2024-05-29 08:08:50

在Python 2中，需要re.UNICODE标志和unicode字符串构造函数

>>> re.sub(r"[\w]+","___",unicode(",./hello-=+","utf-8"),flags=re.UNICODE)
u',./___-=+'
>>> re.sub(r"[\w]+","___",unicode(",./cześć-=+","utf-8"),flags=re.UNICODE)
u',./___-=+'
>>> re.sub(r"[\w]+","___",unicode(",./привет-=+","utf-8"),flags=re.UNICODE)
u',./___-=+'
>>> re.sub(r"[\w]+","___",unicode(",./你好-=+","utf-8"),flags=re.UNICODE)
u',./___-=+'
>>> re.sub(r"[\w]+","___",unicode(",./你好，世界-=+","utf-8"),flags=re.UNICODE)
u',./___\uff0c___-=+'
>>> print re.sub(r"[\w]+","___",unicode(",./你好，世界-=+","utf-8"),flags=re.UNICODE)
,./___，___-=+

（在后一种情况下，逗号是中文逗号。）

相关问题更多 >

编程相关推荐

热门问题

热门文章