When the LOCALE and UNICODE flags are
not specified, matches any
alphanumeric character and the
underscore; this is equivalent to the
set [a-zA-Z0-9_]. With LOCALE, it will
match the set [0-9_] plus whatever
characters are defined as alphanumeric
for the current locale. If UNICODE is
set, this will match the characters
[0-9_] plus whatever is classified as
alphanumeric in the Unicode character
properties database.
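The quoted passage describes the Python 2 flags. As a rough illustration of the same distinction, here is a minimal sketch that assumes Python 3, where `str` patterns are Unicode-aware by default and `re.ASCII` restricts `\w` to `[a-zA-Z0-9_]`. The sample string and variable names are made up for the example.

```python
import re

# Assumption: Python 3. str patterns are Unicode-aware by default;
# re.ASCII limits \w to [a-zA-Z0-9_].
WORD_UNICODE = re.compile(r"\w+")
WORD_ASCII = re.compile(r"\w+", re.ASCII)

sample = "Jos\u00e9_99"  # "José_99", with a precomposed e-acute

# Unicode-aware \w accepts the accented letter, the underscore and the digits.
unicode_match = WORD_UNICODE.fullmatch(sample)

# ASCII-only \w stops at the accented letter, so the full match fails.
ascii_match = WORD_ASCII.fullmatch(sample)

# The ASCII pattern still finds the pieces on either side of the accent.
ascii_pieces = WORD_ASCII.findall(sample)
```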
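The unicodedata module may be useful for this task, especially the category() function. For the existing Unicode categories, see unicode.org. You can then filter out punctuation characters and so on.

Depending on how you define "name", you could check it against the following regular expression: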
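A rough sketch of both suggestions, assuming Python 3. The helper names are hypothetical, and the pattern `\w+` is only an assumed stand-in, since the regular expression from the original answer is not shown here. The first helper filters by `unicodedata.category()`; the second checks the whole string against the word-character pattern.

```python
import re
import unicodedata


def only_letters_by_category(text: str) -> bool:
    """Return True if text is non-empty and every character is in a
    Unicode letter category (Lu, Ll, Lt, Lm or Lo)."""
    return bool(text) and all(
        unicodedata.category(ch).startswith("L") for ch in text
    )


# Assumed pattern (not taken from the original answer):
# one or more word characters and nothing else.
NAME_RE = re.compile(r"\w+")


def matches_word_pattern(text: str) -> bool:
    """Return True if the whole string consists of \\w characters."""
    return NAME_RE.fullmatch(text) is not None
```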
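However, this will allow digits and underscores. To exclude them, you can run a second test against a pattern that matches those characters and make your check fail on a match. The two approaches can be combined as follows:
But for regex performance reasons, I would rather do two separate checks.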
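The two-step check and a combined variant might look like the sketch below. It assumes Python 3, and all three patterns are illustrative guesses rather than the patterns from the original answer: `\w+` for the first test, `[\d_]` for the exclusion test, and a negative lookahead for the combined form.

```python
import re

# Assumed patterns, for illustration only.
WORD_ONLY = re.compile(r"\w+")              # first test: word characters only
DIGIT_OR_UNDERSCORE = re.compile(r"[\d_]")  # second test: must NOT be found
COMBINED = re.compile(r"(?:(?![\d_])\w)+")  # both conditions in one pattern


def is_name_two_checks(text: str) -> bool:
    """Pass the word-character test, then fail if a digit or underscore occurs."""
    if WORD_ONLY.fullmatch(text) is None:
        return False
    return DIGIT_OR_UNDERSCORE.search(text) is None


def is_name_combined(text: str) -> bool:
    """Same result using a single pattern with a negative lookahead."""
    return COMBINED.fullmatch(text) is not None
```

Note that `\d` only covers decimal digits, so either version rejects "Zoë7" and "a_b" but may still accept other numeric characters that `\w` allows.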
From the docs:
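Just convert the bytestring (your UTF-8 input) to a unicode object, then check whether all of its characters are alphabetic:
For bytestrings, this method depends on the locale.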
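A minimal sketch of the decode-then-check idea, assuming Python 3, where the bytestring is a `bytes` object and decoding it yields a `str`. The function name and the default encoding argument are hypothetical.

```python
def is_alphabetic_name(raw: bytes, encoding: str = "utf-8") -> bool:
    """Decode the bytes, then check that every character is a letter.

    str.isalpha() returns False for an empty string and for any string
    containing digits, spaces, underscores or punctuation.
    """
    text = raw.decode(encoding)
    return text.isalpha()
```

For example, `is_alphabetic_name("François".encode("utf-8"))` passes, while `b"Jean Luc"` fails because of the space.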