How do I do a case-insensitive string comparison?

785 votes

15 answers

1177769 views

数据工程师

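Asked 2025-04-11 18:25

How can I compare strings in a case-insensitive way in Python?

I would like to compare a regular string against a repository of strings using simple, Pythonic code. I would also like to be able to look up values in a dict using regular Python strings.

coding-style string-comparison dictionary-lookup case-insensitive

15 Answers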

In Python 2, calling .lower() on each string or Unicode object...

string1.lower() == string2.lower()

...works most of the time, but it does fail in the situations @tchrist has described.

Suppose we have a file called unicode.txt containing the two strings Σίσυφος and ΣΊΣΥΦΟΣ. With Python 2:

>>> utf8_bytes = open("unicode.txt", 'r').read()
>>> print repr(utf8_bytes)
'\xce\xa3\xce\xaf\xcf\x83\xcf\x85\xcf\x86\xce\xbf\xcf\x82\n\xce\xa3\xce\x8a\xce\xa3\xce\xa5\xce\xa6\xce\x9f\xce\xa3\n'
>>> u = utf8_bytes.decode('utf8')
>>> print u
Σίσυφος
ΣΊΣΥΦΟΣ

>>> first, second = u.splitlines()
>>> print first.lower()
σίσυφος
>>> print second.lower()
σίσυφοσ
>>> first.lower() == second.lower()
False
>>> first.upper() == second.upper()
True

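The Σ character has two lowercase forms, ς and σ, and in Python 2 .lower() does not help compare them case-insensitively: it always produces σ, even at the end of a word.

In Python 3, however, .lower() takes position into account and lowercases a word-final Σ to ς, so calling .lower() on both strings compares them correctly: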

>>> s = open('unicode.txt', encoding='utf8').read()
>>> print(s)
Σίσυφος
ΣΊΣΥΦΟΣ

>>> first, second = s.splitlines()
>>> print(first.lower())
σίσυφος
>>> print(second.lower())
σίσυφος
>>> first.lower() == second.lower()
True
>>> first.upper() == second.upper()
True

So if you care about edge cases like the three sigmas in Greek, use Python 3.

(For reference, the interpreter output above is from Python 2.7.3 and Python 3.3.0b1.)
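One caveat worth sketching: the .lower() comparison works in Python 3 because both words end in a sigma. The snippet below is a minimal illustration, assuming Python 3.3 or later, of how str.casefold() folds every sigma form to σ regardless of position. The variable names are made up for the example.

```python
# The two spellings from unicode.txt, written with escapes so the
# exact code points are unambiguous.
first = "\u03a3\u03af\u03c3\u03c5\u03c6\u03bf\u03c2"   # Σίσυφος
second = "\u03a3\u038a\u03a3\u03a5\u03a6\u039f\u03a3"  # ΣΊΣΥΦΟΣ

# casefold() maps Σ, σ and ς all to σ, so position no longer matters.
same_word = first.casefold() == second.casefold()

# A lone final sigma and a lone medial sigma stay different under
# lower(), but they match under casefold().
lone_lower = "\u03c2".lower() == "\u03c3".lower()
lone_fold = "\u03c2".casefold() == "\u03c3".casefold()

print(same_word, lone_lower, lone_fold)  # True False True
```

If you ever compare fragments rather than whole words, this is a reason to prefer casefold() over lower().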

Answered 2025-04-11 by Python大师

721

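Comparing strings in a case-insensitive way seems trivial, but it is not. I will be using Python 3 here, since Python 2 is underdeveloped in this area.

The first thing to note is that case-removing conversions in Unicode are not trivial. There is text for which text.lower() != text.upper().lower(), such as "ß":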

>>> "ß".lower()
'ß'
>>> "ß".upper().lower()
'ss'

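Now suppose you want to compare "BUSSE" and "Buße" caselessly. You probably also want "BUSSE" and "BUẞE" to compare equal, since that is the newer capital form. The recommended way is to use casefold: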

str.casefold()

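Return a casefolded copy of the string. Casefolded strings may be used for caseless matching.

Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. [...]

Do not just use lower. If casefold is not available, .upper().lower() helps, but only somewhat.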
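To make the difference concrete, here is a small sketch, assuming Python 3.3 or later, that runs the three spellings from above through each approach and counts how many distinct results come out. The names words, lowered, upper_lowered and folded are only illustrative.

```python
# The three spellings from the text; escapes keep the sharp-s
# characters unambiguous.
words = ["BUSSE", "Bu\u00dfe", "BU\u1e9eE"]  # BUSSE, Buße, BUẞE

# Collect the distinct results each approach produces.
lowered = {w.lower() for w in words}
upper_lowered = {w.upper().lower() for w in words}
folded = {w.casefold() for w in words}

# Only casefold() collapses all three spellings into one form.
print(len(lowered), len(upper_lowered), len(folded))
print(folded)
```

The capital ẞ is what trips up .upper().lower(): it is already uppercase, so the round trip turns it back into ß rather than ss.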

Next you should consider accents. If your font renderer is good, the two strings in the comparison below look identical, yet they do not compare equal:

>>> "ê" == "ê"
False

That is because the accent on the latter is a combining character.

>>> import unicodedata
>>> [unicodedata.name(char) for char in "ê"]
['LATIN SMALL LETTER E WITH CIRCUMFLEX']
>>> [unicodedata.name(char) for char in "ê"]
['LATIN SMALL LETTER E', 'COMBINING CIRCUMFLEX ACCENT']

The simplest way to deal with this is unicodedata.normalize. You probably want NFKD normalization, but check the documentation for the other forms. Then you can do:

>>> unicodedata.normalize("NFKD", "ê") == unicodedata.normalize("NFKD", "ê")
True
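As a rough guide to choosing a form, the sketch below contrasts NFD (canonical decomposition only) with NFKD (which also applies compatibility mappings). The ligature is an extra example that is not in the answer above.

```python
import unicodedata

composed = "\u00ea"     # ê as a single code point
decomposed = "e\u0302"  # e followed by COMBINING CIRCUMFLEX ACCENT
ligature = "\ufb01"     # LATIN SMALL LIGATURE FI

# Canonical decomposition is already enough to make the two ê equal.
nfd_equal = (unicodedata.normalize("NFD", composed)
             == unicodedata.normalize("NFD", decomposed))

# NFD leaves the ligature alone; NFKD expands it to the letters "fi".
nfd_ligature = unicodedata.normalize("NFD", ligature)
nfkd_ligature = unicodedata.normalize("NFKD", ligature)

print(nfd_equal, nfd_ligature == "fi", nfkd_ligature == "fi")  # True False True
```

So NFKD is the looser choice: it treats compatibility variants such as ligatures as equal to their plain spelling, which may or may not be what you want.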

To finish up, here is the same thing expressed as functions:

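import unicodedata

def normalize_caseless(text):
    # Case-fold first, then decompose so that composed and combining
    # forms of the same character end up as identical code points.
    return unicodedata.normalize("NFKD", text.casefold())

def caseless_equal(left, right):
    # Compare both sides in the same normalized, case-folded form.
    return normalize_caseless(left) == normalize_caseless(right)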
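The question also asks about dictionary lookup. One possible approach, sketched here under the assumption that you control how the dict is built, is to normalize the keys once and normalize each query the same way. The helper caseless_get and the sample data are hypothetical, not part of any library.

```python
import unicodedata

def normalize_caseless(text):
    return unicodedata.normalize("NFKD", text.casefold())

def caseless_equal(left, right):
    return normalize_caseless(left) == normalize_caseless(right)

# Hypothetical sample data; keys are normalized once, when the dict is built.
raw = {"Stra\u00dfe": "street", "Caf\u00e9": "coffee house"}
lookup = {normalize_caseless(key): value for key, value in raw.items()}

def caseless_get(mapping, key, default=None):
    """Look up key in a mapping whose keys went through normalize_caseless."""
    return mapping.get(normalize_caseless(key), default)

print(caseless_equal("BUSSE", "Bu\u00dfe"))  # True
print(caseless_get(lookup, "STRASSE"))       # street
print(caseless_get(lookup, "CAFE\u0301"))    # coffee house
```

Note that two original keys that normalize to the same string will overwrite each other in such a dict, so this only fits data where that collision is acceptable.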

Answered 2025-04-11 by Python大师

806

Assuming ASCII strings:

string1 = 'Hello'
string2 = 'hello'

if string1.lower() == string2.lower():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are NOT the same (case insensitive)")

As of Python 3.3, casefold() is a better alternative:

string1 = 'Hello'
string2 = 'hello'

if string1.casefold() == string2.casefold():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are NOT the same (case insensitive)")

If you want a more comprehensive solution that handles more complex Unicode comparisons, see the other answers.
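For the "compare one string against a collection of strings" part of the question, a simple sketch along the same lines might look like this. The function caseless_contains and the sample list are made up for illustration.

```python
def caseless_contains(candidates, needle):
    """Return True if needle matches any string in candidates, ignoring case."""
    # Fold every stored string once, then test the folded needle against the set.
    folded = {item.casefold() for item in candidates}
    return needle.casefold() in folded

names = ["Alice", "BOB", "carol"]  # hypothetical sample data
print(caseless_contains(names, "bob"))   # True
print(caseless_contains(names, "Dave"))  # False
```

If the collection is queried often, it is probably worth building the folded set once and reusing it instead of rebuilding it on every call.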

Answered 2025-04-11 by Python大师
