Comparing two URLs in Python

22 votes

3 answers

15448 views

Asked 2025-04-16 14:05

Is there a standard way in Python to compare two URLs, something like the are_urls_the_same function in this example:

url_1 = 'http://www.foo.com/bar?a=b&c=d'
url_2 = 'http://www.foo.com:80/bar?c=d;a=b'

if are_urls_the_same(url_1, url_2):
    print("URLs are the same")

By "the same" I mean that they access the same resource, so the two URLs in the example are the same.

Tags: network requests, URL comparison, resource identifiers, URL normalization

3 Answers

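If you don't mind using a third-party library, try yarl.

I found a case that the top answer does not handle, a path containing ".." segments. yarl normalizes the path, so the comparison works: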

>>> import yarl
>>> yarl.URL("http://example.com/path/../to") == yarl.URL("http://example.com/to")
True
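If adding a dependency is not an option, the same dot-segment case can be approximated with the standard library. The following is only a minimal sketch: collapse_dot_segments is a hypothetical helper name, and it leans on posixpath.normpath, which was written for file paths rather than URLs, so edge cases (for example repeated slashes) may be treated differently from yarl.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def collapse_dot_segments(url):
    """Return url with '.' and '..' path segments resolved (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path
    if path:
        normalized = posixpath.normpath(path)
        # normpath drops a trailing slash, which is significant in URLs,
        # so put it back when the original path had one.
        if path.endswith("/") and not normalized.endswith("/"):
            normalized += "/"
        path = normalized
    return urlunsplit(parts._replace(path=path))


print(collapse_dot_segments("http://example.com/path/../to"))
```

Comparing the results of collapse_dot_segments for two URLs then treats "/path/../to" and "/to" as the same path.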

Answered 2025-04-16 by Python大师


Use urlparse, then write a comparison function that compares whichever fields you care about.

>>> from urllib.parse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')

You can compare on any of the following fields:

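Attribute   Index   Value
scheme      0       URL scheme specifier
netloc      1       Network location part
path        2       Hierarchical path
params      3       Parameters for the last path element
query       4       Query component
fragment    5       Fragment identifier
username            User name
password            Password
hostname            Host name (lower case)
port                Port number as an integer, if present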
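As a rough illustration of that idea, the sketch below compares two URLs only on a caller-chosen set of urlparse attributes. The function name same_on_fields and its default field list are assumptions made for this example, not part of any library.

```python
from urllib.parse import urlparse


def same_on_fields(url_a, url_b, fields=("scheme", "hostname", "path")):
    """Compare two URLs on the named urlparse attributes only (hypothetical helper)."""
    a, b = urlparse(url_a), urlparse(url_b)
    return all(getattr(a, name) == getattr(b, name) for name in fields)


url_1 = 'http://www.foo.com/bar?a=b&c=d'
url_2 = 'http://www.foo.com:80/bar?c=d;a=b'

# Same scheme, host and path, so the default field set matches.
print(same_on_fields(url_1, url_2))
# The raw netloc differs because only one URL spells out the port.
print(same_on_fields(url_1, url_2, fields=("netloc",)))
```

Which fields to include depends on what "the same" should mean for your application; comparing raw query strings, for instance, is sensitive to parameter order.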

Answered 2025-04-16 by Python大师


Here is a simple class that lets you do this:

if Url(url1) == Url(url2):
    pass

The class could just as easily be turned into a function. Because these objects are hashable, you can also cache them or store them in sets and dicts:

# Python 2
# from urlparse import urlparse, parse_qsl
# from urllib import unquote_plus
# Python 3
from urllib.parse import urlparse, parse_qsl, unquote_plus

class Url(object):
    '''A url object that can be compared with other url objects
    without regard to the vagaries of encoding, escaping, and ordering
    of parameters in query strings.'''

    def __init__(self, url):
        parts = urlparse(url)
        # A frozenset makes the query order-insensitive and hashable.
        _query = frozenset(parse_qsl(parts.query))
        # Decode percent-escapes so '%7E' and '~' compare equal.
        _path = unquote_plus(parts.path)
        parts = parts._replace(query=_query, path=_path)
        self.parts = parts

    def __eq__(self, other):
        # Let Python fall back to its default handling for non-Url objects.
        if not isinstance(other, Url):
            return NotImplemented
        return self.parts == other.parts

    def __hash__(self):
        return hash(self.parts)
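The function form mentioned above might look like the sketch below, which also tries to cover the two URLs from the question. Everything here is an assumption layered on top of the answer: the names url_key and DEFAULT_PORTS are hypothetical, the default ports are limited to http and https, ';' is treated as a query separator (recent Python versions split only on '&' by default), and the fragment is ignored because it is not sent to the server.

```python
from urllib.parse import urlsplit, parse_qsl, unquote

# Assumed default ports; extend as needed for other schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_key(url):
    """Return a hashable, normalized key for url (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # An explicit default port and an omitted port produce the same key.
    port = parts.port or DEFAULT_PORTS.get(scheme)
    # Assumption: ';' separates parameters just like '&'.
    query = frozenset(parse_qsl(parts.query.replace(";", "&")))
    # Assumption: the fragment is left out of the comparison.
    return (scheme, host, port, unquote(parts.path), query)


def are_urls_the_same(url_1, url_2):
    return url_key(url_1) == url_key(url_2)


url_1 = 'http://www.foo.com/bar?a=b&c=d'
url_2 = 'http://www.foo.com:80/bar?c=d;a=b'

if are_urls_the_same(url_1, url_2):
    print("URLs are the same")

# The keys are hashable, so they can deduplicate URLs in a set.
unique = {url_key(u) for u in (url_1, url_2)}
print(len(unique))
```

Returning a plain tuple keeps the key usable as a dict key or set member without defining __eq__ and __hash__ by hand.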

Answered 2025-04-16 by Python大师

