Getting only the domain name from a URL using urlsplit

   httpx                            after_urlsplit
0  https://stackoverflow.com/       (https, stackoverflow.com, /, , )
1  https://www.stackoverflow.com/   (https, www.stackoverflow.com, /, , )
2  www.stackoverflow.com/           (, , www.stackoverflow.com/, , )
3  stackoverflow.com/               (, , stackoverflow.com/, , )

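3 Answers

User

#1 · edited 2024-04-23 23:37:18

New answer, works for both URLs and bare host names

To handle inputs that have no protocol (for example example.com), it is better to use a regex: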

import re

urls = ['www.stackoverflow.com',
        'stackoverflow.com',
        'https://stackoverflow.com',
        'https://www.stackoverflow.com/',
        'www.stackoverflow.com',
        'stackoverflow.com',
        'https://subdomain.stackoverflow.com/']

for url in urls:
    # Drop an optional "scheme://" prefix, then take the second-to-last
    # dot-separated piece of whatever remains.
    host_name = re.search(r"^(?:.*://)?(.*)$", url).group(1).split('.')[-2]
    print(host_name)

This prints stackoverflow in every case.
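One caveat: the pattern above keeps the path, so an input such as https://stackoverflow.com/a/file.txt would split on the dot in file.txt and return com/a/file instead. A minimal sketch of a variant that captures only the host part is shown below. The names HOST_RE and host_label are made up for illustration, and the pattern is an assumption rather than a full URL grammar (it does not handle user:password@ prefixes, for instance).

```python
import re

# Hypothetical pattern: an optional "scheme://" prefix, then the host,
# which ends at the first '/', ':', '?' or '#'.
HOST_RE = re.compile(r"^(?:[a-zA-Z][a-zA-Z0-9+.-]*://)?([^/:?#]+)")

def host_label(url):
    """Return the second-to-last label of the host, or the host itself
    when it has only one label (e.g. 'localhost')."""
    match = HOST_RE.match(url)
    if match is None:
        return None
    labels = match.group(1).split('.')
    return labels[-2] if len(labels) >= 2 else labels[0]

print(host_label('https://stackoverflow.com/a/file.txt'))
print(host_label('www.stackoverflow.com'))
```

Both calls should print stackoverflow, because the path and any port are cut off before the split on dots.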

Old answer, works only for URLs

You can use the netloc value returned by urlsplit, plus a little extra trimming to get the part of the domain you want:

from urllib.parse import urlsplit

m = urlsplit('http://subdomain.example.com/some/extra/things')

print(m.netloc.split('.')[-2])

This prints example.

(However, this fails for URLs such as http://localhost/some/path/to/file.txt, where the host has only one label and index -2 does not exist.)
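The urlsplit approach can be stretched to cover both the missing-scheme inputs from the question and single-label hosts. The sketch below relies on the fact that urlsplit only fills netloc when the host is preceded by //, so it adds // to bare host names first. The function name domain_label is hypothetical, and the scheme check is a simple assumption that will misfire on unusual inputs (for example a bare host whose query string contains ://).

```python
from urllib.parse import urlsplit

def domain_label(url):
    """Return the second-to-last label of the host, e.g. 'example'."""
    # Without a leading '//' urlsplit treats 'example.com/x' as a path,
    # so add it for inputs that carry no scheme.
    if '://' not in url and not url.startswith('//'):
        url = '//' + url
    # .hostname is lower-cased and has any port stripped.
    host = urlsplit(url).hostname or ''
    labels = host.split('.')
    # Single-label hosts such as 'localhost' are returned unchanged.
    return labels[-2] if len(labels) >= 2 else host

for url in ['https://www.stackoverflow.com/',
            'www.stackoverflow.com/',
            'stackoverflow.com/',
            'http://localhost/some/path/to/file.txt']:
    print(domain_label(url))
```

This should print stackoverflow three times and then localhost. Like the original, it simply takes the second-to-last label, so a host such as example.co.uk would yield co; handling that correctly needs a public-suffix list, which is outside this sketch.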

User

#2 · edited 2024-04-23 23:37:18

You can use a regular expression (regex) for this task.

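import re

URL = "https://www.test.com"
# Raw string avoids invalid-escape warnings; the dot after "www" is escaped
# so it matches a literal dot, and '-' is allowed inside host names.
result = re.search(r"https?://(www\.)?([\w.-]+)", URL)
print(result.group(2))

# output: test.com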

User

#3 · edited 2024-04-23 23:37:18

The best way to handle this kind of problem is to use a regex.
