How do I verify that a URL exists and is not just a redirect?

Posted 2024-04-19 10:39:00

How can I verify that a page URL exists and does not redirect to a "not found" page?
Example:

import socket
try:
    socket.gethostbyname('www.google.com/imghp')
except socket.gaierror as ex:
    print "Not existe"

It always reports that the URL does not exist.


2 Answers

gethostbyname() will not give you the result you want. Consider using urllib2 instead. In your case, the following should do what you need:

import urllib2

#The data variable can be used to send POST data
data=None
#Here add as many header fields as you wish
headers={"User-agent":"Blahblah", "Cookie":"yourcookievalues"}
url = "http://www.google.com/imghp"
request = urllib2.Request(url, data, headers)
try:
    response = urllib2.urlopen(request)
    #Check redirection here
    if (response.geturl() != url):
         print "The page at: "+url+" redirected to: "+response.geturl()
except urllib2.HTTPError as err:
    #Catch 404s etc.
    print "Failed with code: "+str(err)
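
On Python 3, urllib2 no longer exists; its functionality lives in urllib.request and urllib.error. A minimal sketch of the same check under that assumption follows. The function name check_url and the timeout parameter are illustrative and not part of the original answer.

```python
import urllib.error
import urllib.request


def check_url(url, headers=None, timeout=10):
    """Return (status, final_url); status is None if the server is unreachable.

    Hypothetical helper for illustration, not part of the original answer.
    """
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # urlopen follows redirects, so response.url is where we ended up.
            return response.status, response.url
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; err.code is the status.
        return err.code, err.url
    except urllib.error.URLError:
        # DNS failure, refused connection, and similar network errors.
        return None, url


def was_redirected(requested_url, final_url):
    """True when the URL we ended up at differs from the one we asked for."""
    return requested_url != final_url
```

A caller could then treat a 200 status together with was_redirected(url, final_url) being true as a hint that the site answered with some other page.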

Hope this helps!

You are using the wrong tool for the job!

From the manual:

socket.gethostbyname(hostname)

Translate a host name to IPv4 address format. The IPv4 address is returned as a string, such as '100.50.200.5'. If the host name is an IPv4 address itself it is returned unchanged. See gethostbyname_ex() for a more complete interface. gethostbyname() does not support IPv6 name resolution, and getaddrinfo() should be used instead for IPv4/v6 dual stack support.

That tool is meant for checking whether a domain exists and for getting its IP address:

>>> try:
...     print(socket.gethostbyname('www.google.com'))
... except socket.gaierror as ex:
...     print("Does not exist")
... 
216.58.211.132
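
The question's call failed because 'www.google.com/imghp' contains a path and is therefore not a host name. As a rough sketch, the host can be pulled out of a full URL first and then resolved with getaddrinfo(), which the manual excerpt above recommends for IPv4/IPv6 support. The helper names hostname_of and resolve are made up for this illustration, and the URL is assumed to include its scheme (http://...).

```python
import socket
from urllib.parse import urlsplit


def hostname_of(url):
    """Extract only the host part of a URL (hypothetical helper).

    Assumes the URL includes its scheme; without one, urlsplit() treats
    the whole string as a path and the result is None.
    """
    return urlsplit(url).hostname


def resolve(hostname):
    """Return the sorted IPv4/IPv6 addresses of hostname, or [] if unresolved."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host = hostname_of("http://www.google.com/imghp")
    print(host, resolve(host))
```

This only tells you that the domain resolves; it says nothing about whether a particular page exists on it.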

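You probably want to actually connect to the site and check whether the page is there, for example with requests.head() and a check of response.status_code.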

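The head() method from requests only fetches information about the page from the web server (the response headers), not the page itself, so it is very light on network usage.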

Spoiler alert: if you try to read the page content from a head() response, you will get nothing; for that you need to use the get() method.
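
To illustrate the point about HEAD carrying no body, here is a small sketch that uses only the standard library's http.client instead of requests, so it runs without installing anything. The head() helper below is hypothetical and is not the requests function; like requests.head() by default, it does not follow redirects.

```python
import http.client
from urllib.parse import urlsplit


def head(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status, location, body). Hypothetical stand-in for
    requests.head(); body should always be empty for a HEAD request.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        # Location is only set on redirects; getheader() returns None otherwise.
        return response.status, response.getheader("Location"), response.read()
    finally:
        conn.close()


if __name__ == "__main__":
    status, location, body = head("http://www.google.com/imghp")
    print(status, location, len(body))
```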


Update 1

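The site you want to check is broken, i.e. it does not follow Internet standards. Instead of returning a 404, it returns a 302 that redirects to a "page does not exist" page, which itself has status code 200: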

>>> response = requests.head('http://qamarsoft.com/does_not_exists', allow_redirects=True)
>>> response.status_code
200

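To work around this, request the page from that site without following redirects and check whether the redirect target (the Location header) contains 404:

>>> response = requests.head('http://qamarsoft.com/does_not_exists')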
>>> response.headers['location']
'http://qamarsoft.com/404'

So the test becomes:

>>> response = requests.head('http://qamarsoft.com/does_not_exists')
>>> if '404' in response.headers.get('location', ''):
...     print('Does not exist')
... else:
...     print('Exists')
...
Does not exist
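
The check above can be wrapped in a small helper that works on the status code and Location header alone. This is only a heuristic sketch: the function name looks_missing and the markers tuple are assumptions chosen for illustration, and a site could use any other URL for its "not found" page.

```python
from urllib.parse import urlsplit

# Redirect status codes that carry a Location header.
REDIRECT_CODES = (301, 302, 303, 307, 308)


def looks_missing(status_code, location=None,
                  markers=("404", "not-found", "notfound")):
    """Guess whether a response means "page does not exist".

    status_code and location are the status and Location header of a
    response fetched WITHOUT following redirects. The markers are
    hypothetical substrings to look for in the redirect target's path.
    """
    if status_code in (404, 410):
        return True
    if status_code in REDIRECT_CODES and location:
        # Inspect only the path so a port such as :4040 cannot match.
        path = urlsplit(location).path.lower()
        return any(marker in path for marker in markers)
    return False
```

With requests, it could be called as looks_missing(response.status_code, response.headers.get('location')) on the result of requests.head(url).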

Update 2

For the second URL, you can try it yourself in the Python console:

>>> import requests
>>> response = requests.head('http://www.***********.ma/does_not_Exists')
>>> if response.status_code == 404:
...    print("Does not exist")
... else:
...    print("Exists")
...
Does not exist
>>> response = requests.head('http://www.***********.ma/annonceur/a/3550/n.php')
>>> if response.status_code == 404:
...    print("Does not exist")
... else:
...    print("Exists")
...
Exists

Note: you may need to install the requests package:

pip install requests

Or, if you are modern and use Python 3:

pip3 install requests
