Getting a ValueError when extracting links

from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error
import ssl
import re

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input("Enter First Link: ")
if len(url) < 1:
    url = "https://www.bing.com/search?q=k+means+wiki&src=IE-SearchBox&FORM=IENAD2"

position = 18
process = 7

# follow a link `process` (7) times
for i in range(process):
    html = urllib.request.urlopen(url, context=ctx)
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    count = 0
    for tag in tags:
        count = count + 1
        # stop once we are past the link at `position` (18)
        if count > position:
            break
        url = tag.get('href', None)
    print(url)
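A minimal offline reproduction, assuming the offending href is a root-relative link such as /search?q=k+means+wiki (a made-up value; the real href on the page may differ). urlopen wraps a string URL in a urllib.request.Request, and the Request constructor itself raises the ValueError when the string has no scheme, so no network access is needed to see it:

```python
import urllib.request

# Hypothetical relative href, like those found on a search-results page
relative_href = "/search?q=k+means+wiki"

try:
    # urlopen() builds a Request from the string; the Request constructor
    # rejects strings that have no scheme such as "https:"
    urllib.request.Request(relative_href)
except ValueError as err:
    print("ValueError:", err)
```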

3 answers

User

Answer 1 · edited 2024-04-25 13:06:15

The error occurs because the link is not a valid, complete URL. You can either prepend "https://bing.com" to the URL or catch the error.

To catch the error:

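# urlopen, not tag.get, is what raises the ValueError
try:
    html = urllib.request.urlopen(url, context=ctx)
except ValueError:
    print("Invalid URL:", url)
    break  # stop following links; `html` was not fetched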

To prepend the base URL:

url = 'https://bing.com' + url
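A sketch of the link-picking step that adds the site part only where it is needed, using urllib.parse.urljoin in place of the fixed 'https://bing.com' prefix (a swapped-in technique, not what this answer proposed). pick_link is a hypothetical helper; it takes the hrefs in page order, the 1-based position and the URL of the page being read, resolves relative links against that page and leaves absolute links unchanged:

```python
from urllib.parse import urljoin

def pick_link(hrefs, position, page_url):
    """Return the href at 1-based `position`, made absolute.

    hrefs:    href strings in page order, e.g.
              [tag.get('href', '') for tag in soup('a')]
    page_url: URL of the page the hrefs were taken from
    """
    href = hrefs[position - 1]
    # urljoin returns absolute hrefs unchanged and resolves relative ones
    # ("/search?q=...", "page2.html") against page_url
    return urljoin(page_url, href)
```

Inside the question's loop this could replace the inner for tag in tags loop with something like url = pick_link([tag.get('href', '') for tag in tags], position, url). Unlike the original counting loop, it raises IndexError when the page has fewer than position links instead of silently using the last one.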

User

Answer 2 · edited 2024-04-25 13:06:15

https://docs.python.org/3/tutorial/errors.html#errors-and-exceptions

See the Python documentation on errors and exceptions.

You can put it inside the loop:

for i in range(process):
    try:
        # the line that causes the problem
        html = urllib.request.urlopen(url, context=ctx)
    except ValueError:
        print("invalid url")
        break
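If the goal is to keep going rather than stop at the first bad link (an assumption about what the asker wants), one option is to test each href and skip the ones urllib rejects. openable_links is a hypothetical helper; it only builds a Request, which is where the ValueError is raised, so it does not touch the network:

```python
import urllib.request

def openable_links(hrefs):
    """Keep only the hrefs that urllib can turn into a request.

    Relative links make urllib.request.Request raise ValueError;
    they are reported and skipped instead of stopping the script.
    """
    kept = []
    for href in hrefs:
        try:
            urllib.request.Request(href)
        except ValueError:
            print("invalid url:", href)
            continue
        kept.append(href)
    return kept
```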

Hope this helps.

User

Answer 3 · edited 2024-04-25 13:06:15

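The URL it ran into has no scheme or domain. It is a relative URL, which means it has to be joined to the current page's URL before you can go to it. URLs usually take the form scheme://domain, as in https://www.facebook.com. If you check your URLs to make sure they contain a scheme and a domain, and add them when they are missing, you will avoid this error.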

For example:

/search?q=stack+overflow

could be the relative URL of a Google search for "stack overflow".

To rebuild the full URL, just add https://www.google.com at the beginning, and it becomes an actual search link: https://www.google.com/search?q=stack+overflow
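The check-then-fix approach described in this answer can be sketched with the standard library's urllib.parse. The helper names has_scheme_and_domain and to_absolute are hypothetical; to_absolute uses urljoin rather than plain string concatenation, because concatenation only gives a correct result for links that start with "/":

```python
from urllib.parse import urljoin, urlparse

def has_scheme_and_domain(url):
    """True if `url` has both a scheme (e.g. "https") and a domain."""
    parts = urlparse(url)
    return bool(parts.scheme) and bool(parts.netloc)

def to_absolute(url, page_url):
    """Return `url` unchanged if it is already absolute; otherwise
    resolve it against `page_url`, the page the link was found on."""
    if has_scheme_and_domain(url):
        return url
    return urljoin(page_url, url)

page = "https://www.google.com/search?q=python"
print(to_absolute("/search?q=stack+overflow", page))  # https://www.google.com/search?q=stack+overflow
print(to_absolute("https://www.facebook.com", page))  # https://www.facebook.com
```

urljoin on its own already leaves absolute URLs alone, so the explicit check mainly mirrors the answer's reasoning and can be useful for logging which links were relative.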
