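A few days ago I created this post, looking for a way to make my script loop so that it goes through a few links and checks, up to four times, whether the title I've defined (supposed to be extracted from each link) is empty. If the title is still empty after that, the script should break out of the loop and move on to the next link to repeat the same process.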
This is how I got it working: by changing fetch_data(link) to return fetch_data(link), and by defining counter=0 outside the while loop but inside the if statement.
Rectified script:
import time
import requests
from bs4 import BeautifulSoup

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4"
]
counter = 0

def fetch_data(link):
    global counter
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    try:
        title = soup.select_one("p.tcode").text
    except AttributeError: title = ""

    if not title:
        while counter<=3:
            time.sleep(1)
            print("trying {} times".format(counter))
            counter += 1
            return fetch_data(link) #First fix
        counter=0 #Second fix

    print("tried with this link:",link)

if __name__ == '__main__':
    for link in links:
        fetch_data(link)
This is the output the above script produces (as desired):
trying 0 times
trying 1 times
trying 2 times
trying 3 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
trying 0 times
trying 1 times
trying 2 times
trying 3 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3
trying 0 times
trying 1 times
trying 2 times
trying 3 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4
I deliberately used a wrong selector in my script so that it meets the condition I've defined above.
Why should I use return fetch_data(link) instead of fetch_data(link), when the two expressions work identically most of the time?
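If the title cannot be extracted, the while loop in your function initiates a recursive call. With return fetch_data(link) this works, because whenever the counter is less than or equal to 3 (while counter<=3), the function exits as soon as the recursive call returns, so it never reaches the next line that resets the counter to 0 (counter=0). Because counter is a global variable and increases by only 1 at each recursion level, there are at most four recursive calls per link: as soon as counter exceeds 3, the function no longer enters the while loop that calls fetch_data(link) again. Only that last call resets the counter and prints the link.
If you use a bare fetch_data(link), the function still initiates a recursive call inside the while loop. However, it does not exit immediately afterwards. This is dangerous: once the counter reaches 4, the innermost call resets it to 0 and returns to the while loop of the previous call. That loop does not break, because the counter is now 0, which is <=3, so it goes on initiating further recursive calls. This eventually hits the maximum recursion depth and crashes the program.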
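To see the difference without making any network requests, here is a minimal simulation of the control flow. It is a sketch under stated assumptions, not the original script: the title is treated as always empty, the names simulate and fetch are hypothetical, the counter lives in a dictionary instead of a global, and a small artificial depth cap (max_depth) stands in for Python's real recursion limit so the runaway case stops quickly.

```python
def simulate(use_return, max_depth=50):
    """Mimic fetch_data for a title that is always empty (no network calls).

    Returns a (status, number_of_calls) tuple. max_depth is an artificial
    cap (an assumption of this sketch) standing in for the real limit.
    """
    state = {"counter": 0, "calls": 0}

    def fetch(depth=1):
        state["calls"] += 1
        if depth > max_depth:
            raise RecursionError("depth cap reached")
        # Same shape as the 'if not title' branch in the script.
        while state["counter"] <= 3:
            state["counter"] += 1
            if use_return:
                return fetch(depth + 1)   # leaves this call when the inner one returns
            fetch(depth + 1)              # falls back into the while loop afterwards
        state["counter"] = 0              # only reached when the loop is not entered or ends

    try:
        fetch()
        return "finished", state["calls"]
    except RecursionError:
        return "runaway", state["calls"]


print(simulate(use_return=True))      # ('finished', 5)
print(simulate(use_return=False)[0])  # runaway
```

With return, there are five calls in total: the initial one plus four retries, which matches the four "trying" lines per link in the output above. Without return, every call that resets the counter hands control back to a loop that then sees counter equal to 0, so the recursion keeps getting deeper until the cap is hit.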