I am trying to write a web crawler in Python, but for some reason, when I try to crawl a website such as Amazon, my program only prints 'None'.
import requests
from bs4 import BeautifulSoup

def spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'https://www.amazon.com/s/ref=sr_pg_2?rh=i%3Aaps%2Ck%3Apython&page=' + str(page) + '&keywords=python&ie=UTF8&qid=1482022018&spIA=B01M63XMN1,B00WFP9S2E'
        source = requests.get(url)
        plain_text = source.text
        obj = BeautifulSoup(plain_text, "html5lib")
        for link in obj.find_all('a'):
            href = link.get(url)
            print(href)
        page += 1

spider(1)
Compare the request without a user agent and with a user agent: once a user agent is set, it works fine.
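The code that originally accompanied this answer is not available here, so what follows is only a minimal sketch of the idea, not the answerer's exact code. It assumes a browser-like User-Agent string (the value in `HEADERS` is an illustrative placeholder) and uses two hypothetical helper names, `extract_hrefs` and `fetch_links`. It also reads the link target with `link.get("href")`: in the question's code, `link.get(url)` asks each tag for an attribute named after the whole URL, which does not exist, so `None` is printed for every link.

```python
import requests
from bs4 import BeautifulSoup

# Assumed example value; any realistic browser User-Agent string can be used.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
}


def extract_hrefs(html):
    """Return the href value of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    # link.get("href") reads the attribute by name; link.get(url) would look
    # for an attribute literally named after the URL and return None.
    return [link.get("href") for link in soup.find_all("a", href=True)]


def fetch_links(url, timeout=10):
    """Fetch url with a browser-like User-Agent and return its link targets."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return extract_hrefs(response.text)


if __name__ == "__main__":
    # Offline demonstration on a small HTML snippet (no network access needed).
    sample = '<a href="/dp/B01M63XMN1">Book</a><a name="top">Top</a>'
    print(extract_hrefs(sample))
```

To try it against a live page, call `fetch_links(url)` with the search URL from the question. Even with a User-Agent set, a site may still throttle or block automated requests, so the result is not guaranteed.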
You can read "How to prevent getting blacklisted while scraping" to understand why you should use a UA.