I want to crawl a website that is currently hosted locally. Is it not possible to scrape a locally hosted site? I get this error:
  File "C:/Users/hero/PycharmProjects/project/Crawler.py", line 22, in <module>
    imagefile.write(urllib.request.urlopen("http://192.168.1.1/Webpage.html"+img_src).read())
urllib.error.HTTPError: HTTP Error 404: Not Found
The crawler code:
import urllib.request
from bs4 import BeautifulSoup

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

i = 1
soup = make_soup("http://192.168.1.1/Webpage.html")
unique_srcs = []
for img in soup.findAll('img'):
    if img.get('src') not in unique_srcs:
        unique_srcs.append(img.get('src'))
for img_src in unique_srcs:
    filename = str(i)
    i = i + 1
    imagefile = open(filename + '.png', 'wb')
    imagefile.write(urllib.request.urlopen("http://192.168.1.1/Webpage.html"+img_src).read())
    imagefile.close()
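You forgot to add a slash (/) to the URL path, so the page address and the image's src are joined without a separator. Just change the line to the following: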
imagefile.write(urllib.request.urlopen("http://192.168.1.1/Webpage.html/"+img_src).read())
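Whether a manually added slash yields the right address depends on how the src values are written and where the server actually keeps the images. As a hedged alternative, here is a minimal sketch that resolves each src against the page URL with the standard library's urllib.parse.urljoin, the same way a browser would. The helper name resolve_src and the sample src values are made up for illustration; only the page URL comes from the question.

```python
from urllib.parse import urljoin

# Page URL taken from the question.
PAGE_URL = "http://192.168.1.1/Webpage.html"


def resolve_src(page_url, img_src):
    """Return the absolute URL for an <img> src found on page_url.

    Hypothetical helper: urljoin handles relative paths, root-relative
    paths, and src values that are already absolute URLs.
    """
    return urljoin(page_url, img_src)


if __name__ == "__main__":
    # Sample src values (assumptions, not taken from the real page).
    for src in ["images/a.png", "/img/b.png", "http://example.com/c.png"]:
        print(src, "->", resolve_src(PAGE_URL, src))
```

In the crawler loop, the download line would then become urllib.request.urlopen(resolve_src("http://192.168.1.1/Webpage.html", img_src)), assuming the images are served at the addresses their src attributes point to.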