Error while crawling a locally hosted website

Posted 2024-04-25 17:32:47

Male | Just another code monkey who likes writing Python.

I want to crawl a website that is currently hosted locally. Is it not possible to scrape a locally hosted website? I get this error:

 File "C:/Users/hero/PycharmProjects/project/Crawler.py", line 22, in <module>
    imagefile.write(urllib.request.urlopen("http://192.168.1.1/Webpage.html"+img_src).read())
urllib.error.HTTPError: HTTP Error 404: Not Found
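A locally hosted site can be crawled like any other. An HTTP 404 means the server at 192.168.1.1 was reached and sent back a response; it simply has nothing at the requested path. A minimal sketch (fetch is a hypothetical helper name) that reports the failing URL and status code instead of aborting the whole crawl:

```python
import urllib.error
import urllib.request


def fetch(url):
    """Fetch url and return (status, body).

    An HTTP error status such as 404 is printed and returned instead of
    raised, so one bad URL does not stop the whole crawl.
    """
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status for this path
        print(f"HTTP {err.code} for {url}")
        return err.code, b""
```

Printing the exact URL next to the status usually makes a malformed path obvious at a glance.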

The crawler code:

import urllib.request
from bs4 import BeautifulSoup


def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata


i = 1
soup = make_soup("http://192.168.1.1/Webpage.html")

unique_srcs = []
for img in soup.findAll('img'):
    if img.get('src') not in unique_srcs:
        unique_srcs.append(img.get('src'))
for img_src in unique_srcs:
    filename = str(i)
    i = i + 1
    imagefile = open(filename + '.png', 'wb')
    imagefile.write(urllib.request.urlopen("http://192.168.1.1/Webpage.html"+img_src).read())
    imagefile.close()
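Printing the URL the loop actually requests shows why the server most likely answers 404: the image path is glued directly onto the file name Webpage.html. A minimal sketch, using a hypothetical src value of images/logo.png (the real values come from Webpage.html), compared with the standard library's urllib.parse.urljoin, which resolves a relative path the way a browser does:

```python
from urllib.parse import urljoin

page_url = "http://192.168.1.1/Webpage.html"
img_src = "images/logo.png"  # hypothetical src value, for illustration only

# Plain string concatenation appends the path to the file name
print(page_url + img_src)          # http://192.168.1.1/Webpage.htmlimages/logo.png
# urljoin resolves the src relative to the page's directory
print(urljoin(page_url, img_src))  # http://192.168.1.1/images/logo.png
```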

Tags: in src http img website request html urllib

1 answer

Anonymous user

#1 · Posted 2024-04-25 17:32:47

You forgot to add a slash (/) in the URL path.

Just change the line to the following:

imagefile.write(urllib.request.urlopen("http://192.168.1.1/Webpage.html/"+img_src).read())
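Whether this works depends on what the src values look like. For a relative src such as images/logo.png, the line above requests http://192.168.1.1/Webpage.html/images/logo.png, which a typical static file server is likely to answer with 404 again, because Webpage.html is a file rather than a directory. A more robust sketch resolves each src against the page URL with urllib.parse.urljoin. The helper names unique_image_urls and download_images are hypothetical, and the soup argument is assumed to come from the question's make_soup. Like the original, it saves every image as N.png regardless of its real format, and it removes duplicates after resolving, so two spellings of the same path are downloaded once:

```python
import urllib.request
from urllib.parse import urljoin


def unique_image_urls(page_url, srcs):
    """Resolve each src against page_url; drop empty and duplicate entries, keep order."""
    urls = []
    for src in srcs:
        if not src:
            continue  # an <img> tag without a src attribute
        url = urljoin(page_url, src)  # images/a.png -> http://host/images/a.png
        if url not in urls:
            urls.append(url)
    return urls


def download_images(soup, page_url):
    """Save each image referenced by soup to 1.png, 2.png, ... in the working directory.

    soup is assumed to be a BeautifulSoup object for page_url, e.g. make_soup(page_url).
    """
    srcs = [img.get("src") for img in soup.find_all("img")]
    for i, url in enumerate(unique_image_urls(page_url, srcs), start=1):
        # "with" closes both the HTTP response and the file, even on errors
        with urllib.request.urlopen(url) as resp, open(f"{i}.png", "wb") as f:
            f.write(resp.read())


# Usage (needs the local server to be running):
# page = "http://192.168.1.1/Webpage.html"
# download_images(make_soup(page), page)
```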
