How do I download images from a web page?

import os
import requests
from bs4 import BeautifulSoup

downloadDirectory = "downloaded"
baseUrl = "http://pythonscraping.com"

def getAbsoluteURL(baseUrl, source):
    if source.startswith("http://www."):
        url = "http://"+source[11:]
    elif source.startswith("http://"):
        url = source
    elif source.startswith("www."):
        url = source[4:]
        url = "http://"+source
    else:
        url = baseUrl+"/"+source
    if baseUrl not in url:
        return None
    return url

def getDownloadPath(baseUrl, absoluteUrl, downloadDirectory):
    path = absoluteUrl.replace("www.", "")
    path = path.replace(baseUrl, "")
    path = downloadDirectory+path
    directory = os.path.dirname(path)
    if not os.path.exists(directory):
        os.makedirs(directory)
    return path

html = requests.get("http://www.pythonscraping.com")
bsObj = BeautifulSoup(html.content, 'html.parser')
downloadList = bsObj.find_all(src=True)

for download in downloadList:
    fileUrl = getAbsoluteURL(baseUrl,download["src"])
    if fileUrl is not None:
        print(fileUrl)
        with open(fileUrl, getDownloadPath(baseUrl, fileUrl, downloadDirectory), 'wb') as out_file:
            out_file.write(fileUrl.content)

http://pythonscraping.com/misc/jquery.js?v=1.4.4
Traceback (most recent call last):
  File "C:\Python36\kodovi\downloaded.py", line 43, in <module>
    with open(fileUrl, getDownloadPath(baseUrl, fileUrl, downloadDirectory), 'wb') as out_file:
TypeError: an integer is required (got type str)

1 Answer

Anonymous user

#1 · Posted 2024-04-17 19:30:46

It looks like your download list includes URLs that are not images. You can search the HTML for <img> tags only:

downloadList = bsObj.find_all('img')

Then download those images like this:

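os.makedirs(downloadDirectory, exist_ok=True)  # make sure the target directory exists
for download in downloadList:
    fileUrl = getAbsoluteURL(baseUrl, download["src"])
    if fileUrl is None:
        continue  # getAbsoluteURL returns None for URLs outside baseUrl
    r = requests.get(fileUrl, allow_redirects=True)
    filename = os.path.join(downloadDirectory, fileUrl.split('/')[-1])
    with open(filename, 'wb') as out_file:  # the with block closes the file
        out_file.write(r.content)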
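
For context on the traceback in the question: the built-in open() takes the file path first and the mode second, and its third positional parameter is the integer buffering value. Calling open(fileUrl, path, 'wb') therefore passes 'wb' where an integer is expected, which is what raises the TypeError. In addition, fileUrl is a plain string, so it has no .content attribute; the bytes come from the response returned by requests.get. Below is a minimal sketch of the write step on its own. The helper name save_bytes and the sample file name are made up for illustration, and the stand-in bytes take the place of r.content.

```python
import os
import tempfile


def save_bytes(data, path):
    """Write raw bytes to path, creating parent directories if needed."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # The mode is the second argument to open(); a third positional
    # argument would be read as the integer `buffering` value.
    with open(path, "wb") as out_file:
        out_file.write(data)
    return path


if __name__ == "__main__":
    # Stand-in bytes; in the download loop this would be r.content.
    target = os.path.join(tempfile.mkdtemp(), "downloaded", "logo.jpg")
    save_bytes(b"fake image bytes", target)
    print(os.path.exists(target))
```

In the download loop, this would be called as save_bytes(r.content, filename) in place of the open(...).write(...) line.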

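Edit: I've updated the filename = ... line so that each file is written, under its original name, into the directory named by the downloadDirectory string. By the way, the usual convention for Python variable names is snake_case rather than camel case (for example, download_list instead of downloadList).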
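
As a rough illustration of the naming convention, here is one way the URL helpers might look in snake_case. This is a sketch, not the answer's code: it swaps the hand-written prefix checks for urljoin and urlsplit from the standard library, and the names get_absolute_url and file_name_from_url are hypothetical. Unlike the original getAbsoluteURL, it leaves a "www." prefix in place and compares hosts instead of substrings. It also drops the query string from the file name, so a URL such as misc/jquery.js?v=1.4.4 is saved as jquery.js.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def _strip_www(host):
    """Drop a leading "www." so both spellings of a host compare equal."""
    return host[4:] if host.startswith("www.") else host


def get_absolute_url(base_url, source):
    """Resolve source against base_url; return None for other hosts."""
    if source.startswith("www."):
        # Assumption: a bare "www." prefix means a scheme-less absolute URL.
        source = "http://" + source
    absolute = urljoin(base_url.rstrip("/") + "/", source)
    base_host = _strip_www(urlsplit(base_url).netloc)
    host = _strip_www(urlsplit(absolute).netloc)
    if host != base_host:
        return None
    return absolute


def file_name_from_url(url):
    """Return the last path segment of url, without query or fragment."""
    return posixpath.basename(urlsplit(url).path)


if __name__ == "__main__":
    base_url = "http://pythonscraping.com"
    for src in ("misc/jquery.js?v=1.4.4", "http://example.com/a.png"):
        file_url = get_absolute_url(base_url, src)
        if file_url is not None:
            print(file_url, file_name_from_url(file_url))
```

Comparing hosts avoids the substring test (baseUrl not in url), which would also accept an unrelated URL that merely contains the base URL somewhere in its path or query.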
