我正在学习Python,学习的方法是将无聊的事情自动化。这个程序应该转到http://xkcd.com/并下载所有图像以供脱机查看。
我在2.7版和Mac上。
出于某种原因,我得到了一些错误,比如“没有提供模式”和使用request.get()本身时的错误。
这是我的代码:
# Saves the XKCD comic page for offline read
import requests, os, bs4, shutil
url = 'http://xkcd.com/'
if os.path.isdir('xkcd') == True: # If xkcd folder already exists
shutil.rmtree('xkcd') # delete it
else: # otherwise
os.makedirs('xkcd') # Creates xkcd foulder.
while not url.endswith('#'): # If there are no more posts, it url will endswith #, exist while loop
# Download the page
print 'Downloading %s page...' % url
res = requests.get(url) # Get the page
res.raise_for_status() # Check for errors
soup = bs4.BeautifulSoup(res.text) # Dowload the page
# Find the URL of the comic image
comicElem = soup.select('#comic img') # Any #comic img it finds will be saved as a list in comicElem
if comicElem == []: # if the list is empty
print 'Couldn\'t find the image!'
else:
comicUrl = comicElem[0].get('src') # Get the first index in comicElem (the image) and save to
# comicUrl
# Download the image
print 'Downloading the %s image...' % (comicUrl)
res = requests.get(comicUrl) # Get the image. Getting something will always use requests.get()
res.raise_for_status() # Check for errors
# Save image to ./xkcd
imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
for chunk in res.iter_content(10000):
imageFile.write(chunk)
imageFile.close()
# Get the Prev btn's URL
prevLink = soup.select('a[rel="prev"]')[0]
# The Previous button is first <a rel="prev" href="/1535/" accesskey="p">< Prev</a>
url = 'http://xkcd.com/' + prevLink.get('href')
# adds /1535/ to http://xkcd.com/
print 'Done!'
以下是错误:
Traceback (most recent call last):
File "/Users/XKCD.py", line 30, in <module>
res = requests.get(comicUrl) # Get the image. Getting something will always use requests.get()
File "/Library/Python/2.7/site-packages/requests/api.py", line 69, in get
return request('get', url, params=params, **kwargs)
File "/Library/Python/2.7/site-packages/requests/api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 451, in request
prep = self.prepare_request(req)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 382, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/Library/Python/2.7/site-packages/requests/models.py", line 304, in prepare
self.prepare_url(url, params)
File "/Library/Python/2.7/site-packages/requests/models.py", line 362, in prepare_url
to_native_string(url, 'utf8')))
requests.exceptions.MissingSchema: Invalid URL '//imgs.xkcd.com/comics/the_martian.png': No schema supplied. Perhaps you meant http:////imgs.xkcd.com/comics/the_martian.png?
问题是我已经多次阅读了书中关于程序的部分,阅读了请求文档,以及这里的其他问题。我的语法看起来不错。
谢谢你的帮助!
编辑:
这不起作用:
comicUrl = ("http:"+comicElem[0].get('src'))
我认为添加http:before可以消除无模式提供的错误。
将
comicUrl
更改为没有模式意味着您没有提供
http://
或https://
提供这些,它将完成这项任务。编辑:看看这个URL字符串!以下内容:
URL '//imgs.xkcd.com/comics/the_martian.png':
说明:
一些XKCD页面有特殊的内容,而不是简单的图像文件。没关系,你可以跳过这些。如果选择器找不到任何元素,则soup.select('35; comic img')将返回一个空列表。
工作代码:
相关问题 更多 >
编程相关推荐