Problem fetching and saving files with eventlet
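I can fetch images from a website with eventlet, but I cannot save them to a local directory. My code is below. Is anyone familiar with I/O operations in the task model? Thanks!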
import pyquery
import eventlet
from eventlet.green import urllib2

#fetch img urls............ works fine
print "loading page..."
html=urllib2.urlopen("http://www.meinv86.com/meinv/yuanchuangmeinvzipai/").read()
print "Parsing urls..."
d=pyquery.PyQuery(html)
count=0
urls=[]
url=''
for i in d('img'):
    count=count+1
    print i.attrib["src"]
    urls.append(i.attrib["src"])

def fetch(url):
    try:
        print "start feteching %s" %(url)
        urlfile = urllib2.urlopen(url)
        size=int(urlfile.headers['content-length'])
        print 'downloading %s, total file size: %d' %(url,size)
        data = urlfile.read()
        print 'download complete - %s' %(url)
        ##########################################
        #file save just won't work
        f=open("/head2/"+url+".jpg","wb")
        f.write(body)
        f.close()
        print "file saved"
        ##########################################
        return data
    except:
        print "fail to download..."

pool = eventlet.GreenPool()
for body in pool.imap(fetch, urls):
    print "done"
1 Answer
Make sure url is suitable for use as a filename, for example:
import hashlib
import os

def url2filename(url, ext=''):
    return hashlib.md5(url).hexdigest() + ext # anything that removes '\/'

# ...
with open(os.path.join("/head2", url2filename(url, '.jpg')), 'wb') as f:
    f.write(data) # write the bytes read in fetch() (`data`); `body` is the loop variable outside fetch()
print "file saved"
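To see why the raw URL fails as a filename: "/head2/" + url + ".jpg" contains the "/" characters of the URL itself, which open() treats as directory separators. The sketch below is a Python 3 adaptation of url2filename (the code above is Python 2); the `.encode('utf-8')` step and the example URL are assumptions added for illustration, not part of the original answer.

```python
import hashlib


def url2filename(url, ext=''):
    # Hash the URL so the result is made of hex digits only,
    # which means it cannot contain '/' or '\'.
    # On Python 3, md5() needs bytes, so the text URL is encoded first.
    digest = hashlib.md5(url.encode('utf-8')).hexdigest()
    return digest + ext


# Hypothetical URL, used only to show the shape of the result.
url = "http://example.com/images/photo.jpg"
name = url2filename(url, '.jpg')
print(name)
```

The same URL always maps to the same name, so re-running the download overwrites the file rather than creating a duplicate.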
Note: you probably don't want to write files to a top-level directory such as '/head2'.
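One possible alternative, as a minimal sketch: write into a directory you are sure you can create and write to. The helper name save_bytes and the choice of a "head2" folder under the system temp directory are assumptions for illustration only.

```python
import os
import tempfile


def save_bytes(directory, filename, data):
    # Create the target directory if it does not exist yet.
    if not os.path.isdir(directory):
        os.makedirs(directory)
    path = os.path.join(directory, filename)
    # 'wb' because downloaded image data is bytes, not text.
    with open(path, 'wb') as f:
        f.write(data)
    return path


# Hypothetical target: a "head2" folder inside the system temp directory.
target_dir = os.path.join(tempfile.gettempdir(), "head2")
saved = save_bytes(target_dir, "example.jpg", b"placeholder image bytes")
print(saved)
```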
You could also consider using urllib.urlretrieve().
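A minimal sketch of urlretrieve, which downloads a URL straight to a file on disk. On Python 3 the function lives at urllib.request.urlretrieve, so the import below tries that first. To keep the sketch runnable without network access, a local file addressed through a file: URL stands in for the remote image; that stand-in and all file names are assumptions. This uses the plain standard library, not eventlet's green modules.

```python
import os
import tempfile

try:
    from urllib.request import urlretrieve, pathname2url  # Python 3
except ImportError:
    from urllib import urlretrieve, pathname2url  # Python 2

# Stand-in for a remote image: a small local file.
work_dir = tempfile.mkdtemp()
src_path = os.path.join(work_dir, "source.jpg")
with open(src_path, "wb") as f:
    f.write(b"fake image bytes")

# Build a file: URL for the local file and "download" it to dest.
src_url = "file:" + pathname2url(src_path)
dest = os.path.join(work_dir, "saved.jpg")
filename, headers = urlretrieve(src_url, dest)
print(filename)
```

With a real http:// URL in place of src_url, the call has the same shape: the second argument is the local path to write to.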