snatch: simple image scraping in Python
Configurable, extensible image scraping in Python. Inspired by the design and internals of Kenneth Reitz's Requests library.
```python
>>> from snatch import snatch
>>> images = snatch('http://octodex.github.com/pythocat/')
>>> images.extensions
[u'png']
>>> images[1]
<Image["pythocat.png"]>
>>> images[1].url
u'http://octodex.github.com/images/pythocat.png'
```
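The source does not show snatch's internals. The sketch below is a rough, hypothetical illustration of the core step that any image scraper performs, not snatch's own code. It uses only the Python 3 standard library: `html.parser` pulls the `src` of every `<img>` tag out of a page, and `urljoin` resolves each one to an absolute URL. `ImgSrcCollector` and `find_image_urls` are illustrative names and are not part of snatch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect absolute URLs for every <img src="..."> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag/attribute names; attrs is a list of (name, value)
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # resolve relative paths such as "/images/x.png" or "logo.gif"
                self.urls.append(urljoin(self.base_url, src))


def find_image_urls(html, base_url):
    """Hypothetical helper: return absolute image URLs found in `html`."""
    parser = ImgSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


page = '<p><img src="/images/pythocat.png" alt="Pythocat"><img src="logo.gif"></p>'
print(find_image_urls(page, "http://octodex.github.com/pythocat/"))
```

A real scraper would also fetch the page and probably look beyond `<img>` tags (CSS backgrounds, `srcset`), but resolving relative `src` values against the page URL is what turns `/images/pythocat.png` into an absolute address like the `url` shown above.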
Easy to use, easy to configure:
```python
>>> url = 'url/with/54/images'
>>> snatch(url)
<ImageList[54]>
# reduce your results by extension:
>>> _.with_extension('gif')
<ImageList[2]>
# or more explicitly limit your extension in the initial api call:
>>> snatch(url, with_extension=('gif',))
<ImageList[2]>
```
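The following is a hedged sketch of how extension filtering could work over a plain list of image URLs. It is not snatch's implementation: `extension_of` and `filter_by_extension` are hypothetical helpers. The sketch assumes the extension is taken from the URL path only, with any query string ignored, and compared case-insensitively.

```python
import posixpath
from urllib.parse import urlparse


def extension_of(url):
    """Return the lowercase file extension of a URL's path, without the dot."""
    path = urlparse(url).path  # drops any ?query or #fragment
    ext = posixpath.splitext(path)[1]
    return ext[1:].lower()


def filter_by_extension(urls, extensions):
    """Keep only URLs whose extension is in `extensions` (case-insensitive)."""
    wanted = {e.lower() for e in extensions}
    return [u for u in urls if extension_of(u) in wanted]


urls = [
    "http://example.com/a.png",
    "http://example.com/b.GIF?size=large",
    "http://example.com/c.gif",
    "http://example.com/readme",
]
print(filter_by_extension(urls, ("gif",)))
```

In this sketch the one-element tuple `('gif',)` matters, just as it does in the `with_extension=('gif',)` call. A bare string such as `'gif'` would be iterated character by character and would match nothing useful.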
It's also easy to hook your own filters or actions into the callback system. Say you only want to capture images wider than 250 pixels:
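```python
from io import BytesIO

import requests
from PIL import Image
from snatch import snatch


def wider_than_250(images):
    def filter_fn(image):
        # Only download the image when its width is not already known.
        if image.width is None:
            res = requests.get(image.url)
            img = Image.open(BytesIO(res.content))
            image.width = img.size[0]
        return image.width > 250
    # Build a list: filter() returns a lazy iterator on Python 3.
    return [image for image in images if filter_fn(image)]


url = 'http://octodex.github.com/images/pythocat.png'
# Run wider_than_250 on the results once scraping is complete.
callbacks = {'complete': wider_than_250}
images = snatch(url, callbacks=callbacks)
```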
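The `wider_than_250` filter relies on PIL to learn an image's width. As an alternative sketch with no imaging dependency, the snippet below reads a PNG's dimensions directly from its IHDR header using `struct`. It works for PNG only; other formats store their size differently. The snippet pairs that with a tiny hypothetical `run_callbacks` helper that mimics handing the final image list to a `'complete'` hook. The source does not show how snatch's real image objects and callback dispatch work, so images are modelled here as plain dicts holding raw bytes.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) from raw PNG bytes by reading the IHDR chunk.

    A PNG starts with an 8-byte signature followed by the IHDR chunk:
    4-byte length, 4-byte type b"IHDR", then width and height as
    big-endian unsigned 32-bit integers (bytes 16-24 of the file).
    """
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


def run_callbacks(images, callbacks, event):
    """Hypothetical hook runner: pass images through callbacks[event] if set."""
    hook = callbacks.get(event)
    return hook(images) if hook is not None else images


def wider_than_250(images):
    # keep images whose width, read from the PNG header, exceeds 250 pixels
    return [img for img in images if png_size(img["data"])[0] > 250]


def fake_png_header(width, height):
    # only the first 24 bytes matter to png_size; this is not a complete PNG
    return PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", width, height)


images = [
    {"name": "small.png", "data": fake_png_header(120, 80)},
    {"name": "wide.png", "data": fake_png_header(896, 896)},
]
kept = run_callbacks(images, {"complete": wider_than_250}, "complete")
print([img["name"] for img in kept])
```

Reading only the header means the check needs just the first 24 bytes of a file. For example, a ranged or streamed download could stop early, whereas `Image.open` is handed the full response body.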
Downloading every image from a URL is even simpler:
```python
import os

import requests
from snatch import snatch

directory = 'snatched-images'
if not os.path.exists(directory):
    os.mkdir(directory)

for image in snatch('http://octodex.github.com/pythocat/'):
    contents = requests.get(image.url).content
    # Image data is binary, so open the file in 'wb' mode.
    with open(os.path.join(directory, image.filename), 'wb') as image_file:
        image_file.write(contents)
```
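When downloading, the image bytes must be written in binary mode, and each file needs a name. The sketch below is a hypothetical helper, not part of snatch, for the case where you have only a URL to work with. It derives a file name from the URL's last path segment using `urlparse`, which ignores any query string. It then writes the bytes with `'wb'` and uses `os.makedirs(..., exist_ok=True)` instead of a separate existence check. The `requests` usage in the trailing comment assumes the `image.url` attribute shown earlier.

```python
import os
import posixpath
import tempfile
from urllib.parse import urlparse


def filename_from_url(url, default="image"):
    """Derive a local file name from the last path segment of a URL."""
    name = posixpath.basename(urlparse(url).path)
    # fall back when the path is empty or ends in "/", "." or ".."
    return default if name in ("", ".", "..") else name


def save_image(directory, url, contents):
    """Write raw image bytes into `directory`, creating it if needed."""
    os.makedirs(directory, exist_ok=True)  # no error if it already exists
    path = os.path.join(directory, filename_from_url(url))
    # "wb": image data is bytes; text mode "w" would raise TypeError on Python 3
    with open(path, "wb") as image_file:
        image_file.write(contents)
    return path


# Example run without network access, writing into a throwaway directory:
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(os.path.join(tmp, "snatched-images"),
                       "http://octodex.github.com/images/pythocat.png?v=2",
                       b"\x89PNG\r\n\x1a\n")
    print(os.path.basename(saved))

# With the requests package installed, the loop body could become:
#     save_image('snatched-images', image.url, requests.get(image.url).content)
```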
Release history

0.1.0 (2013-10-12)

- Initial writing/scaffolding; some parts still need fixing or improvement