Scrapy: custom file naming with ImagesPipeline

Published 2024-04-29 00:13:08


In settings.py:

## b3 p0lit3
USER_AGENT = ' *companyname* TUTORIAL BOT - (*myemail*) | No content Generated will be used - For Educational Purpose'
DOWNLOAD_DELAY = 5.0
AUTOTHROTTLE_ENABLED = True
HTTPCACHE_ENABLED = True

BOT_NAME = 'flaticontest'

SPIDER_MODULES = ['flaticontest.spiders']
NEWSPIDER_MODULE = 'flaticontest.spiders'
IMAGES_STORE = '/home/scriptso/Desktop/flattetstn1'
ROBOTSTXT_OBEY = True

ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}

In items.py:

^{pr2}$

In pipelines.py:

^{3}$

My spider... fltSpi.py:

import scrapy
from flaticontest.items import FlaticontestItem

class FltspiSpider(scrapy.Spider):
    name = "fltSpi"
    allowed_domains = ["flaticon.com"]
    start_urls = []

    for num in range(1,2000):
        start_urls.append("http://www.flaticon.com/free-icons/computing_23394/" + str(num))

    def parse(self, response):
        for icon in response.css('.icon'):
            yield {
                'title': icon.css('img').re('title=\"(.*?)\"'),
                'image_urls': icon.css('img').re('set=\"(.*?) 4x'),
                'pach-name': icon.css('li').re('data-pack="(.*)\" '),
                'image_name': icon.css('img').re('title=\"(.*?)\"'),
            }

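I'm having a hard time understanding the logic behind pipelines. Can you point out what I'm doing wrong here? I'm sure the problem is in the pipeline (obviously)... Would anyone be willing to point me in the right direction?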


UPDATE / EDIT

After more troubleshooting, this is where I currently stand.

In settings.py:

USER_AGENT = 'BASH.SEC TUTORIAL BOT - (bash.sec@multuslegio.net) | No content Generated will be used -Educational Purpose'
DOWNLOAD_DELAY = 5.0
AUTOTHROTTLE_ENABLED = True
HTTPCACHE_ENABLED = True
BOT_NAME = 'flaticontest'
SPIDER_MODULES = ['flaticontest.spiders']
NEWSPIDER_MODULE = 'flaticontest.spiders'
IMAGES_STORE = '/home/scriptso/Desktop/flattetstn1'
ROBOTSTXT_OBEY = True
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
ITEM_PIPELINES = {'flaticontest.pipelines.CustomImageNamePipeline': 1}
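A note on the two `ITEM_PIPELINES` lines above: both bind the same module-level name, so the second dict replaces the first. A minimal sketch of that behavior in plain Python (no Scrapy needed):

```python
# Two assignments to the same name: the later dict replaces the earlier one.
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
ITEM_PIPELINES = {'flaticontest.pipelines.CustomImageNamePipeline': 1}

# Only the custom pipeline is left enabled after both lines have run.
print(ITEM_PIPELINES)
```

Since `CustomImageNamePipeline` subclasses `ImagesPipeline`, keeping only the second line is probably what was intended here; the first line has no effect either way.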

In pipelines.py:

from scrapy.pipelines.images import ImagesPipeline
from scrapy.http import Request


class FlaticontestPipeline(object):
    def process_item(self, item, spider):
        return item

class CustomImageNamePipeline(ImagesPipeline):
    def process_item(self, item, spider):
        def get_media_requests(self, item, info):
            return [Request(x, meta={'image_name': item["title"]})
                    for x in item.get('image_urls', [])]

    def get_images(self, response, request, info):
        for key, image, buf, in super(CustomImageNamePipeline, self).get_images(response, request, info):
            if re.compile('^[0-9,a-f]+.jpg$').match(key):
                key = self.change_filename(key, response)
            yield key, image, buf

    def file_path(self, request, response=None, info=None):
        return '%s.jpg'% request.meta['image_name']

In items.py:

^{pr2}$

The spider (fltSpi.py):

import scrapy
from flaticontest.items import FlaticontestItem
from flaticontest.pipelines import *

class FltspiSpider(scrapy.Spider):
    name = "fltSpi"
    allowed_domains = ["flaticon.com"]
    start_urls = []

    for num in range(1,2000):
        start_urls.append("http://www.flaticon.com/free-icons/computing_23394/" + str(num))

    def parse(self, response):
        for icon in response.css('.icon'):
            yield {
                'title': icon.css('img').re('title=\"(.*?)\"'),
                'image_urls': icon.css('img').re('set=\"(.*?) 4x'),
                'pach-name': icon.css('li').re('data-pack="(.*)\" '),
                'image_name': icon.css('img').re('title=\"(.*?)\"'),
            }
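One detail in the spider that matters for renaming: Scrapy's `.re()` returns a list of every match, so `title` and `image_name` end up as lists rather than strings. A small plain-Python sketch of the effect on the `'%s.jpg'` format used in `file_path`; the `first_or` helper and the sample title are hypothetical, for illustration only:

```python
def first_or(matches, default='untitled'):
    """Return the first element of a list of regex matches, or a default."""
    return matches[0] if matches else default


title = ['Laptop']                 # shape of what .re() gives back for one match

print('%s.jpg' % title)            # formats the whole list, brackets included
print('%s.jpg' % first_or(title))  # formats just the string
```

Scrapy selectors also offer `.re_first()`, which returns the first match directly and may be the simpler choice inside the spider.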

I feel like I'm getting closer... because now the output shows this instead...

2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/54> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/54> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/50> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/50> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/50> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/50> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/50> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/50> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48> None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48> None

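From the looks of it, it is iterating over every item (the output only shows the page response, but it clearly is)... However, even though my items are not being returned, for a reason I haven't tracked down yet, I believe my pipeline setup should be good for renaming the downloaded images...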
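The `None` after each `Scraped from` line is consistent with `process_item` in `CustomImageNamePipeline` returning nothing: the log shows whatever the pipeline handed back, and a method whose body only contains a nested `def` falls off the end and returns `None`. A plain-Python sketch of that behavior, with no Scrapy involved; the class names `NestedDef` and `PassThrough` are hypothetical stand-ins:

```python
class NestedDef:
    def process_item(self, item, spider):
        # This only defines a local function; it is never called and never
        # becomes a method of the class.
        def get_media_requests(self, item, info):
            return [item]
        # No return statement, so the caller receives None.


class PassThrough:
    def process_item(self, item, spider):
        return item


item = {'title': ['Laptop']}
print(NestedDef().process_item(item, spider=None))    # None
print(hasattr(NestedDef, 'get_media_requests'))       # False
print(PassThrough().process_item(item, spider=None))  # {'title': ['Laptop']}
```

In an `ImagesPipeline` subclass the usual approach is to not define `process_item` at all, because the inherited one is what triggers `get_media_requests` and the downloads.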

