settings.py:

```python
## b3 p0lit3
USER_AGENT = ' *companyname* TUTORIAL BOT - (*myemail*) | No content Generated will be used - For Educational Purpose'
DOWNLOAD_DELAY = 5.0
AUTOTHROTTLE_ENABLED = True
HTTPCACHE_ENABLED = True
BOT_NAME = 'flaticontest'
SPIDER_MODULES = ['flaticontest.spiders']
NEWSPIDER_MODULE = 'flaticontest.spiders'
IMAGES_STORE = '/home/scriptso/Desktop/flattetstn1'
ROBOTSTXT_OBEY = True
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
```
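With only the stock `ImagesPipeline` enabled, each downloaded image is stored under `IMAGES_STORE` at a path built from the SHA1 hash of the image URL, `full/<sha1>.jpg`, which explains why renaming is needed at all. The plain-Python sketch below reproduces that naming rule. `default_image_path` and the example URL are hypothetical and are not part of Scrapy's API. It also shows that the key keeps its `full/` prefix, so the anchored pattern `^[0-9,a-f]+.jpg$` used in the `get_images` override further down never matches a default key.

```python
import hashlib
import re


def default_image_path(url: str) -> str:
    """Reproduce the default ImagesPipeline naming rule: full/<sha1 of URL>.jpg."""
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "full/%s.jpg" % image_guid


# Hypothetical image URL, for illustration only.
path = default_image_path("https://example.com/icons/laptop.png")
print(path)  # "full/" followed by 40 hex characters and ".jpg"

# The stored key keeps the "full/" prefix, so a pattern anchored at the start
# of the key, such as '^[0-9,a-f]+.jpg$', does not match it.
print(re.compile('^[0-9,a-f]+.jpg$').match(path))  # None
```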
items.py:

[code block not preserved]

pipelines.py:

[code block not preserved]

My spider, fltSpi.py:
```python
import scrapy
from flaticontest.items import FlaticontestItem


class FltspiSpider(scrapy.Spider):
    name = "fltSpi"
    allowed_domains = ["flaticon.com"]
    start_urls = []
    for num in range(1, 2000):
        start_urls.append("http://www.flaticon.com/free-icons/computing_23394/" + str(num))

    def parse(self, response):
        for icon in response.css('.icon'):
            yield {
                'title': icon.css('img').re('title=\"(.*?)\"'),
                'image_urls': icon.css('img').re('set=\"(.*?) 4x'),
                'pach-name': icon.css('li').re('data-pack="(.*)\" '),
                'image_name': icon.css('img').re('title=\"(.*?)\"'),
            }
```
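`Selector.re()` always returns a list of strings, so in the spider above `title` and `image_name` are lists such as `['Laptop']`, not plain strings. Formatting such a list directly into a file name produces names like `['Laptop'].jpg`. The plain-Python sketch below uses `re.findall`, which also returns a list, on a made-up `<img>` tag. The markup and the `to_filename` helper are hypothetical and serve only as illustration. It then pairs each URL with its own title string.

```python
import re

# Hypothetical markup, shaped like what the spider's regexes expect.
img_html = '<img srcset="https://example.com/icons/laptop.png 4x" title="Laptop">'

titles = re.findall(r'title="(.*?)"', img_html)   # a list, like Selector.re()
image_urls = re.findall(r'set="(.*?) 4x', img_html)

print(titles)              # ['Laptop']
print('%s.jpg' % titles)   # ['Laptop'].jpg  (the list leaks into the name)


def to_filename(title: str) -> str:
    """Hypothetical helper: turn one title string into a safe .jpg file name."""
    safe = re.sub(r'[^\w\- ]+', '_', title).strip() or 'unnamed'
    return safe + '.jpg'


# Pair each URL with its own title string instead of the whole list.
pairs = [(url, to_filename(t)) for url, t in zip(image_urls, titles)]
print(pairs)  # [('https://example.com/icons/laptop.png', 'Laptop.jpg')]
```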
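I'm having a hard time understanding the logic behind pipelines. Can you point out what I'm doing wrong here? I'm sure the problem is in the pipeline (obviously)... Would anyone be willing to point me in the right direction?!

More troubleshooting: this is where I am at the moment.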
settings.py:

```python
USER_AGENT = 'BASH.SEC TUTORIAL BOT - (bash.sec@multuslegio.net) | No content Generated will be used -Educational Purpose'
DOWNLOAD_DELAY = 5.0
AUTOTHROTTLE_ENABLED = True
HTTPCACHE_ENABLED = True
BOT_NAME = 'flaticontest'
SPIDER_MODULES = ['flaticontest.spiders']
NEWSPIDER_MODULE = 'flaticontest.spiders'
IMAGES_STORE = '/home/scriptso/Desktop/flattetstn1'
ROBOTSTXT_OBEY = True
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
# This second assignment replaces the dict above, so only the custom pipeline
# is enabled (it subclasses ImagesPipeline, so it still downloads images).
ITEM_PIPELINES = {'flaticontest.pipelines.CustomImageNamePipeline': 1}
```
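pipelines.py:

```python
import re

from scrapy.pipelines.images import ImagesPipeline
from scrapy.http import Request


class FlaticontestPipeline(object):
    def process_item(self, item, spider):
        return item


class CustomImageNamePipeline(ImagesPipeline):
    def process_item(self, item, spider):
        # The three functions below are nested inside process_item, so they are
        # local helpers that never override ImagesPipeline's methods, and
        # process_item itself ends without a return, i.e. it returns None.
        def get_media_requests(self, item, info):
            # One download request per URL, tagged with the item's title.
            return [Request(x, meta={'image_name': item["title"]})
                    for x in item.get('image_urls', [])]

        def get_images(self, response, request, info):
            for key, image, buf, in super(CustomImageNamePipeline, self).get_images(response, request, info):
                if re.compile('^[0-9,a-f]+.jpg$').match(key):
                    # change_filename is not defined anywhere in this class.
                    key = self.change_filename(key, response)
                yield key, image, buf

        def file_path(self, request, response=None, info=None):
            # Name the stored file after the title carried in request.meta.
            return '%s.jpg' % request.meta['image_name']
```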
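Whether a method is defined at class level or inside another method matters here. A `def` inside a method only creates a local function each time that method runs. It never becomes an attribute of the class, so the parent's method is still the one that gets called. The toy classes below (`Base`, `Nested` and `Flat` are hypothetical stand-ins, not Scrapy classes) show the difference.

```python
class Base:
    def file_path(self):
        return "full/<sha1>.jpg"   # stands in for the default naming

    def process_item(self, item):
        return item                # the parent passes the item through


class Nested(Base):
    def process_item(self, item):
        # Defined only as a local inside process_item: never overrides Base.file_path.
        def file_path(self):
            return "custom.jpg"
        # No return statement, so this method returns None.


class Flat(Base):
    # Defined at class level: this does override Base.file_path.
    def file_path(self):
        return "custom.jpg"


n, f = Nested(), Flat()
print(n.file_path())             # full/<sha1>.jpg
print(n.process_item({"a": 1}))  # None
print(f.file_path())             # custom.jpg
print(f.process_item({"a": 1}))  # {'a': 1}
```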
items.py:

[code block not preserved]

spyder.py:
```python
import scrapy
from flaticontest.items import FlaticontestItem
from flaticontest.pipelines import *


class FltspiSpider(scrapy.Spider):
    name = "fltSpi"
    allowed_domains = ["flaticon.com"]
    start_urls = []
    for num in range(1, 2000):
        start_urls.append("http://www.flaticon.com/free-icons/computing_23394/" + str(num))

    def parse(self, response):
        for icon in response.css('.icon'):
            yield {
                'title': icon.css('img').re('title=\"(.*?)\"'),
                'image_urls': icon.css('img').re('set=\"(.*?) 4x'),
                'pach-name': icon.css('li').re('data-pack="(.*)\" '),
                'image_name': icon.css('img').re('title=\"(.*?)\"'),
            }
```
I feel like I'm getting closer, because instead the output now shows this:
```
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/54>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/54>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/52>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/50>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/50>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/50>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/50>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/50>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/50>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48>
None
2017-01-20 12:48:58 [scrapy] DEBUG: Scraped from <200 http://www.flaticon.com/free-icons/computing_23394/48>
None
```
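The `None` under each "Scraped from" line is the item as it leaves the last enabled pipeline. Scrapy runs each item through the pipelines in `ITEM_PIPELINES` order, feeding every `process_item` the return value of the previous one, and logs whatever comes out at the end. The toy model below is a simplified, synchronous sketch of that chaining. Scrapy's real implementation is Deferred-based, and `run_pipelines` and the two pipeline classes are hypothetical.

```python
class PassThrough:
    def process_item(self, item, spider):
        return item


class ForgetsToReturn:
    def process_item(self, item, spider):
        pass  # like a process_item whose body only defines nested functions


def run_pipelines(pipelines, item, spider=None):
    """Simplified, synchronous model: each output feeds the next pipeline."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item  # this is what the "Scraped from ..." log entry shows


print(run_pipelines([PassThrough()], {"title": ["Laptop"]}))      # {'title': ['Laptop']}
print(run_pipelines([ForgetsToReturn()], {"title": ["Laptop"]}))  # None
```

Once one stage returns `None`, every later stage and the log only ever see `None`.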
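From the looks of it, it is recursively going through each item (the output only shows the page responses, but it clearly is). However, for some reason I haven't ruled out yet, my items are not being returned. I believe my pipeline setup should be good for renaming the downloaded images...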