MediaPipeline does not download files

I don't understand why my pipeline isn't saving the files. Here is the code:

import os

from scrapy.http import Request
from scrapy.contrib.pipeline.media import MediaPipeline

VIDEOS_DIR = '/home/dmitry/videos'

class VideoDownloadPipeline(MediaPipeline):
    def get_media_requests(self, item, info):
        # Schedule the video URL for download and carry the item along in meta.
        return Request(item['file'], meta={'item': item})

    def media_downloaded(self, response, request, info):
        # On a successful download, write the body to VIDEOS_DIR
        # under the file's original basename.
        item = response.meta.get('item')
        video = response.body
        video_basename = item['file'].split('/')[-1]
        new_filename = os.path.join(VIDEOS_DIR, video_basename)
        with open(new_filename, 'wb') as f:
            f.write(video)

    def item_completed(self, results, item, info):
        # Replace the full URL with just the file name before passing the item on.
        item['file'] = item['file'].split('/')[-1]
        return item
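
One way to see why media_downloaded never runs is to also override MediaPipeline's media_failed(failure, request, info) hook, which is called when the download errors out instead of media_downloaded. A minimal sketch of such a pipeline (the class name, logger setup, and messages are illustrative and not from the original post):

import logging

from scrapy.http import Request
from scrapy.contrib.pipeline.media import MediaPipeline

logger = logging.getLogger(__name__)

class LoggingVideoDownloadPipeline(MediaPipeline):
    def get_media_requests(self, item, info):
        return Request(item['file'], meta={'item': item})

    def media_downloaded(self, response, request, info):
        # Reached only when the request succeeds.
        logger.info("downloaded %s (%d bytes)", request.url, len(response.body))

    def media_failed(self, failure, request, info):
        # Reached instead of media_downloaded when the request fails,
        # e.g. on ConnectionLost; the failure explains why nothing was saved.
        logger.error("download of %s failed: %r", request.url, failure)
        return failure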

Before this I had some other code, but it wasn't concurrent, so I had to wait for each video to finish downloading before continuing to parse.


Here is my settings.py:

import os

PROJECT_ROOT = os.path.abspath(os.path.dirname(__file__))

BOT_NAME = 'videos_parser'

SPIDER_MODULES = ['videos_parser.spiders']
NEWSPIDER_MODULE = 'videos_parser.spiders'

ITEM_PIPELINES = {
    'videos_parser.pipelines.VideoFileSizePipeline': 300,
    'videos_parser.pipelines.VideoExistingInDBPipeline': 350,
    'videos_parser.pipelines.VideoModeratePipeline': 400,
    'videos_parser.pipelines.VideoDownloadPipeline': 500,
    'videos_parser.pipelines.JsonWriterPipeline': 800,
}

EXTENSIONS = {
    'scrapy.contrib.closespider.CloseSpider': 100,
}

CLOSESPIDER_ITEMCOUNT = 50

DOWNLOAD_TIMEOUT = 60
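
Not part of the original settings, but worth noting when debugging retried downloads: the retry middleware gives up after RETRY_TIMES attempts, and some servers drop connections from Scrapy's default user agent even though a browser works. A hedged sketch of settings one might experiment with (the user-agent string is only an example value):

# Illustrative additions, not from the original settings.py.
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0 Safari/537.36'
RETRY_ENABLED = True
RETRY_TIMES = 5          # default is 2 retries per failed request
DOWNLOAD_TIMEOUT = 180   # large video files may need more than 60 seconds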

Update

I added some log.msg() statements, for example in get_media_requests and media_downloaded. As far as I can see, get_media_requests is called but media_downloaded is not, because of this:

2014-07-23 08:58:20+0400 [xhamster] DEBUG: Retrying <GET http://somesite/video.mp4> (failed 1 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionLost'>>]

But I can download this file with a browser.
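
Since the same URL downloads fine in a browser while Scrapy's request ends in ConnectionLost, one experiment is to send browser-like headers with the media request. A minimal sketch showing only the changed method (the header values are placeholders, not taken from the original post):

from scrapy.http import Request
from scrapy.contrib.pipeline.media import MediaPipeline

# Placeholder values -- copy the headers the working browser request actually sends.
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) Firefox/31.0',
    'Accept': '*/*',
}

class VideoDownloadPipeline(MediaPipeline):
    # ... other methods unchanged ...

    def get_media_requests(self, item, info):
        # Same request as before, but with explicit headers so the download
        # looks more like the browser request that succeeds.
        return Request(item['file'], headers=BROWSER_HEADERS, meta={'item': item})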

