Scrapy pipeline to MySQL - can't find the answer

4 votes

3 answers

2910 views

Asked 2025-04-17 13:40

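I have looked everywhere for an answer to this and cannot find one. As I mentioned yesterday, I am new to both Scrapy and Python, so the answer may well be out there and I just did not understand it.

I wrote my spider and it runs fine. Here is my pipeline: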

import sys
import MySQLdb
import hashlib
from scrapy.exceptions import DropItem
from scrapy.http import Request

class somepipeline(object):
    def __init__(self):
        # Positional arguments cannot follow a keyword argument, so every
        # connection parameter is passed by keyword.
        self.conn = MySQLdb.connect(user='user', passwd='passwd', db='dbname', host='host', charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        try:
            # One %s placeholder per column. `desc` is a reserved word in
            # MySQL, so it is quoted with backticks. The connection charset
            # lets the driver encode text, so no manual .encode() is needed.
            self.cursor.execute("""INSERT INTO sometable (title, link, `desc`)
                            VALUES (%s, %s, %s)""",
                           (item['title'],
                            item['link'],
                            item['desc']))

            self.conn.commit()
        except MySQLdb.Error as e:
            print("Error %d: %s" % (e.args[0], e.args[1]))
        return item

Here are my settings:

BOT_NAME = 'somebot'

SPIDER_MODULES = ['somespider.spiders']
NEWSPIDER_MODULE = 'somespider.spiders'
ITEM_PIPELINES = ['myproject.pipeline.somepipeline']

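But when I run this, I get an error: No module named pipeline.

I found a similar answer, but it was about the images pipeline class, and I only want the HTML data.

What am I doing wrong? Do I need to download another module? Any help is much appreciated. If I am close, just give me a hint.

3 Answers

The correct directory layout should look like this: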

myproject/
     scrapy.cfg  
     myproject/
         __init__.py
         items.py
         pipeline.py
         settings.py
         spiders/
            spider.py

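Also, can you confirm that your spider works correctly? For example, if you comment out the ITEM_PIPELINES setting, does the spider still run and produce the expected output?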
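
To check which module name actually exists in a layout like the one above, a small script run from the project root (the directory that holds scrapy.cfg) can ask Python's import system directly. This is only a sketch: the helper name module_exists is made up for illustration, and the names myproject.pipeline and myproject.pipelines are taken from this thread.

```python
import importlib.util


def module_exists(dotted_name):
    """Return True if `dotted_name` can be located on the current import path."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example `myproject`) cannot
        # be imported at all.
        return False


if __name__ == "__main__":
    # Candidate module names from the question and the answers.
    for name in ("myproject.pipeline", "myproject.pipelines"):
        print(name, module_exists(name))
```

Whichever name reports True is the one to use as the module part of the ITEM_PIPELINES entry.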

Answered 2025-04-17 by Python大师

There is no file named "pipeline" here. It should be "pipelines". So you need to change

ITEM_PIPELINES = ['myproject.pipeline.somepipeline']

to

ITEM_PIPELINES = ['myproject.pipelines.somepipeline']
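
The reason one letter matters: an entry such as 'myproject.pipelines.somepipeline' is a dotted path, where everything before the last dot names a module to import and the last part names a class inside it. The following is a minimal standard-library sketch of that resolution, not Scrapy's own code; the function name load_object is chosen here for illustration.

```python
import importlib


def load_object(path):
    """Resolve a dotted path like 'package.module.Name' to the object it names."""
    module_path, _, name = path.rpartition(".")
    if not module_path:
        raise ValueError("not a full dotted path: %r" % path)
    # Raises ImportError ("No module named ...") if the module part is wrong.
    module = importlib.import_module(module_path)
    # Raises AttributeError if the module exists but the class name is wrong.
    return getattr(module, name)
```

With a file named pipelines.py, the path 'myproject.pipeline.somepipeline' fails at the import step, which matches the "No module named pipeline" error in the question.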

Answered 2025-04-17 by Python大师

There is a typo in the Scrapy tutorial: it should be 'pipelineS'

ITEM_PIPELINES = ['myproject.pipelines.somepipeline']
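
One caveat that comes from Scrapy in general rather than from this thread: newer Scrapy releases expect ITEM_PIPELINES to be a dict that maps each pipeline path to an integer order, and the list form shown above comes from older versions. A sketch of the same setting in dict form, where the value 300 is an arbitrary choice for illustration:

```python
# settings.py (sketch): dict form of ITEM_PIPELINES.
# Keys are dotted paths to pipeline classes; values set the run order
# (lower numbers run first, conventionally in the 0-1000 range).
ITEM_PIPELINES = {
    'myproject.pipelines.somepipeline': 300,
}
```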

Answered 2025-04-17 by Python大师