How to import Django models in Scrapy's pipelines.py file
I'm trying to import the models of one of my Django apps into my pipelines.py file so that I can use Django's ORM to save data. I created my Scrapy project scrapy_project inside the first Django app involved, "app1" (by the way, is that a reasonable place to put it?).
I added the following lines to my Scrapy settings file:
import imp, os
from django.core.management import setup_environ

def setup_django_env(path):
    f, filename, desc = imp.find_module('settings', [path])
    project = imp.load_module('settings', f, filename, desc)
    setup_environ(project)

current_dir = os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
setup_django_env(os.path.join(current_dir, '../../d_project1'))
When I try to import the models of my Django app app1, I get this error message:
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 122, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 76, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 129, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 43, in run
    spider = self.crawler.spiders.create(spname, **opts.spargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/command.py", line 33, in crawler
    self._crawler.configure()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 41, in configure
    self.engine = ExecutionEngine(self, self._spider_closed)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 63, in __init__
    self.scraper = Scraper(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 66, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 50, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 29, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 39, in load_object
    raise ImportError, "Error loading object '%s': %s" % (path, e)
ImportError: Error loading object 'scrapy_project.pipelines.storage.storage': No module named dydict.models
Why can't Scrapy access the Django app's models (given that app1 is already in INSTALLED_APPS)?
2 Answers
In the pipeline you don't import the Django models directly; instead you work with Scrapy items that are bound to the Django models.
You need to add the Django configuration in Scrapy's settings file, before anything else, not afterwards.
To use Django models in a Scrapy project you need DjangoItem; see https://github.com/scrapy-plugins/scrapy-djangoitem (it has to be importable from your Python path).
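If you don't have it yet, the package is published on PyPI as scrapy-djangoitem and can be installed with pip:
pip install scrapy-djangoitem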
The file structure I recommend is:
Projects
|-DjangoScrapy
|-DjangoProject
| |-Djangoproject
| |-DjangoAPP
|-ScrapyProject
|-ScrapyProject
|-Spiders
Then, in your Scrapy project, add the full path of the Django project to the Python path:
# Setting up django's project full path.
import sys
sys.path.insert(0, '/home/PycharmProject/scrap/DjangoProject')

# Setting up django's settings module name.
import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'DjangoProject.settings'
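Note: on Django 1.7 and later, standalone code like this also has to call django.setup() after DJANGO_SETTINGS_MODULE is set and before any models are imported; a minimal sketch:
import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'DjangoProject.settings'

import django
django.setup()  # required on Django >= 1.7 before importing models outside manage.py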
Then, in your items.py file, you can bind the Django models to Scrapy items:
from DjangoProject.models import Person, Job
from scrapy_djangoitem import DjangoItem

class PersonItem(DjangoItem):
    django_model = Person

class JobItem(DjangoItem):
    django_model = Job
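For reference, this assumes a models.py along these lines (the field names here are hypothetical, inferred from the name key used by the spider and pipeline below):
# models.py (hypothetical sketch)
from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=100)   # read back as item['name']

class Job(models.Model):
    title = models.CharField(max_length=100)  # hypothetical field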
After that, you can save objects in your pipeline with the .save() method:
spider.py
from scrapy.spider import BaseSpider
from mybot.items import PersonItem

class ExampleSpider(BaseSpider):
    name = "example"
    allowed_domains = ["dmoz.org"]
    start_urls = ['http://www.dmoz.org/World/Espa%C3%B1ol/Artes/Artesan%C3%ADa/']

    def parse(self, response):
        # do stuff
        return PersonItem(name='zartch')
pipelines.py
from myapp.models import Person

class MybotPipeline(object):
    def process_item(self, item, spider):
        # get_or_create returns an (object, created) tuple
        obj, created = Person.objects.get_or_create(name=item['name'])
        return item
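Since PersonItem is a DjangoItem, an equivalent pipeline can call the item's own save() method, which scrapy-djangoitem provides to persist the underlying model:
# Alternative pipeline using DjangoItem.save()
class MybotPipeline(object):
    def process_item(self, item, spider):
        item.save()  # creates and returns the Django model instance
        return item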
I have a repository with a minimal working example you can use as a reference (you only need to set your Django project's path in the Scrapy settings): https://github.com/Zartch/Scrapy-Django-Minimal
In https://github.com/Zartch/Scrapy-Django-Minimal/blob/master/mybot/mybot/settings.py you have to replace my Django project path with the path to your own Django project:
sys.path.insert(0, '/home/zartch/PycharmProjects/Scrapy-Django-Minimal/myweb')
Try:
from ..models import MyModel
or
from ...models import MyModel
The first dot refers to the current package; each additional dot goes one package level further up.
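For example, given a hypothetical layout like this (every directory is a package with an __init__.py), pipelines.py can reach each models module through relative imports:
project/
    models.py              # reached from pipelines.py with: from ...models import MyModel
    app1/
        models.py          # reached from pipelines.py with: from ..models import MyModel
        scrapy_project/
            pipelines.py   # the file doing the importing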