Scrapy: can't import items into my spider (No module named behance.items)

Posted 2024-04-19 19:48:40


I am new to Scrapy and am running a spider to crawl Behance:

import scrapy
from scrapy.selector import Selector
from behance.items import BehanceItem
from selenium import webdriver
from scrapy.http import TextResponse

from scrapy.crawler import CrawlerProcess

class DmozSpider(scrapy.Spider):
    name = "behance"
    #allowed_domains = ["behance.com"]
    start_urls = [
        "https://www.behance.net/gallery/29535305/Mind-Your-Monsters",
    ]

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider finish its own setup before adding the browser.
        super(DmozSpider, self).__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # Load the page in Selenium so JavaScript-rendered markup is available,
        # then wrap the rendered HTML in a response that supports XPath.
        self.driver.get(response.url)
        response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')
        item = BehanceItem()

        item['link'] = response.xpath("//div[@class='js-project-module-image-hd project-module module image project-module-image']/@data-hd-src").extract()

        yield item

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(DmozSpider)
process.start()

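When I run the spider, I get the following error on the command line:

Traceback (most recent call last):
  File "/home/davy/behance/behance/spiders/behance_spider.py", line 3, in <module>
    from behance.items import BehanceItem
ImportError: No module named behance.items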

My directory structure:

[directory listing missing from the source]

2 Answers

Try running the spider with the following command:

scrapy crawl behance

Or change the spider file:

[code block missing from the source]

and create another Python file in the directory where settings.py is located:

run.py

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

process.crawl("behance")
process.start()

Now run this file like a normal Python script:

python run.py
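For context on why both suggestions avoid the ImportError: as far as I understand it, Scrapy's command line and get_project_settings() look for the nearest scrapy.cfg at or above the current directory and treat its folder as the project root, which makes the behance package importable. The sketch below only imitates that lookup with the standard library. The helper name find_project_root and the temporary directory layout are assumptions made for illustration, not Scrapy's own API.

```python
import tempfile
from pathlib import Path


def find_project_root(start, marker="scrapy.cfg"):
    """Return the nearest directory at or above `start` that contains `marker`, else None."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None


# Throwaway layout that mirrors a Scrapy project:
#   <tmp>/scrapy.cfg
#   <tmp>/behance/spiders/
tmp_root = Path(tempfile.mkdtemp()).resolve()
(tmp_root / "scrapy.cfg").write_text("[settings]\ndefault = behance.settings\n")
spiders_dir = tmp_root / "behance" / "spiders"
spiders_dir.mkdir(parents=True)

# Starting from the spiders folder, the lookup walks up to the project root.
root_from_spiders = find_project_root(spiders_dir)
print(root_from_spiders == tmp_root)
```

Running the spider file directly with python skips this step, so the directory that contains the behance package never ends up on the import path.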

You can add the project directory to the Python path:

export PYTHONPATH=$PYTHONPATH:/home/davy/behance/
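The same effect can be had from inside a script by putting the project root on sys.path before the import runs. This is a minimal sketch, not a definitive fix: the package name demo_project, the class DemoItem, and the helper import_items are hypothetical and are created in a temporary directory only so the example runs anywhere. In the question's layout the root would be /home/davy/behance/ and the package would be behance.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_items(project_root, package="demo_project"):
    """Put project_root on sys.path, then import <package>.items."""
    root = str(project_root)
    if root not in sys.path:
        sys.path.insert(0, root)
    importlib.invalidate_caches()
    return importlib.import_module(package + ".items")


# Throwaway layout: <tmp>/demo_project/__init__.py and <tmp>/demo_project/items.py
tmp_root = Path(tempfile.mkdtemp())
pkg_dir = tmp_root / "demo_project"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "items.py").write_text("class DemoItem(dict):\n    pass\n")

# Before the root is on sys.path, the import fails like the one in the question.
try:
    importlib.import_module("demo_project.items")
    found_before = True
except ImportError:
    found_before = False

# After adding the root, the same import succeeds.
items = import_items(tmp_root)
print(found_before, items.DemoItem.__name__)
```

Setting PYTHONPATH in the shell has the advantage of leaving the spider code untouched, while the sys.path approach keeps the fix inside the script that needs it.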
