Scrapy CrawlSpider not working

Posted 2024-05-29 03:03:44

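I'm not entirely sure what the problem is. I'm following a tutorial and, after several hours of looking at it, I can't figure out what I'm missing.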

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

from socrata.items import SocrataItem

class OpendataSpider(CrawlSpider):
    name = 'opendata_crawl'
    allowed_domains = ['opendata.socrata.com']
    start_urls = ['https://opendata.socrata.com/']
    rules = [
        Rule(
            LinkExtractor(allow=r'browse\?utf8=%E2%9C%93&page\d*'),
            callback='parse_item',
            follow=True
        )
    ]

    def parse_item(self, response):
        self.logger.info(f'Hi, this is an item page! {response.url}')
        # Selector was never imported; response.xpath() runs the same query directly
        titles = response.xpath('//div[@class="browse2-result"]')
        for title in titles:
            item = SocrataItem()
            item["text"] = title.xpath('.//div[@class="browse2-result-title"]/h2/a/text()').extract()[0]
            item["url"] = title.xpath('.//div[@class="browse2-result-title"]/h2/a/@href').extract()[0]
            item["views"] = title.xpath('.//div[@class="browse2-result-view-count-value"]/text()').extract()[0].strip()
            yield item

When I run it, I get the following output:

2020-10-09 15:23:08 [scrapy.core.engine] INFO: Spider opened
2020-10-09 15:23:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-10-09 15:23:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-10-09 15:23:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://opendata.socrata.com/robots.txt> (referer: None)
2020-10-09 15:23:09 [protego] DEBUG: Rule at line 1 without any user agent to enforce it on.
2020-10-09 15:23:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://opendata.socrata.com/> (referer: None)
2020-10-09 15:23:22 [scrapy.core.engine] INFO: Closing spider (finished)
2020-10-09 15:23:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

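I know the links are valid and I know the parse method works. I'm not sure what is going wrong, but the spider seems to exit before trying any links, and the parse callback is never triggered.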

Maybe it's the regex, or something about how the code is set up.
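
One way to test the regex on its own is to run it against the links in a page, outside of Scrapy. The sketch below uses only the standard library and assumes that an `allow` pattern is applied as a regex search on each absolute URL. The real `LinkExtractor` does more (canonicalization, de-duplication, domain filtering), so this is only an approximation. `HrefCollector`, `allowed_links` and `SAMPLE_HTML` are hypothetical names, and the sample markup is made up for illustration, not copied from the site.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Same pattern as in the spider's Rule, written as a raw string.
ALLOW = re.compile(r'browse\?utf8=%E2%9C%93&page\d*')


class HrefCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def allowed_links(html, base_url, pattern=ALLOW):
    """Return the links in `html` whose absolute URL matches `pattern`."""
    collector = HrefCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if pattern.search(url)]


# Made-up markup for illustration; the real start page may look different.
SAMPLE_HTML = '''
<a href="/browse?limitTo=datasets">Datasets</a>
<a href="/browse?utf8=%E2%9C%93&amp;page=2">Next</a>
'''

if __name__ == '__main__':
    for url in allowed_links(SAMPLE_HTML, 'https://opendata.socrata.com/'):
        print(url)
```

If the same check on the HTML actually downloaded from the start URL returns an empty list, the rule has nothing to follow. That would be consistent with the spider closing right after the first page.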

