I'm not entirely sure what the problem is... I'm following a tutorial, and after staring at this for hours I can't figure out what I'm missing.
from scrapy.linkextractors import LinkExtractor
from scrapy.selector import Selector
from scrapy.spiders import CrawlSpider, Rule

from socrata.items import SocrataItem


class OpendataSpider(CrawlSpider):
    name = 'opendata_crawl'
    allowed_domains = ['opendata.socrata.com']
    start_urls = ['https://opendata.socrata.com/']

    rules = [
        Rule(
            LinkExtractor(allow=r'browse\?utf8=%E2%9C%93&page\d*'),
            callback='parse_item',
            follow=True,
        )
    ]

    def parse_item(self, response):
        self.logger.info(f'Hi, this is an item page! {response.url}')
        titles = Selector(response).xpath('//div[@class="browse2-result"]')
        for title in titles:
            item = SocrataItem()
            item["text"] = title.xpath('.//div[@class="browse2-result-title"]/h2/a/text()').extract()[0]
            item["url"] = title.xpath('.//div[@class="browse2-result-title"]/h2/a/@href').extract()[0]
            item["views"] = title.xpath('.//div[@class="browse2-result-view-count-value"]/text()').extract()[0].strip()
            yield item
When I run it, I get the following output:
2020-10-09 15:23:08 [scrapy.core.engine] INFO: Spider opened
2020-10-09 15:23:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-10-09 15:23:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-10-09 15:23:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://opendata.socrata.com/robots.txt> (referer: None)
2020-10-09 15:23:09 [protego] DEBUG: Rule at line 1 without any user agent to enforce it on.
2020-10-09 15:23:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://opendata.socrata.com/> (referer: None)
2020-10-09 15:23:22 [scrapy.core.engine] INFO: Closing spider (finished)
2020-10-09 15:23:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
I know the links are valid and I know the parse method works. I can't tell what's wrong, but the spider seems to quit before trying any link, and the parse callback is never triggered.
Maybe it's the regex, or something about how the code is set up.
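One way I tried to sanity-check the `allow` pattern in isolation: Scrapy's `LinkExtractor` applies `allow` patterns with an unanchored `re.search` against each extracted URL, so the pattern can be tested directly against a sample pagination URL. The URL below is my guess at what the site's pagination links look like, not something taken from the actual page:

```python
import re

# The same pattern passed to LinkExtractor(allow=...)
pattern = r'browse\?utf8=%E2%9C%93&page\d*'

# Hypothetical pagination URL of the shape I expect the site to use
url = 'https://opendata.socrata.com/browse?utf8=%E2%9C%93&page=2'

# LinkExtractor matches allow patterns with re.search, so a partial
# match anywhere in the URL is enough for the link to be kept
print(bool(re.search(pattern, url)))  # True: \d* also matches zero digits
```

If the pattern matches a URL of that shape, the problem may instead be that no such links exist in the raw HTML at all, e.g. if the result listing is rendered by JavaScript.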