Scraping Craigslist with Scrapy

Posted 2024-04-20 13:11:34


I want to create a Scrapy script that fetches all the results for computer gigs on any Craigslist subdomain, for example: http://losangeles.craigslist.org/search/cpg/. This query returns a list of many posts. I tried using CrawlSpider and LinkExtractor to grab the title and href of every result (not only the results on the first page), but the script doesn't return anything. I'm pasting my script below, thanks.

    import scrapy
    from scrapy.spiders import Rule,CrawlSpider
    from scrapy.linkextractors import LinkExtractor

    class CraigspiderSpider(CrawlSpider):
        name = "CraigSpider"
        allowed_domains = ["http://losangeles.craigslist.org"]
        start_urls = (
                    'http://losangeles.craigslist.org/search/cpg/',
        )

        rules = (Rule(LinkExtractor(allow=(), restrict_xpaths=('//a[@class="button next"]',)), callback="parse_page", follow= True),)

        def parse_page(self, response):
            items = response.selector.xpath("//p[@class='row']")
        for i in items:
            link = i.xpath("./span[@class='txt']/span[@class='pl']/a/@href").extract()
            title = i.xpath("./span[@class='txt']/span[@class='pl']/a/span[@id='titletextonly']/text()").extract()
            print link,title

Tags: from, org, import, script, http, search, xpath, class
1 Answer
User
#1 · Posted 2024-04-20 13:11:34

Based on the code you pasted, parse_page

  1. does not return/yield anything, and
  2. effectively contains only one line: `items = response.selector...`.

The reason for #2 above is that the for loop is not indented correctly.

Try indenting the for loop:

    class CraigspiderSpider(CrawlSpider):
        name = "CraigSpider"
        # allowed_domains entries must be bare domain names, without the scheme
        allowed_domains = ["losangeles.craigslist.org"]
        start_urls = ('http://losangeles.craigslist.org/search/cpg/',)

        # note the trailing comma: rules must be an iterable of Rule objects
        rules = (Rule(
            LinkExtractor(allow=(), restrict_xpaths=('//a[@class="button next"]',)),
            callback="parse_page", follow=True),)

        def parse_page(self, response):
            items = response.selector.xpath("//p[@class='row']")

            for i in items:
                link = i.xpath("./span[@class='txt']/span[@class='pl']/a/@href").extract()
                title = i.xpath("./span[@class='txt']/span[@class='pl']/a/span[@id='titletextonly']/text()").extract()
                print(link, title)
                yield dict(link=link, title=title)
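
For completeness, a spider like this can also be run without a full Scrapy project. The sketch below is a minimal standalone runner, assuming a reasonably recent Scrapy version; the module name craig_spider.py and the output file results.json are just illustrative examples, not part of the original question:

    # Hypothetical layout: the spider above saved as craig_spider.py.
    # The FEEDS setting requires Scrapy 2.1 or newer.
    from scrapy.crawler import CrawlerProcess
    from craig_spider import CraigspiderSpider

    process = CrawlerProcess(settings={
        "FEEDS": {"results.json": {"format": "json"}},  # export yielded dicts as JSON
    })
    process.crawl(CraigspiderSpider)
    process.start()  # blocks until the crawl finishes

Alternatively, `scrapy runspider craig_spider.py -o results.json` does the same from the command line.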
