刮文章网页后如何强制刮相应的评论网页？

1条回答

网友

1楼 · 发布于 2024-04-26 23:49:48

您需要手动设置对评论页的请求。
爬行蜘蛛发现的每个文章页面都应该在某个地方有一个评论页面url，对吗？
在这种情况下，您可以简单地在parse_article()方法中链接一个审阅页请求。你知道吗

from scrapy import Request
from scrapy.spiders import CrawlSpider
class MySpider(CrawlSpider):

    rules = (
        Rule(LinkExtractor(allow = r'\/article\d+\/$'),   callback="parse_articles"),
    )
    comments_le = LinkExtractor(allow = r'\/article\d+\/comments\/$')

    def parse_article(self, response):
        item = dict()
        # fill up your item
        ...
        # find comments url
        comments_link  = comments_le.extract_links()[0].link
        if comments_link:
            # yield request and carry over your half-complete item there too
            yield Request(comments_link, self.parse_comments,
                          meta={'item':item})
        else:
            yield item 

    def parse_comments(self, response):
        # retrieve your half-complete item
        item = response.meta['item']
        # add some things to your item
        ...
        yield item

相关问题更多 >

编程相关推荐

热门问题

热门文章

刮文章网页后如何强制刮相应的评论网页？

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >