XPath or CSS selector

<a rel="nofollow" class="icon-chevron-right " href="/boats-for-sale/condition-used/type-power/class-power-sport-fishing/?year=2006-2014&length=40-65&page=2"><span class="aria-fixes">2</span></a>

def parse(self, response):
    listing_objs = response.xpath("//div[@class = 'listings-container']/a")
    for listing in listing_objs:
        yield response.follow(listing.attrib['href'], callback=self.parse_detail)
    next_page = response.css("a.icon-chevron-right").attrib['href']
    if next_page is not None:
        yield response.follow(next_page, callback=self.parse)

2 answers

User

#1 · edited 2024-05-14 18:14:58

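In this case you can reach any page of the site by appending &page=N (where N is the page number) to the end of the URL. That covers the requirement of visiting the next page once the current page has been crawled.
For example, you could do this: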

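def start_requests(self):
    # Scrapy calls start_requests() (plural); a method named start_request() is never run.
    main_url = "https://www.yachtworld.com/boats-for-sale/condition-used/type-power" \
        "/class-power-sport-fishing/?year=2006-2014&length=40-65&page=%(page)s"
    # `pages` (the number of result pages to crawl) is not defined in this answer;
    # set it yourself, for example as a spider attribute.
    for i in range(1, pages + 1):  # the next-page link in the question is page=2, so start at 1
        yield scrapy.Request(main_url % {'page': i}, callback=self.parse)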
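
As a complement to the string formatting above, here is a minimal sketch of building the page URLs with the standard library, so that an existing page parameter is replaced rather than duplicated. The helper names page_url and page_urls are hypothetical, and BASE_URL is simply the URL from this answer with the page parameter removed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Search URL from the answer, without the page parameter.
BASE_URL = (
    "https://www.yachtworld.com/boats-for-sale/condition-used/type-power"
    "/class-power-sport-fishing/?year=2006-2014&length=40-65"
)


def page_url(base_url, page):
    """Return base_url with its `page` query parameter set to `page`."""
    parts = urlsplit(base_url)
    # Keep every existing parameter except a previous `page` value.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "page"]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def page_urls(base_url, last_page):
    """Return the URLs for pages 1..last_page (hypothetical helper)."""
    return [page_url(base_url, n) for n in range(1, last_page + 1)]
```

Inside a spider, each URL from page_urls(BASE_URL, last_page) could be passed to scrapy.Request; last_page is a value you would have to supply or read from the site's pagination.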

User

#2 · edited 2024-05-14 18:14:58

@Piron's answer above is probably the simplest way to iterate over the pages, but if you would still like to use an XPath expression:

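response.xpath(".//div[@class='search-page-nav']/a[contains(@class, 'icon-chevron-right')]/@href").get()

Here, search-page-nav is the class of the parent div that holds the links to the other pages, icon-chevron-right is the class of the tag you are looking for, and @href selects that tag's link. Calling .get() returns the attribute value as a string, or None when there is no next-page link. A trailing /text() should not be added: an attribute has no text nodes, so the expression would match nothing. contains() is used because the class attribute in the question's markup ends with a trailing space, so an exact comparison would not match.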
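
The anchor in the question carries class="icon-chevron-right " with a trailing space, which is why an exact @class comparison can fail. The sketch below illustrates that pitfall using only the standard library's xml.etree.ElementTree as a stand-in; it is not Scrapy's selector engine. The SNIPPET markup is a shortened, hypothetical version of the question's link wrapped in the search-page-nav div mentioned in this answer, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical markup: the wrapper div and the href are assumptions.
SNIPPET = (
    '<div class="search-page-nav">'
    '<a rel="nofollow" class="icon-chevron-right " href="/boats-for-sale/?page=2">'
    '<span class="aria-fixes">2</span></a>'
    '</div>'
)


def exact_match_count(markup):
    """Count anchors whose class attribute equals 'icon-chevron-right' exactly."""
    root = ET.fromstring(markup)
    return len(root.findall(".//a[@class='icon-chevron-right']"))


def next_page_href(markup):
    """Return the href of the first anchor that has the class token, else None."""
    root = ET.fromstring(markup)
    for a in root.iter("a"):
        # Split on whitespace so a trailing space or extra classes do not matter.
        if "icon-chevron-right" in a.get("class", "").split():
            return a.get("href")
    return None
```

With this markup the exact comparison finds no anchors, while the token-based check finds the link. In Scrapy the same idea is expressed with contains(@class, ...) in XPath or with the CSS selector a.icon-chevron-right.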
