How to extract all href values from a page using Scrapy

Posted 2024-05-23 19:14:19



I want to crawl this page.

I want to extract all the links from a given website using Scrapy.

This is what I'm trying:

import scrapy
import unidecode
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from lxml import html


class ElementSpider(scrapy.Spider):
    name = 'linkdata'

    start_urls = ["https://www.goodreads.com/list/show/19793.I_Marked_My_Calendar_For_This_Book_s_Release",]


    def parse(self, response):

        links = response.xpath('//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href').extract()
        print links

But I get nothing back.


1 Answer


Answer #1 · Posted 2024-05-23 19:14:19

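I think the problem is in your XPath. Your expression looks for the div as a direct child of the table, but the div sits inside the table's rows and cells, so the path has to step through tr/td[2] first. Try this: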

for href in response.xpath('//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]/tr/td[2]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href'):
    # Resolve each (possibly relative) href against the page URL
    full_url = response.urljoin(href.extract())
    print(full_url)
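To see why the original expression matches nothing, here is a minimal sketch that needs neither Scrapy nor network access. It swaps in the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath) and `urllib.parse.urljoin` as a stand-in for `response.urljoin`. The `SAMPLE` markup, the book paths in it, and the `extract_links` helper are hypothetical and only mimic the nesting implied by the two XPath expressions; the real page may differ.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, simplified markup (an assumption, not the real page source).
SAMPLE = """
<div id="all_votes">
  <table class="tableList js-dataTooltip">
    <tr>
      <td>1</td>
      <td>
        <div class="js-tooltipTrigger tooltipTrigger">
          <a href="/book/show/1.Example_One">Example One</a>
        </div>
      </td>
    </tr>
    <tr>
      <td>2</td>
      <td>
        <div class="js-tooltipTrigger tooltipTrigger">
          <a href="/book/show/2.Example_Two">Example Two</a>
        </div>
      </td>
    </tr>
  </table>
</div>
""".strip()

# The start URL from the question, used as the base for relative links.
BASE_URL = "https://www.goodreads.com/list/show/19793.I_Marked_My_Calendar_For_This_Book_s_Release"

TABLE = "./table[@class='tableList js-dataTooltip']"
DIV = "div[@class='js-tooltipTrigger tooltipTrigger']"


def extract_links(root, path):
    """Return absolute URLs for every <a> element matched by path."""
    return [urljoin(BASE_URL, a.get("href")) for a in root.findall(path)]


# root is the <div id="all_votes"> element itself.
root = ET.fromstring(SAMPLE)

# Shape of the question's path: div as a direct child of table.
broken = extract_links(root, f"{TABLE}/{DIV}/a")

# Shape of the answer's path: step through tr/td[2] before the div.
fixed = extract_links(root, f"{TABLE}/tr/td[2]/{DIV}/a")

print(broken)
print(fixed)
```

With this sample the first path returns an empty list, because `/` selects direct children only and the div is three levels below the table. The second path returns both links as absolute URLs.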

Hope this helps :)

Good luck.
