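XPath problem with nested divs
I'm new to Python and Scrapy, and I'm testing XPath extraction from a web page in the Scrapy shell. As a test, I successfully printed the h1 heading with the code below. Now I want XPath expressions that extract (1) the job titles and (2) the job links.
Here is my code in the console: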
r = scrapy.Request(url='https://www.northropgrumman.com/jobs?remote=yes-may-consider-full-time-teleworking-for-this-position&country=united-states-of-america&_job_category=global-supply-chain,business-management,program-management')
fetch(r)
#this works and pulls "Job Search" header at top of page
response.xpath('//h1/text()').getall()
# broken, tried many combos of xpaths to get job title and url
response.xpath("/html/body/div[1]/main/div[2]/div/div/div[3]/div[2]/div/div/div/div/div[1]/div[1]/div/div/div/div/div/div/div[1]/a/text()").getall()
What are the XPaths for the job title and the job link on this page?
1 Answer
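The XPath for the job title can be:
//div[@class="col-sm-9"]/a/h2/text()
The XPath for the job URL is:
//div[@class="col-sm-9"]/a/@href
To get both at once, join the two expressions with the union operator (matches come back in document order, so each href is followed by its title):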
//div[@class="col-sm-9"]/a/@href|//div[@class="col-sm-9"]/a/h2/text()
Result:
href="/jobs/Business-Management/Contract/United-States-of-America/Virginia/Fairfax/R10151186/principal-sr-principal-contract-administrator"
#text "Principal / Sr Principal Contract Administrator"
href="/jobs/Business-Management/Contract/United-States-of-America/California/Sunnyvale/R10153611/principal-senior-principal-contract-administrator-hybrid-or-full-time-remote-schedule"
#text "Principal / Senior Principal Contract Administrator (Hybrid or Full Time Remote Schedule)"
href="/jobs/Business-Management/Multi-Function/United-States-of-America/Maryland/Linthicum/R10150106/principal-pricing-analyst"
#text "Principal Pricing Analyst"
...