XPath expression to get the text of <p>

<div class="et_pb_text_inner">
  <h3 style="text-align: center;"><i class="fal fa-ruler-combined"></i><br /> 1672 Square Feet</h3>
  <p style="text-align: center;">
    First Floor 1085 s.f.<br />
    Second Floor 587 s.f.<br />
    Porches 393 s.f.<br />
    Covered Parking 642 s.f.<br />
    Storage 187 s.f.<br />
    Under Roof 2894 s.f.
  </p>
</div>

import requests
from lxml import html

resp = requests.get(
    url="https://tyreehouseplans.com/shop/house-plans/beach-house-plans/crew-cut-house-plan/",
    headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
)

tree = html.fromstring(html=resp.text)
title = tree.xpath("//div[@class='et_pb_module_inner']/h1/text()")[0]
dimensions = tree.xpath("//div[@class='et_pb_text_inner']/p/text()")[0]
print(title)
print(dimensions)

1 answer

Anonymous user

#1 · Posted 2024-04-18 10:56:42

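text() selects text nodes, and the <p> tag contains 6 of them (one per line, separated by the <br /> tags), so you need to drop the [0] index, which keeps only the first one. You also need a more precise XPath so that unwanted results from other matching elements are excluded: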

dimensions = tree.xpath("//h3[contains(., '1672 Square Feet')]/following-sibling::p/text()")

This returns a list of 6 strings:

['First Floor 1085 s.f.', '\nSecond Floor 587 s.f.', '\nPorches 393 s.f.', '\nCovered Parking 642 s.f.', '\nStorage 187 s.f.', '\nUnder Roof 2894 s.f.']
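
The strings in that list still carry leading newlines, so a small cleanup step is usually wanted before using them. The following is a minimal sketch of one way to do that, assuming every entry follows the "label, integer, s.f." shape seen in the output above; the names AREA_RE and parse_areas are made up for illustration and are not part of lxml.

```python
import re

# Output copied from the answer above: one string per text node of the <p>.
raw = [
    'First Floor 1085 s.f.', '\nSecond Floor 587 s.f.', '\nPorches 393 s.f.',
    '\nCovered Parking 642 s.f.', '\nStorage 187 s.f.', '\nUnder Roof 2894 s.f.',
]

# Assumed line shape: "<label> <integer> s.f." (hypothetical pattern).
AREA_RE = re.compile(r"^(?P<label>.+?)\s+(?P<area>\d+)\s*s\.f\.$")


def parse_areas(lines):
    """Map each label to its area in square feet, skipping lines that do not match."""
    areas = {}
    for line in lines:
        match = AREA_RE.match(line.strip())  # strip() removes the leading '\n'
        if match:
            areas[match.group("label")] = int(match.group("area"))
    return areas


areas = parse_areas(raw)
print(areas)
```

With the scraped page, the same function could be called as parse_areas(dimensions). Lines that do not fit the assumed pattern are silently skipped, which may or may not be what you want for other pages.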
