Getting the content under a heading tag by searching for the heading text - Q&A - Python中文网


Posted 2024-04-26 04:00:27



I'm scraping a page and need to get the number of employees from markup in this format:

<h5>Number of Employees</h5>
<p>
            20
</p>

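The problem is that the number "20" I need is not always under the same heading: sometimes it is in an "h4", and there are several other "h5" headings. So I need to find the heading whose text is "Number of Employees" and extract the number from the paragraph that follows it.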

Here is the link to the page:

http://www.bbb.org/chicago/business-reviews/paving-contractors/lester-s-material-service-inc-in-grayslake-il-72000434/


2 answers

User

Answer 1 · edited 2024-04-26 04:00:27

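Well, the simplest approach is to find the element whose text contains "Number of Employees" and then take the paragraph that follows it, assuming the paragraph always comes immediately after the heading.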
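If you would rather not depend on a third-party parser, that approach can be sketched with the standard library's html.parser.HTMLParser: watch for an h4 or h5 whose text contains "Number of Employees", then collect the text of the next <p>. This is a minimal sketch, assuming the label appears verbatim in the heading and the count sits in the first paragraph after it; the class EmployeeCountParser and the function employee_count are hypothetical names.

```python
from html.parser import HTMLParser


class EmployeeCountParser(HTMLParser):
    """Hypothetical helper: remember the text of the first <p> that follows an
    <h4> or <h5> heading whose text contains `label`."""

    HEADINGS = {"h4", "h5"}

    def __init__(self, label="Number of Employees"):
        super().__init__()
        self.label = label
        self.in_heading = False  # inside an <h4>/<h5> right now
        self.heading_text = []   # text chunks of the current heading
        self.armed = False       # matching heading seen, waiting for its <p>
        self.in_p = False        # inside the <p> we want
        self.p_text = []         # text chunks of that <p>
        self.result = None       # whitespace-normalized paragraph text once found

    def handle_starttag(self, tag, attrs):
        if self.result is not None:
            return  # already found; ignore the rest of the page
        if tag in self.HEADINGS:
            self.in_heading = True
            self.heading_text = []
        elif tag == "p" and self.armed:
            self.in_p = True
            self.p_text = []

    def handle_endtag(self, tag):
        if self.result is not None:
            return
        if tag in self.HEADINGS and self.in_heading:
            self.in_heading = False
            if self.label in "".join(self.heading_text):
                self.armed = True
        elif tag == "p" and self.in_p:
            self.in_p = False
            # collapse runs of whitespace, similar to XPath normalize-space()
            self.result = " ".join("".join(self.p_text).split())

    def handle_data(self, data):
        if self.in_heading:
            self.heading_text.append(data)
        elif self.in_p:
            self.p_text.append(data)


def employee_count(page_html):
    """Return the employee count as a string, or None if no heading matches."""
    parser = EmployeeCountParser()
    parser.feed(page_html)
    parser.close()
    return parser.result
```

HTMLParser only reports start tags, end tags and text; it does not build a tree or repair broken markup, so this sketch assumes the <p> is explicitly closed, as in the question's snippet.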

Here is some quick-and-dirty code that does this and prints the number:

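# assumes soup = BeautifulSoup(page_html, "html.parser") for the fetched page
parent = soup.find("div", id="business-additional-info-text")
for child in parent.find_all(["h4", "h5"]):  # only h4/h5 headings, not bare strings
    if "Number of Employees" in child.get_text():
        # the count is in the first <p> after the matching heading
        print(child.find_next("p").get_text(strip=True))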
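To try this approach without fetching the live page, here is a small sketch that runs it against markup shaped like the question's snippet. It assumes BeautifulSoup 4 (bs4) is installed; find_employee_count and its label parameter are hypothetical names. Instead of looping over children, it passes a function to soup.find(), which returns the first tag for which that function returns True, so an h4 and an h5 heading are matched the same way.

```python
from bs4 import BeautifulSoup


def find_employee_count(page_html, label="Number of Employees"):
    """Hypothetical helper: text of the first <p> after an <h4>/<h5> containing `label`."""
    soup = BeautifulSoup(page_html, "html.parser")
    # find() accepts a function: it is called with each tag in document order
    heading = soup.find(
        lambda tag: tag.name in ("h4", "h5") and label in tag.get_text()
    )
    if heading is None:
        return None
    paragraph = heading.find_next("p")  # next <p> anywhere after the heading
    return paragraph.get_text(strip=True) if paragraph is not None else None


sample = """
<div id="business-additional-info-text">
  <h5>Number of Employees</h5>
  <p>
              20
  </p>
</div>
"""
print(find_employee_count(sample))  # 20
```

Returning None when nothing matches lets the caller skip pages that do not list an employee count, instead of failing with an AttributeError on None.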

User

Answer 2 · edited 2024-04-26 04:00:27

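With XPath, the following expression selects any h4 or h5 heading whose text contains "Number of Employees", takes the first <p> sibling after it, and normalizes that paragraph's whitespace:

'normalize-space(//*[self::h4 or self::h5][contains(., "Number of Employees")]/following-sibling::p[1]/text())'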
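The XPath expression can be evaluated from Python with lxml, for example as below. This is a sketch assuming the lxml package is installed, and it uses an inline copy of the question's markup (with an h4 heading) in place of the live page; EXPR, sample and result are illustrative names.

```python
from lxml import html

# the XPath from the answer, split across two string literals for readability
EXPR = ('normalize-space(//*[self::h4 or self::h5][contains(., "Number of Employees")]'
        '/following-sibling::p[1]/text())')

sample = """<div>
<h4>Number of Employees</h4>
<p>
            20
</p>
</div>"""

tree = html.fromstring(sample)
result = tree.xpath(EXPR)  # a string, because the outer function is normalize-space()
print(result)  # 20
```

Because the whole expression is wrapped in normalize-space(), xpath() returns a single string rather than a list of nodes, with the surrounding whitespace already removed; if no heading matches, it returns an empty string.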
