How to get text from BeautifulSoup

from bs4 import BeautifulSoup
import requests
import re

page = requests.get("https://www.symantec.com/security_response/definitions.jsp?pid=sep14")
soup = BeautifulSoup(page.content, 'html.parser')
extended = soup.find_all('div', class_='unit size1of2 feedBody')
print(extended)

3 answers

User

#1 · Edited 2024-05-13 19:23:30

I changed your code as follows; it now shows the item you want:

from bs4 import BeautifulSoup
import requests
import re
page = requests.get("https://www.symantec.com/security_response/definitions.jsp?pid=sep14")
soup = BeautifulSoup(page.content, 'html.parser')
extended = soup.find('div', class_='unit size1of2 feedBody').find_all('li')

print(extended[2])

User

#2 · Edited 2024-05-13 19:23:30

Try this:

from bs4 import BeautifulSoup
import requests
import re
page = requests.get("https://www.symantec.com/security_response/definitions.jsp?pid=sep14")
soup = BeautifulSoup(page.content, 'html.parser')
extended = soup.find('div', class_='unit size1of2 feedBody').find_all('li')

print(extended[2].text.strip())
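
To see why `.text.strip()` makes the difference between the two answers above, here is a minimal offline sketch. The `SAMPLE_HTML` string is hypothetical markup standing in for the live page (the real structure may differ, and the first two list items are placeholders); only the "Extended Version:" value is taken from this thread.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page; the real page may differ.
SAMPLE_HTML = """
<div class="unit size1of2 feedBody">
  <ul>
    <li><strong>First Label:</strong> placeholder A</li>
    <li><strong>Second Label:</strong> placeholder B</li>
    <li><strong>Extended Version:</strong> 4/18/2019 rev. 7</li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
items = soup.find("div", class_="unit size1of2 feedBody").find_all("li")

tag = items[2]
as_markup = str(tag)        # the whole <li> element, tags included
as_text = tag.text.strip()  # only the text inside it, whitespace trimmed

print(as_markup)
print(as_text)
```

Printing the tag itself gives the markup, while `.text` (or `get_text()`) gives only the readable text. Note that indexing with `[2]` depends on the item's position, so it breaks if the page reorders the list.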

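User

#3 · Edited 2024-05-13 19:23:30

You can actually use CSS selectors for this, with Beautiful Soup 4.7+. We target the same div and classes as above, then look for a descendant li that contains the text Extended Version: (checked with the custom pseudo-class :contains()) and take its direct child strong (> strong). We use select_one because it returns only the first matching element; select would return all matching elements in a list, and we only need one.

Once we have the strong element, we know the next sibling text node holds the information we want, so we can use next_sibling to get that text: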

from bs4 import BeautifulSoup
import requests
page = requests.get("https://www.symantec.com/security_response/definitions.jsp?pid=sep14")
soup = BeautifulSoup(page.content, 'html.parser')
extended = soup.select_one('div.unit.size1of2.feedBody li:contains("Extended Version:") > strong')
print(extended.next_sibling)

Output

4/18/2019 rev. 7
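
For anyone who wants to try the selector without hitting the network, here is a rough sketch against hypothetical markup (`SAMPLE_HTML` is an assumption, not the real page, and the first two items are placeholders). It uses `:-soup-contains()`, which is the spelling newer Soup Sieve releases use for this pseudo-class; `:contains()` is kept there as a deprecated alias, so this assumes a reasonably recent install.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page; the real page may differ.
SAMPLE_HTML = """
<div class="unit size1of2 feedBody">
  <ul>
    <li><strong>First Label:</strong> placeholder A</li>
    <li><strong>Second Label:</strong> placeholder B</li>
    <li><strong>Extended Version:</strong> 4/18/2019 rev. 7</li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

selector = 'div.unit.size1of2.feedBody li:-soup-contains("Extended Version:") > strong'
label = soup.select_one(selector)  # first match, or None if nothing matches

if label is None:
    raise SystemExit("selector matched nothing; the markup may have changed")

# next_sibling is the text node right after </strong>; strip its leading space.
value = label.next_sibling.strip()

# select() returns every match as a list instead of just the first one.
all_labels = soup.select("div.unit.size1of2.feedBody li > strong")

print(value)
print(len(all_labels))
```

The `None` check is worth keeping in real code, since `select_one` does not raise when the page layout changes.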

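Edit: As @QHarr mentioned in the comments, you can most likely get away with the simpler strong:contains("Extended Version:"). It is important to remember that :contains() searches all child text nodes of the given element, including the text nodes of nested child elements, so specificity matters. I would not use a bare :contains("Extended Version:"), because it would also match the div, the list elements, and so on. Specifying (at the very least) strong should narrow the selection enough for your needs.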
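
The specificity point is easy to check on a small sample. This is only a sketch: `SAMPLE_HTML` is hypothetical markup (not the real page), and it uses the newer `:-soup-contains()` spelling on the assumption of a recent Soup Sieve.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page; the real page may differ.
SAMPLE_HTML = """
<div class="unit size1of2 feedBody">
  <ul>
    <li><strong>First Label:</strong> placeholder A</li>
    <li><strong>Extended Version:</strong> 4/18/2019 rev. 7</li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No tag name: every element whose own or nested text contains the phrase matches.
broad = soup.select(':-soup-contains("Extended Version:")')
broad_names = [el.name for el in broad]

# Restricting to strong leaves only the label element itself.
narrow = soup.select('strong:-soup-contains("Extended Version:")')
narrow_names = [el.name for el in narrow]

print(broad_names)
print(narrow_names)
```

The broad selector picks up each ancestor of the label as well as the label, which is why `select_one` on it would hand back the outer div rather than the strong you are after.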
