Extracting content from one column of a table with Beautiful Soup

import requests
from bs4 import BeautifulSoup

URL = "https://ideas.repec.org/top/top.journals.simple.html"
html_content = requests.get(URL).text
soup = BeautifulSoup(html_content, "lxml")

journal_list = soup.find("table", attrs={"class": "toplist"})
journal_list_data = journal_list.tbody.find_all("tr")

headings = []

for td in journal_list_data[0].find_all("td"):
    headings.append(td.b.text.replace('\n', '').strip())

print(headings)

1 answer

User

#1 · Posted on 2024-05-19 00:40:17

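What happened?

First, take a look at your soup, because it is your source of truth: it contains neither a <tbody> nor a <b> to get information from. Accessing a missing tag such as journal_list.tbody returns None, so the chained call on it raises an AttributeError. That is why you end up with an error.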
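The failure can be reproduced offline. The following is a minimal sketch, assuming a made-up two-row table (SAMPLE_HTML is hypothetical, and the real page's markup may differ) and the built-in "html.parser" instead of "lxml" so that no extra parser needs to be installed:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the table; the real page's markup may differ.
SAMPLE_HTML = """
<table class="toplist">
  <tr><td>Rank</td><td>Journal</td></tr>
  <tr><td>1</td><td>Journal A</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", attrs={"class": "toplist"})

# Accessing a child tag that does not exist returns None instead of raising.
print(table.tbody)

# The error only surfaces on the next call in the chain.
error_message = ""
try:
    table.tbody.find_all("tr")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# The same applies to <b>: the cells hold plain text, so .b is None
# and first_cell.b.text would fail in the same way.
first_cell = table.find("tr").find("td")
print(first_cell.b)
```

Printing the soup, or a single tag from it, before chaining attribute lookups is usually the quickest way to spot this kind of mismatch.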

Try this:

import requests
from bs4 import BeautifulSoup

URL = "https://ideas.repec.org/top/top.journals.simple.html"
html_content = requests.get(URL).text
soup = BeautifulSoup(html_content, "lxml")

journal_list = soup.find("table", attrs={"class": "toplist"})
journal_list_data = journal_list.find_all("tr")

headings = []

for td in journal_list_data[0].find_all("td"):
    headings.append(td.text.replace('\n', '').strip())

print(headings)

Alternative: get_text()

You can also use get_text() to fetch an element's text and strip it in one step:

headings.append(td.get_text(strip=True))
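The two approaches are not always interchangeable. The sketch below assumes a made-up cell with nested markup (not taken from the real page) and the built-in "html.parser". It shows that strip=True strips every text fragment separately before joining them, so a separator may be needed to keep words apart:

```python
from bs4 import BeautifulSoup

# Hypothetical cell that mixes plain text with a nested <b> tag.
html = "<table><tr><td>\n Rank <b>1</b>\n</td></tr></table>"
cell = BeautifulSoup(html, "html.parser").td

# Manual cleanup: remove newlines, then trim both ends of the whole string.
manual = cell.text.replace("\n", "").strip()

# strip=True trims each fragment, then joins them with no separator.
stripped = cell.get_text(strip=True)

# Passing a separator keeps the fragments apart.
separated = cell.get_text(" ", strip=True)

print(repr(manual), repr(stripped), repr(separated))
```

For cells that contain only a single piece of text, as in the headings above, all three variants give the same result.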

Solution: one column of the table

To get the text of just one column (for example, the journal), you can do the following:

[journal.find_all("td")[1].get_text(strip=True) for journal in journal_list_data[1:]]
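The list comprehension raises an IndexError if any row has fewer cells than expected. As a hedged variation, the sketch below wraps the same idea in a helper that skips such rows. The helper name column_text and the sample table are hypothetical and only stand in for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the last row has a single cell spanning the table.
SAMPLE_HTML = """
<table class="toplist">
  <tr><td>Rank</td><td>Journal</td><td>Score</td></tr>
  <tr><td>1</td><td> Journal A </td><td>9.5</td></tr>
  <tr><td>2</td><td>Journal B</td><td>8.1</td></tr>
  <tr><td colspan="3">footnote</td></tr>
</table>
"""


def column_text(rows, index):
    """Return the stripped text of cell `index` for each row that has one."""
    values = []
    for row in rows:
        cells = row.find_all("td")
        # Skip rows that are too short instead of raising IndexError.
        if len(cells) > index:
            values.append(cells[index].get_text(strip=True))
    return values


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find("table", attrs={"class": "toplist"}).find_all("tr")

# rows[0] is the heading row, so start from the second row.
journals = column_text(rows[1:], 1)
print(journals)
```

With the real page, rows would be journal_list_data from the answer's code, and the column index would need to match the table's actual layout.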

