Beautiful Soup [Python] and extracting text from a tag

<table class="bp_ergebnis_tab_info"> <tr> <td> This is a sample text </td> <td> This is the second sample text </td> </tr> </table>

3 Answers

User

Answer 1 · edited 2024-05-16 04:27:28

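First find the table (as you are already doing). Use find rather than findAll so that you get the first match directly. findAll returns a list of every match, in which case you would have to add an extra [0] to get the first element: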

table = soup.find('table', attrs={'class': 'bp_ergebnis_tab_info'})

Then use find again to get the first td:

first_td = table.find('td')

Then use renderContents() to extract the text content:

text = first_td.renderContents()

...and the job is done (although you may also want to use strip() to remove leading and trailing whitespace):

trimmed_text = text.strip()

This should give:

>>> print trimmed_text
This is a sample text
>>>

as desired.
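For anyone on Beautiful Soup 4 (the `bs4` package) and Python 3, the same steps can be sketched roughly as below. This is only a minimal sketch under a few assumptions: `bs4` is installed, the built-in `html.parser` is used, and `get_text(strip=True)` is swapped in for `renderContents()` plus `strip()`, since in `bs4` `renderContents()` returns bytes rather than a string.

```python
from bs4 import BeautifulSoup

# The sample table from the question, inlined so the snippet is self-contained.
html = """
<table class="bp_ergebnis_tab_info">
  <tr>
    <td> This is a sample text </td>
    <td> This is the second sample text </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag (or None), so no [0] is needed.
table = soup.find("table", attrs={"class": "bp_ergebnis_tab_info"})
first_td = table.find("td")

# get_text(strip=True) returns the text with surrounding whitespace removed.
trimmed_text = first_td.get_text(strip=True)
print(trimmed_text)
```

If the table might be missing from the page, check `table is not None` before calling `table.find(...)`.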

User

Answer 2 · edited 2024-05-16 04:27:28

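I find Beautiful Soup a very effective tool, so keep learning it :-) It can parse pages with invalid markup, so it should be able to handle the page you refer to. If you want a reformatted page source with valid markup, you may want to use BeautifulSoup(html).prettify().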
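As a rough illustration of the point about invalid markup, here is a small sketch. The broken HTML string is made up for the example, and passing "html.parser" explicitly is an assumption (Beautiful Soup 4 warns when no parser is named).

```python
from bs4 import BeautifulSoup

# Hypothetical broken markup: the td, tr and table tags are never closed.
broken_html = '<table class="bp_ergebnis_tab_info"><tr><td> This is a sample text '

soup = BeautifulSoup(broken_html, "html.parser")

# prettify() serializes the parsed tree, so every open tag gets a closing tag.
pretty = soup.prettify()
print(pretty)
```

The exact repaired structure can differ between parsers (html.parser, lxml, html5lib), so treat the output as parser-dependent.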

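As for your question, the result of a soup.find(...) call is itself a Beautiful Soup object, so you can run a second search on it, as shown here. Note that findAll returns a list of matches, and that list has no find method of its own, which is why find is used for the first search:

table_soup = soup.find('table', attrs={'class': 'bp_ergebnis_tab_info'})
your_sample_text = table_soup.find("td").renderContents().strip()

print(your_sample_text)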
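When more than one table may match, `find_all` (spelled `findAll` in older releases) returns a list-like result set, so each table has to be taken from it by index or in a loop before searching inside it. A minimal sketch, assuming Beautiful Soup 4 and the sample table from the question:

```python
from bs4 import BeautifulSoup

html = """
<table class="bp_ergebnis_tab_info">
  <tr>
    <td> This is a sample text </td>
    <td> This is the second sample text </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns every match; each item is a tag that can be searched again.
tables = soup.find_all("table", attrs={"class": "bp_ergebnis_tab_info"})

cell_texts = [
    td.get_text(strip=True)
    for table in tables
    for td in table.find_all("td")
]
print(cell_texts)
```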

User

Answer 3 · edited 2024-05-16 04:27:28

Use .text to get the text between the td tags

1) First read the table DOM using its tag or id

soup = BeautifulSoup(self.driver.page_source, "html.parser")
htnm_migration_table = soup.find("table", {'id':'htnm_migration_table'})

2) Read the tbody

tbody = htnm_migration_table.find('tbody')

3) Read all the tr elements from the tbody tag

trs = tbody.find_all('tr')

4) Use each tr to get all of its tds

for tr in trs:
    tds = tr.find_all('td')
    for td in tds:
        print(td.text)
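Putting the four steps together, a self-contained version might look like the sketch below. The `page_source` string, its cell values, and the `table_rows` helper are made up for illustration (they stand in for `self.driver.page_source`). The fallback to the table itself is there because `html.parser` does not insert a tbody when the markup has none, whereas a page source serialized by a browser usually contains one.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for self.driver.page_source.
page_source = """
<table id="htnm_migration_table">
  <tbody>
    <tr><td>a1</td><td>a2</td></tr>
    <tr><td>b1</td><td>b2</td></tr>
  </tbody>
</table>
"""


def table_rows(html, table_id):
    """Return the cell text of the table with the given id, one list per row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        return []
    # Fall back to the table itself when the markup has no tbody.
    body = table.find("tbody")
    if body is None:
        body = table
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in body.find_all("tr")
    ]


rows = table_rows(page_source, "htnm_migration_table")
print(rows)
```

Returning a list of rows instead of printing inside the loop makes the result easier to reuse, for example to write it out as CSV.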
