Scraping data from locally stored HTML with Python - Q&A - Python中文网

Posted 2024-04-24 20:04:29

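I have a large Excel file in which every cell holds a different piece of HTML containing comments from database users. The content of each cell is unique and varies in length. I need to strip out all of the HTML syntax/markup so that I can upload the content to a database table. How can I extract this data with Python (or Java, if there is no Python answer)? Could you provide a code example?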

1 answer

User

#1 · Posted 2024-04-24 20:04:29

In a terminal, run pip install bs4 (this pulls in the beautifulsoup4 package). You can then extract the text with Python as follows:

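import bs4

# Two sample cell values; in practice these would be read from the Excel file.
cells = [
    '<html>The indicator lights on the control cabinet&nbsp;are to be replaced with 24Vdc&nbsp;LED\'s. 3 Red &amp;&nbsp;3 Green.</html>',
    '<html><div> <span style=""FONT-SIZE: 18pt"">Close the Monthly LAD and Lanyard Work orders to show they were executed. </span></div>']

for cell in cells:
    # Name the parser explicitly to avoid the "no parser specified" warning.
    # get_text() drops every tag and decodes entities such as &amp;.
    print(bs4.BeautifulSoup(cell, "html.parser").get_text().strip())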

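Output (each &nbsp; comes through as a non-breaking space, U+00A0, which looks like an ordinary space when printed):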

The indicator lights on the control cabinet are to be replaced with 24Vdc LED's. 3 Red & 3 Green.
Close the Monthly LAD and Lanyard Work orders to show they were executed.
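
If installing a third-party package is not an option, roughly the same result can be had from the standard library alone. The following is only a minimal sketch, not part of the original answer: TextExtractor and strip_html are hypothetical names, and collapsing all whitespace (including the non-breaking spaces produced by &nbsp;) into single spaces is an assumption about what the database column should hold. It also does nothing special for script or style elements, so their contents would be kept as text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; and &nbsp;
        # before the text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.parts.append(data)


def strip_html(cell):
    """Return the plain text of one cell with whitespace normalised."""
    parser = TextExtractor()
    parser.feed(cell)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.parts)
    # str.split() with no argument also splits on U+00A0 (from &nbsp;),
    # so this collapses every whitespace run to a single ordinary space.
    return " ".join(text.split())


if __name__ == "__main__":
    print(strip_html("<p>3 Red &amp;&nbsp;3 Green.</p>"))
```

Note that this normalisation also folds line breaks into spaces; if paragraph breaks inside a comment matter, the last line of strip_html would need a gentler rule.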
