Scraping data between <b> and <br> tags with BeautifulSoup - Q&A

<td>
        <font face="Arial, sans-serif" size="-1">
                    <b>Home Phone: </b>507-383-1070<br>
                    <b>Cell Phone: </b>507-383-1070<br>
                    <b>E-Mail: </b><a href=mailto:macehrhardt@gmail.com>macehrhardt@gmail.com</a><br>
        </font>
</td>

2 answers

User

Answer 1 · edited 2024-05-15 04:50:55

For the HTML you posted, you can extract the numbers like this:

from bs4 import BeautifulSoup

html = """<td>
        <font face="Arial, sans-serif" size="-1">
                    <b>Home Phone: </b>507-383-1070<br>
                    <b>Cell Phone: </b>507-383-1070<br>
                    <b>E-Mail: </b><a href=mailto:macehrhardt@gmail.com>macehrhardt@gmail.com</a><br>
        </font>
</td>"""

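soup = BeautifulSoup(html, "html.parser")
# b.next is the label text inside <b> (e.g. "Home Phone: "); b.next.next is the
# text node right after </b>, i.e. the phone number. The [:2] slice keeps only
# the two phone entries (for the E-Mail label, b.next.next is the <a> tag).
entries = [b.next.next for b in soup.find_all('b')][:2]

print(entries)

This gives:

['507-383-1070', '507-383-1070']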
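The `b.next.next` chain walks the parse tree element by element, so it depends on each label being the only text inside its `<b>`, and the `[:2]` slice depends on the phone numbers being the first two entries. Below is a minimal sketch of a more explicit alternative, assuming every value sits directly after its `<b>` label as in the markup above: it reads each label with `get_text()` and takes the label's `next_sibling` as the value. The function name `parse_contact_fields` is hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """<td>
        <font face="Arial, sans-serif" size="-1">
                    <b>Home Phone: </b>507-383-1070<br>
                    <b>Cell Phone: </b>507-383-1070<br>
                    <b>E-Mail: </b><a href=mailto:macehrhardt@gmail.com>macehrhardt@gmail.com</a><br>
        </font>
</td>"""


def parse_contact_fields(html):
    """Hypothetical helper: map each <b> label to the value that follows it."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for b in soup.find_all("b"):
        # "Home Phone: " -> "Home Phone"
        label = b.get_text(strip=True).rstrip(":")
        # The node right after </b>: a text node, or a tag such as <a>.
        value = b.next_sibling
        if isinstance(value, Tag):
            # e.g. the <a> wrapping the e-mail address
            value = value.get_text(strip=True)
        fields[label] = str(value).strip() if value is not None else ""
    return fields


print(parse_contact_fields(html))
# {'Home Phone': '507-383-1070', 'Cell Phone': '507-383-1070', 'E-Mail': 'macehrhardt@gmail.com'}
```

Keying the result by label also picks up the e-mail address. If a field is added or the order changes, the phone numbers still land under the right keys and are not shifted out of position.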

User

Answer 2 · edited 2024-05-15 04:50:55

You can pass a regular expression to `soup.find_all`:

>>> import re
>>> soup.find_all(string=re.compile(r'\d+(-\d+){2}'))
['507-383-1070', '507-383-1070']

You may need to adjust the regular expression to match the format of the phone numbers you want to extract.
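One way to tighten the pattern is sketched below, assuming US-style `NNN-NNN-NNNN` numbers. Rather than calling `find_all`, it iterates over `soup.stripped_strings` and keeps only the strings that match the pattern in full. A longer line that merely contains a number is therefore skipped. `find_phone_numbers` and `PHONE_RE` are hypothetical names chosen for this example.

```python
import re

from bs4 import BeautifulSoup

# Assumed format: exactly 3-3-4 digits. fullmatch() below requires the whole
# stripped text node to match, not just a substring of it.
PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")


def find_phone_numbers(html, pattern=PHONE_RE):
    """Hypothetical helper: return every text node that is exactly a phone number."""
    soup = BeautifulSoup(html, "html.parser")
    return [s for s in soup.stripped_strings if pattern.fullmatch(s)]


html = """<td>
  <font face="Arial, sans-serif" size="-1">
    <b>Home Phone: </b>507-383-1070<br>
    <b>Cell Phone: </b>507-383-1070<br>
    <b>E-Mail: </b><a href=mailto:macehrhardt@gmail.com>macehrhardt@gmail.com</a><br>
  </font>
</td>"""

print(find_phone_numbers(html))  # ['507-383-1070', '507-383-1070']
```

For a different format, pass another compiled pattern as the second argument instead of editing the helper.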
