HTML content is incorrect when using Python Beautiful Soup with the requests package

1 answer

User

#1 · Posted on 2024-06-13 20:58:35

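When parsing broken HTML, different parsers try to repair the broken markup in different ways; there are no hard and fast rules for how such errors should be handled.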

BeautifulSoup can make use of different parsers, and each one will handle your content differently:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> url = 'http://www.wisdomtree.com/etfs/index-notices.aspx'
>>> html = requests.get(url).content
>>> BeautifulSoup(html, 'html.parser').find('div', class_='col-full')
<div class="col-full">
<p><strong>Index Notifications</strong></p>
<p><p> <br>
<b> March 28, 2014</b>
<br> <br>
# ... cut ...
>>> BeautifulSoup(html, 'lxml').find('div', class_='col-full')
<div class="col-full">
<p><strong>Index Notifications</strong></p>
<p></p><p> <br/>
<b> March 28, 2014</b>
<br/> <br/>
# ... cut ...
>>> BeautifulSoup(html, 'html5lib').find('div', class_='col-full')
<div class="col-full">

            <p><strong>Index Notifications</strong></p>
            <p></p><p> <br/>
<b> March 28, 2014</b>
<br/>  <br/>
# ... cut ...

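The html5lib parser is the slowest, but it generally parses broken HTML the same way most browsers do. lxml and html5lib both parse this particular part of the document the way JSoup does.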
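
To see the difference without fetching the live page, the comparison can be reproduced on a small inline fragment. The following is only a minimal sketch: `BROKEN_HTML` is a made-up sample loosely modelled on the markup shown above, and `compare_parsers` is a hypothetical helper name, not part of BeautifulSoup. Parsers that are not installed are skipped, so the result may contain fewer than three entries.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample: a fragment with a nested, unclosed <p>,
# loosely modelled on the markup in the session above.
BROKEN_HTML = (
    '<div class="col-full">'
    '<p><strong>Index Notifications</strong></p>'
    '<p><p> <br><b> March 28, 2014</b><br> <br>'
    '</div>'
)


def compare_parsers(html, parsers=('html.parser', 'lxml', 'html5lib')):
    """Return {parser_name: repaired markup of the col-full div}."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # The parser library is not installed; skip it.
            continue
        div = soup.find('div', class_='col-full')
        results[name] = str(div)
    return results


if __name__ == '__main__':
    for name, markup in compare_parsers(BROKEN_HTML).items():
        print(f'--- {name} ---')
        print(markup)
```

Passing the parser name explicitly, as above, also keeps the result from depending on whichever parser happens to be installed on a given machine.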
