Getting the data inside <li> tags with BeautifulSoup

<ul> <li class="spacer"> Location: 1500 S. 1st Avenue Yuma, AZ 85364 </li> <li class="spacer"> Phone Number: 928-373-4700 </li> <li class="spacer"> Fax Number: 928-343-8864 </li>

2 answers

User

#1 · edited 2024-05-13 21:27:05

This is not about the BeautifulSoup version; it is about the differences between the underlying parsers that BeautifulSoup uses:

Beautiful Soup presents the same interface to a number of different parsers, but each parser is different. Different parsers will create different parse trees from the same document.

Demo:

>>> from bs4 import BeautifulSoup
>>> # text holds the HTML from the question
>>> soup = BeautifulSoup(text, 'html.parser')
>>> print(soup.find('li', attrs={'class': 'spacer'}))
<li class="spacer"><span>Location:</span> </li>

>>> soup = BeautifulSoup(text, 'html5lib')
>>> print(soup.find('li', attrs={'class': 'spacer'}))
<li class="spacer"><span>Location:</span> <br/>1500 S. 1st Avenue<br/>Yuma, AZ 85364</li>

>>> soup = BeautifulSoup(text, 'lxml')
>>> print(soup.find('li', attrs={'class': 'spacer'}))
<li class="spacer"><span>Location:</span> 1500 S. 1st AvenueYuma, AZ 85364</li>

As you can see, different parsers give different results.

That is what happens when you specify a parser explicitly. If you don't, the documentation says:

If you don’t specify anything, you’ll get the best HTML parser that’s installed. Beautiful Soup ranks lxml’s parser as being the best, then html5lib’s, then Python’s built-in parser.
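To check which parsers are actually installed on a given machine and how each one treats the same markup, something along these lines may help. It is only a sketch: the `html` string is an assumed sample modeled on the outputs in the demo (the question's original markup is not shown in full), and `first_li_by_parser` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Assumed sample markup; not the asker's original document.
html = (
    '<ul><li class="spacer"><span>Location:</span> '
    '<br>1500 S. 1st Avenue<br>Yuma, AZ 85364</li></ul>'
)

def first_li_by_parser(markup, parsers=("html.parser", "html5lib", "lxml")):
    """Return {parser_name: first <li> as a string} for each installed parser."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # html5lib and lxml are optional third-party packages; skip if missing.
            continue
        results[name] = str(soup.find("li", attrs={"class": "spacer"}))
    return results

results = first_li_by_parser(html)
for name, li in results.items():
    print(name, "->", li)
```

Parsers that are not installed are simply left out of the result, so the script runs with only the built-in `html.parser` available.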

User

#2 · edited 2024-05-13 21:27:05

I'm not sure which version you are using. On my machine, with BeautifulSoup 4.3.2 and Python 2.7, the output is:

<li class="spacer">Location: 1500 S. 1st AvenueYuma, AZ 85364</li>
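In that output the two address lines run together ("AvenueYuma"), presumably because the `<br/>` tags between them carry no text of their own. If the goal is the text of each `<li>`, one option is `get_text()` with a separator. A minimal sketch, assuming markup shaped like the html5lib output in answer #1 (the `html` sample and the `li_texts` helper are illustrative, not from the question):

```python
from bs4 import BeautifulSoup

# Assumed sample markup, modeled on the outputs shown in the answers.
html = """
<ul>
  <li class="spacer"><span>Location:</span> <br/>1500 S. 1st Avenue<br/>Yuma, AZ 85364</li>
  <li class="spacer"><span>Phone Number:</span> 928-373-4700</li>
  <li class="spacer"><span>Fax Number:</span> 928-343-8864</li>
</ul>
"""

def li_texts(markup, parser="html.parser"):
    """Return the text of every <li class="spacer">, pieces joined by one space."""
    # Naming the parser explicitly keeps the parse tree the same on every machine.
    soup = BeautifulSoup(markup, parser)
    return [
        li.get_text(" ", strip=True)  # strip each text piece, then join with " "
        for li in soup.find_all("li", class_="spacer")
    ]

items = li_texts(html)
for item in items:
    print(item)
```

The separator puts a space wherever a tag boundary such as `<br/>` splits the text, so the street and city no longer fuse.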
