Using BeautifulSoup, how do I reference table rows in an HTML page?

Asked 2025-04-16 02:03

I have an HTML page that looks like this:

    <html>

    ..

    <form post="/products.html" ..>
    ..

    <table ...>
    <tr>...</tr>
    <tr>
       <td>part info</td>
    ..
    </tr>

    </table>

    ..

    </form>

    ..

    </html>

I tried:

form = soup.findAll('form')

table = form.findAll('table')  # table inside form

But I get an error:

ResultSet object has no attribute 'findAll'

I guess calling findAll doesn't return a 'beautifulsoup' object? What should I do instead?

Update

There are many tables on this page, but only one of them is inside the <form> tag shown above.

2 Answers

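I like ars's answer, and I fully agree that error checking is needed,
especially if this code is going to run in a production environment.

Here is a more detailed, more explicit way to find the data you're after: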

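# BeautifulSoup 4 (bs4) on Python 3; BeautifulSoup 3 used `from BeautifulSoup import BeautifulSoup`
from bs4 import BeautifulSoup as bs
html = '''<html><body><table><tr><td>some text</td></tr></table>
    <form><table><tr><td>some text we care about</td></tr>
    <tr><td>more text we care about</td></tr>
    </table></form></body></html>'''
soup = bs(html, 'html.parser')

# soup.form is the first <form> tag, so only rows inside the form are visited
for tr in soup.form.findAll('tr'):  # findAll is spelled find_all in bs4
    print(tr.text)
# output:
# some text we care about
# more text we care about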

Here is the tidied-up HTML:

>>> print(soup.prettify())
<html>
 <body>
  <table>
   <tr>
    <td>
     some text
    </td>
   </tr>
  </table>
  <form>
   <table>
    <tr>
     <td>
      some text we care about
     </td>
    </tr>
    <tr>
     <td>
      more text we care about
     </td>
    </tr>
   </table>
  </form>
 </body>
</html>
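
If the individual cells are needed rather than each row's combined text, the same loop can be extended one level down. This is only a minimal sketch, assuming BeautifulSoup 4 (the `bs4` package) on Python 3; the helper name `rows_in_form` is made up for illustration and is not part of the library:

```python
from bs4 import BeautifulSoup

def rows_in_form(html):
    """Return the cell text of every <tr> inside the first <form> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")  # a single Tag, or None if the page has no <form>
    if form is None:
        return []
    rows = []
    for tr in form.find_all("tr"):
        # get_text(strip=True) trims the whitespace around each cell's text
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows

html = """<html><body><table><tr><td>some text</td></tr></table>
<form><table><tr><td>some text we care about</td></tr>
<tr><td>more text we care about</td></tr>
</table></form></body></html>"""

print(rows_in_form(html))
```

Each inner list is one row, so the table outside the form never shows up in the result.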

Answered 2025-04-16 by Python大师

findAll returns a list (a ResultSet), so you first need to pull the element out of it:

form = soup.findAll('form')[0]
table = form.findAll('table')[0]  # table inside form

Of course, you should do some error checking before indexing into the list (for example, make sure it isn't empty).
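
The check described above might look roughly like this. It is a sketch under assumptions, not a definitive implementation: it assumes BeautifulSoup 4 (`bs4`) on Python 3, and the function name `first_table_in_form` and the sample markup are invented for illustration:

```python
from bs4 import BeautifulSoup

def first_table_in_form(soup):
    """Return the first <table> inside the first <form>, or None (hypothetical helper)."""
    forms = soup.find_all("form")  # list-like ResultSet, possibly empty
    if not forms:
        return None
    tables = forms[0].find_all("table")
    if not tables:
        return None
    return tables[0]

# Sample markup made up for this example
page = BeautifulSoup(
    "<form><table><tr><td>part info</td></tr></table></form>", "html.parser"
)

table = first_table_in_form(page)
if table is None:
    print("no table found inside a form")
else:
    for tr in table.find_all("tr"):
        print(tr.get_text(strip=True))
```

An alternative is `find`, which returns a single tag or `None` instead of a list, so the emptiness check becomes a `None` check.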

Answered 2025-04-16 by Python大师
