可以打印但无法返回HTML表格：“类型错误：ResultSet对象不是迭代器”

1 投票

2 回答

656 浏览

提问于 2025-04-18 12:57

我是个Python新手，正在使用Python 2.7和beautifulsoup 3.2.1。

我想从一个简单的网页上抓取一个表格。我可以很容易地把它打印出来，但就是无法把它返回到我的视图函数中。

下面的代码可以正常工作：

@app.route('/process')
def process():

   queryURL = 'http://example.com'

   br.open(queryURL)

   html = br.response().read()
   soup = BeautifulSoup(html)
   table = soup.find("table")
   print table

   return 'All good'

我也可以成功地用return html来返回。但是当我尝试用return table代替return 'All good'时，就出现了以下错误：

TypeError: ResultSet object is not an iterator

我还尝试过：

br.open(queryURL)

html = br.response().read()
soup = BeautifulSoup(html)
table = soup.find("table")
out = []
for row in table.findAll('tr'):
    colvals = [col.text for col in row.findAll('td')]
    out.append('\t'.join(colvals))

return table

但都没有成功。有没有什么建议？

迭代器数据提取类型错误网页抓取 beautifulsoup 视图函数 html表格

2 个回答

在@Heinst 指出我试图返回一个对象而不是字符串之后，我还找到了一种更优雅的方法，可以把 BeautifulSoup 对象转换成字符串并返回：

return str(table)

回答于 2025-04-18 由 Python大师

分享举报

你想要返回一个对象，但实际上你并没有获取到这个对象的文本内容，所以你应该使用 return table.text。下面是修改后的完整代码：

def process():

   queryURL = 'http://example.com'

   br.open(queryURL)

   html = br.response().read()
   soup = BeautifulSoup(html)
   table = soup.find("table")
   return table.text

编辑：

现在我明白你想要的是构成网站的HTML代码，而不是值，你可以参考我做的这个例子：

import urllib

url = urllib.urlopen('http://www.xpn.org/events/concert-calendar')
htmldata = url.readlines()
url.close()

for tag in htmldata:
    if '<th' in tag:
        print tag
    if '<tr' in tag:
        print tag
    if '<thead' in tag:
        print tag
    if '<tbody' in tag:
        print tag
    if '<td' in tag:
        print tag

你不能用BeautifulSoup做到这一点（至少我知道的情况是这样），因为BeautifulSoup主要是用来解析或以漂亮的方式打印HTML的。你可以像我做的那样，使用一个循环遍历HTML代码，如果某一行中有标签，就打印出来。

如果你想把输出存储在一个列表中以便后续使用，可以这样做：

htmlCodeList = []
for tag in htmldata:
        if '<th' in tag:
            htmlCodeList.append(tag)
        if '<tr' in tag:
            htmlCodeList.append(tag)
        if '<thead' in tag:
            htmlCodeList.append(tag)
        if '<tbody' in tag:
            htmlCodeList.append(tag)
        if '<td' in tag:
            htmlCodeList.append(tag)

这样可以把HTML行保存到列表的新元素中。所以 <td> 会是索引0，下一组标签会是索引1，依此类推。

回答于 2025-04-18 由 Python大师

分享举报

可以打印但无法返回HTML表格：“类型错误：ResultSet对象不是迭代器”

2 个回答

撰写回答