Printing the same HTTPResponse object returns different output

from bs4 import BeautifulSoup

def crawl(url):
    html = getHTML(url)  # getHTML() returns an HTTPResponse
    print(html.read())  # PRINT STATEMENT 1
    if (html == None):
        print("Error getting HTML")
    else:
        # parse html
        bsObj = BeautifulSoup(html, "lxml")
        # print data
        try:
            print(bsObj.h1.get_text())
        except AttributeError as e:
            print(e)
        print(html.read())  # PRINT STATEMENT 2

1 answer

Anonymous user

#1 · Posted 2024-04-23 14:03:27

html is an HTTPResponse object. HTTPResponse supports file-like operations such as read().

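Just as when reading a file, read() consumes the available data and moves the file pointer to the end of the file/data. Any subsequent read() returns nothing (an empty bytes object, b''). That is why PRINT STATEMENT 2 prints b'', and why BeautifulSoup, which is handed the already-exhausted html, finds no <h1>.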
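A minimal sketch of this behavior, using io.BytesIO as a stand-in for the HTTPResponse (an assumption made so it runs without a network request; both are file-like binary streams whose read() advances the position):

```python
import io

# io.BytesIO stands in for the HTTPResponse: read() consumes the data
# and moves the stream position to the end.
stream = io.BytesIO(b"<html><h1>Title</h1></html>")

first = stream.read()   # returns all remaining bytes; position is now at the end
second = stream.read()  # nothing is left to read, so this returns b""

print(first)
print(second)
```

The first print shows the markup, the second shows b'', which is exactly the difference between the two print statements in the question.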

You have two options:

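Reset the file pointer to the beginning with the seek() method after reading. This only works on seekable streams (check with seekable()): http.client.HTTPResponse is not seekable, so calling seek(0) on a live response raises io.UnsupportedOperation. It does work on regular files and io.BytesIO objects:

print(html.read())
html.seek(0)  # moves the position back to byte 0 of the data (seekable streams only)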

Save the result instead:

html_body = html.read()
print(html_body)
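Both options can be tried on in-memory streams. In this sketch, FakeResponse is a hypothetical stand-in for http.client.HTTPResponse: like the real class it derives from io.BufferedIOBase and does not override seek(), so it is readable but not seekable. io.BytesIO plays the part of a seekable file.

```python
import io

class FakeResponse(io.BufferedIOBase):
    """Hypothetical stand-in for HTTPResponse: readable, not seekable."""

    def __init__(self, data: bytes):
        super().__init__()
        self._buf = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def read(self, size=-1) -> bytes:
        return self._buf.read(size)

data = b"<h1>Hello</h1>"

# Option 1: rewind with seek(0). This works on a seekable stream such as BytesIO...
seekable = io.BytesIO(data)
seekable.read()
seekable.seek(0)
rewound = seekable.read()  # the same bytes again

# ...but a non-seekable stream refuses, as a live HTTPResponse does.
response = FakeResponse(data)
response.read()
seek_error = None
try:
    response.seek(0)
except io.UnsupportedOperation as exc:
    seek_error = exc

# Option 2: read once, keep the bytes, and reuse them as often as needed.
response = FakeResponse(data)
html_body = response.read()
print(html_body)
print(html_body)  # identical output; nothing was consumed from html_body itself
```

Option 2 works regardless of whether the underlying stream is seekable, which is one reason it is the safer choice.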

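In general you would use the second option: the body is read only once, and html_body can then be reused as often as you like (for example, passed to BeautifulSoup and printed).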
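Applied to the question's code, the second option might look like the sketch below. Two assumptions: the function takes the response object instead of a URL (so it can be exercised without the network), and the question's getHTML() returns None when a fetch fails. It also moves the None check before the first read(), since the original called html.read() before checking.

```python
import io
from typing import BinaryIO, Optional

def crawl(response: Optional[BinaryIO]) -> Optional[bytes]:
    """Sketch of the question's crawl(), restructured to read the body once."""
    # Check for a failed fetch *before* touching the stream.
    if response is None:
        print("Error getting HTML")
        return None

    html_body = response.read()  # consume the stream exactly once
    print(html_body)             # PRINT STATEMENT 1

    # Parse the saved bytes rather than the exhausted response, e.g.:
    # bsObj = BeautifulSoup(html_body, "lxml")

    print(html_body)             # PRINT STATEMENT 2: same output as statement 1
    return html_body
```

For example, crawl(io.BytesIO(b"<h1>Hi</h1>")) prints b'<h1>Hi</h1>' twice and returns those bytes.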
