Beautiful Soup and table scraping: lxml vs html parser

#! /usr/bin/python
from bs4 import BeautifulSoup
from urllib import urlopen

webpage = urlopen('http://www.thewebpage.com')
soup = BeautifulSoup(webpage, "html.parser")
table = soup.find('table', {'class' : 'facts_label'})
print table

1 answer

User

#1 · Posted 2024-05-13 20:31:48

The Beautiful Soup documentation has a dedicated section called "Differences between parsers", which states:

Beautiful Soup presents the same interface to a number of different parsers, but each parser is different. Different parsers will create different parse trees from the same document. The biggest differences are between the HTML parsers and the XML parsers.

These differences become apparent on HTML documents that are not well-formed.
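To see the effect concretely, here is a minimal sketch that feeds one malformed fragment to several parsers and prints the tree each one builds. It assumes Python 3 with `bs4` installed; the fragment `<a></p>` and the helper name `parse_with_each` are illustrative choices, not part of the question. `lxml` and `html5lib` are optional packages, so the sketch skips whichever is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Illustrative input: a stray closing tag makes this fragment not well-formed.
BROKEN_HTML = "<a></p>"


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: serialized tree} for every parser that is installed."""
    trees = {}
    for name in parsers:
        try:
            trees[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # lxml and html5lib are third-party packages and may be absent.
            continue
    return trees


trees = parse_with_each(BROKEN_HTML)
for name, tree in trees.items():
    print(f"{name:12} {tree}")
```

According to the same section of the documentation, html.parser simply drops the stray `</p>`, lxml additionally wraps the result in `<html>` and `<body>`, and html5lib also adds a `<head>` and pairs the `</p>` with an opening `<p>`. A `find` call that succeeds on one of these trees can therefore come back empty on another.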

The moral is that you should use whichever parser works for your particular case.

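Also note that you should always specify the parser explicitly. This helps you avoid surprises when you run the code on a different machine or in a different virtual environment.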
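A minimal sketch of that advice, assuming Python 3 with `bs4` installed: the parser name is passed on every call rather than left for Beautiful Soup to choose. The sample markup and the function name `find_facts_table` are hypothetical; only the `facts_label` class comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
SAMPLE_HTML = """
<table class="facts_label">
  <tr><td>Population</td><td>42</td></tr>
</table>
"""


def find_facts_table(markup, parser="html.parser"):
    """Parse markup with an explicitly named parser and return the facts table."""
    soup = BeautifulSoup(markup, parser)
    return soup.find("table", {"class": "facts_label"})


table = find_facts_table(SAMPLE_HTML)
cells = [td.get_text() for td in table.find_all("td")]
print(cells)
```

Passing `parser="lxml"` switches parsers in one place, which makes it easy to check whether the table lookup still behaves the same way.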
