Python html5lib package - PyPI

HTML parser based on the WHATWG HTML specification

html5lib Python project description

Usage

Simple usage follows this pattern:

    import html5lib

    with open("mydocument.html", "rb") as f:
        document = html5lib.parse(f)

Or:

    import html5lib

    document = html5lib.parse("<p>Hello World!")

By default, document will be an xml.etree element instance. Whenever possible, html5lib chooses the accelerated ElementTree implementation (i.e. xml.etree.cElementTree on Python 2.x).
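As a rough illustration of what the default tree looks like, the sketch below parses the same fragment and looks up the paragraph. It relies on html5lib placing HTML elements in the XHTML namespace by default; the ns prefix mapping is only a local name chosen for this example.

```python
import html5lib

# Parse a fragment; html5lib fills in the missing <html>, <head> and <body>.
document = html5lib.parse("<p>Hello World!")

# By default, element tags carry the XHTML namespace.
ns = {"h": "http://www.w3.org/1999/xhtml"}  # prefix name is arbitrary
paragraph = document.find(".//h:p", ns)

print(document.tag)    # {http://www.w3.org/1999/xhtml}html
print(paragraph.text)  # Hello World!

# Passing namespaceHTMLElements=False yields plain tag names instead.
plain = html5lib.parse("<p>Hello World!", namespaceHTMLElements=False)
print(plain.find(".//p").text)  # Hello World!
```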

Two other tree types are supported: xml.dom.minidom and lxml.etree. To use an alternative format, specify the name of a treebuilder:

    import html5lib

    with open("mydocument.html", "rb") as f:
        lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
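The same keyword works for the minidom tree type, which needs no extra dependency beyond the standard library. The following is a minimal sketch, assuming the "dom" treebuilder name; the variable names are illustrative only.

```python
import html5lib
from xml.dom import minidom

# Ask html5lib to build an xml.dom.minidom document instead of ElementTree.
dom_document = html5lib.parse("<p>Hello World!", treebuilder="dom")

# The result can be queried with the usual DOM methods.
paragraph = dom_document.getElementsByTagName("p")[0]
text = paragraph.firstChild.data

print(isinstance(dom_document, minidom.Document))
print(text)
```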

When used with urllib2 (Python 2), the charset from HTTP should be passed into html5lib as follows:

    from contextlib import closing
    from urllib2 import urlopen
    import html5lib

    with closing(urlopen("http://example.com/")) as f:
        document = html5lib.parse(f, transport_encoding=f.info().getparam("charset"))

When used with urllib.request (Python 3), the charset from HTTP should be passed into html5lib as follows:

    from urllib.request import urlopen
    import html5lib

    with urlopen("http://example.com/") as f:
        document = html5lib.parse(f, transport_encoding=f.info().get_content_charset())

To have more control over the parser, create a parser object explicitly. For instance, to make the parser raise exceptions on parse errors, use:

    import html5lib

    with open("mydocument.html", "rb") as f:
        parser = html5lib.HTMLParser(strict=True)
        document = parser.parse(f)

When instantiating a parser object explicitly, pass a treebuilder class as the tree keyword argument to use an alternative document format:

    import html5lib

    parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
    minidom_document = parser.parse("<p>Hello World!")

More documentation is available at https://html5lib.readthedocs.io/
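To make the difference between the default and strict modes concrete, the sketch below parses the same malformed fragment twice. It assumes that the non-strict parser records problems in its errors attribute and that strict mode raises html5lib.html5parser.ParseError; the fragment has no doctype, which is enough to trigger a parse error.

```python
import html5lib
from html5lib.html5parser import ParseError

fragment = "<p>Hello World!"  # no doctype, so the parser reports an error

# Default mode: parsing succeeds and problems are collected on the parser.
lenient = html5lib.HTMLParser()
lenient.parse(fragment)
for position, code, details in lenient.errors:
    print("recovered from:", code, "at", position)

# Strict mode: the first parse error is raised as an exception.
strict = html5lib.HTMLParser(strict=True)
raised = False
try:
    strict.parse(fragment)
except ParseError as exc:
    raised = True
    print("strict parser refused the input:", exc)
```

The default mode suits scraping real-world pages, where recovery matters more than validity; strict mode is closer to a validation check.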


html5lib 1.0.1
