Python HTML/XML parser for easy web scraping.

Detailed description of the pyDHTMLParser Python project



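What is it?

dhtmlparser is a lightweight HTML/XML parser created for one purpose: quick and easy selection of tags from the DOM.

It is very useful when you need to write your own "guerilla" API for some web page, or a scraper.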
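
To illustrate what selecting tags from a parsed page looks like in a scraper, here is a minimal sketch. It deliberately uses only the standard library's html.parser as a stand-in and is not dhtmlparser's own API; the names TagCollector, find_tags and PAGE are hypothetical. See the module documentation for the real selector methods.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect attributes and text for every occurrence of one tag name.

    Stand-in for illustration only; this is NOT the dhtmlparser API.
    Nested tags of the same name are not separated from each other.
    """

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self.found = []   # one dict per matching tag
        self._depth = 0   # number of matching tags currently open

    def handle_starttag(self, tag, attrs):
        if tag == self.tag_name:
            self.found.append({"attrs": dict(attrs), "text": ""})
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag_name and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Only keep text that appears inside a matching tag.
        if self._depth > 0:
            self.found[-1]["text"] += data


def find_tags(html, tag_name):
    """Return a list of {'attrs': ..., 'text': ...} for each matching tag."""
    collector = TagCollector(tag_name)
    collector.feed(html)
    collector.close()
    return collector.found


# Hypothetical page content used only for this example.
PAGE = (
    '<html><body><a href="/first">First</a>'
    '<p>text</p><a href="/second">Second</a></body></html>'
)

links = find_tags(PAGE, "a")
hrefs = [link["attrs"]["href"] for link in links]
print(hrefs)
```

A dedicated parser saves you from writing this kind of event-handling boilerplate for every page you scrape.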

If you want, you can also use it to create HTML/XML documents more easily than by concatenating strings.
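
As a rough illustration of why building a document from element objects beats concatenating strings, the sketch below uses the standard library's xml.etree.ElementTree. It is not dhtmlparser's API, and build_list is a hypothetical helper. The point is that escaping and tag closing are handled by the serializer instead of by hand.

```python
import xml.etree.ElementTree as ET


def build_list(items):
    """Build a <ul> of links from (href, label) pairs and serialize it."""
    ul = ET.Element("ul", {"class": "links"})
    for href, label in items:
        li = ET.SubElement(ul, "li")
        a = ET.SubElement(li, "a", {"href": href})
        a.text = label  # special characters are escaped on serialization
    return ET.tostring(ul, encoding="unicode")


# The ampersands below would need manual escaping with string concatenation.
markup = build_list([("/a?x=1&y=2", "A & B")])
print(markup)
```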

Documentation

Full module documentation can be found here: http://pyDHTMLParser.rtfd.org

Changelog

2.2.2

  • Attempt to fix strange recursive inheritance problem.

2.2.0

  • Rewritten for compatibility with python3.

2.1.0-2.1.8

  • State parser fixed - it can now recover from invalid html like ^{tt1}$.
  • Rewritten to use ^{tt2}$ in parser for better readability.
  • Garbage collector is now disabled during _raw_split().
  • Fixed #16 - recovery after tags which don’t end with ^{tt3}$ (^{tt4}$ for example).
  • Closed #17 - implemented ignoring of ^{tt5}$ when it is used as a less-than sign.
  • Restored support for multiline attributes.
  • ^{tt6}$ no longer tries to parse HTML element parameters.
  • Implemented ^{tt7}$ getter.
  • License changed to MIT.
  • Fixed #18: bug which in some cases caused invalid output.
  • Added HTMLElement.__repr__().
  • Added test_coverage.sh.
  • Added extended test_equality() coverage.
  • Formatting improvements.
  • Improved constructor handling, which is now much more readable.
  • Updated formatting of the setup.py.
  • Added more tests.
  • Fixed #22; bug in the SpecialDict.
  • Fixed some nasty unicode problems.
  • Fixed python 2 / 3 problem in docs/__init__.py.
  • getVersion() -> get_version().

2.0.10

  • Added more tests of removeTags().
  • run_tests.sh now accepts arguments.
  • Check for string in removeTags() changed to basestring from str.

2.0.6-2.0.9

  • Fixed behaviour of toString() and tagToString().
  • SpecialDict is now derived from OrderedDict.
  • Changed and added tests of .params attribute (OrderedDict is now used).
  • Fixed bug in _repair_tags().
  • Removed _repair_tags() - it wasn’t really necessary.
  • Fixed nasty bug which could cause invalid XML output.

2.0.1-2.0.5

  • Fixed bugs in ^{tt8}$.
  • Fixed broken links in documentation.
  • Fixed bugs in ^{tt9}$.
  • ^{tt10}$; Fixed bug which prevented tag_name from being None.
  • Added operator ^{tt11}$ to the SpecialDict.
  • Added new method ^{tt12}$ to ^{tt13}$.

2.0.0

  • Rewritten, refactored, split into multiple files.
  • Added unittest coverage of almost 100% of the code.
  • Added better selector methods (^{tt14}$, ^{tt15}$).
  • Added Sphinx documentation.
  • Fixed a lot of bugs.
