frag2text (Python package) - PyPI

Select and reverse Markdown (html2text) web page fragments.

Project description

frag2text


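Markdown gives you HTML from plain text, and html2text reverses the process. If you want a plain-text version of a specific section of a web page (an HTML fragment), you usually have to select (parse) it first and then generate Markdown text to keep some of the formatting.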

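I made frag2text to:

easily select web page fragments by CSS selector or XPath expression
get the plain text of a fragment, with formatting preserved for later use with Markdown
avoid shelling out to another program (such as lynx -dump)
avoid parsing HTML or text directly
use html5lib for robust parsing
be a simple, easy-to-maintain Python module (wrapping the parser, treebuilders, and serializer and keeping it simple; BeautifulSoup is not needed)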

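I have been solving this problem in different ways for some time. It seems trivial, but it quickly gets ridiculous. If you have suggestions, or would like to share your experience with other tools, please let me know.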

Installation

$ pip install frag2text

Usage

Python

>>> from frag2text import frag2text
>>> help(frag2text)
Help on function frag2text in module frag2text:

frag2text(endpoint, stype, selector, clean=False, raw=False, verbose=False)
    returns Markdown text of selected fragment.

    Args:
        endpoint: URL, file, or HTML string
        stype: {'css'|'xpath'}
        selector: CSS selector or XPath expression
    Returns:
        Markdown text
    Options:
        clean: cleans fragment (lxml.html.clean defaults)
        raw: returns raw HTML fragment
        verbose: show http status, encoding, headers
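The docstring says `endpoint` may be a URL, a file, or an HTML string, so a caller has to tell these apart before fetching, reading, or parsing. The sketch below shows one plausible way to do that with the standard library. `classify_endpoint` is a hypothetical helper, not part of frag2text, and its rules (an http/https scheme with a host, then an existing file, otherwise literal markup) are assumptions.

```python
import os
from urllib.parse import urlparse


def classify_endpoint(endpoint: str) -> str:
    """Return 'url', 'file', or 'html' for an endpoint string.

    Hypothetical helper: frag2text's own dispatch logic may differ.
    """
    parsed = urlparse(endpoint)
    # Treat anything with an http(s) scheme and a host as a URL.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # An existing path on disk is read as a local HTML file.
    if os.path.isfile(endpoint):
        return "file"
    # Everything else is assumed to be a literal HTML string.
    return "html"


if __name__ == "__main__":
    print(classify_endpoint("http://wikipedia.org/wiki/Amanita"))
    print(classify_endpoint("<ht?+><borkt><h1>hello"))
```

Checking for a URL first is a choice of this sketch; a local file whose name happens to look like a URL would be fetched rather than read.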

Shell

$ frag2text.py -h
usage: frag2text.py [-h] [-c] [-r] [-v] endpoint {css,xpath} selector

reverse Markdown (html2text) HTML fragments.

positional arguments:
  endpoint       URL, file, or HTML string
  {css,xpath}    fragment selector type
  selector       CSS select statement or XPath expression

optional arguments:
  -h, --help     show this help message and exit
  -c, --clean    clean fragment (lxml.html.clean defaults)
  -r, --raw      output raw fragment
  -v, --verbose  print status, encoding, headers
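The help text above has the layout produced by Python's standard `argparse` module. As a sketch of how an equivalent interface could be declared (inferred from the help output only, not taken from frag2text's source; `build_parser` is a hypothetical name):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a CLI matching the help text shown above (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="frag2text.py",
        description="reverse Markdown (html2text) HTML fragments.",
    )
    parser.add_argument("endpoint", help="URL, file, or HTML string")
    # choices= restricts the selector type; argparse displays it as {css,xpath}
    parser.add_argument("stype", choices=["css", "xpath"],
                        help="fragment selector type")
    parser.add_argument("selector",
                        help="CSS select statement or XPath expression")
    # Boolean flags default to False and become True when given.
    parser.add_argument("-c", "--clean", action="store_true",
                        help="clean fragment (lxml.html.clean defaults)")
    parser.add_argument("-r", "--raw", action="store_true",
                        help="output raw fragment")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print status, encoding, headers")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-c", "http://wikipedia.org/wiki/Amanita", "css", ".infobox"])
    print(args)
```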

Examples

Python

from frag2text import frag2text
info = frag2text('http://wikipedia.org/wiki/Amanita', 'css', '.infobox')

Shell

$ frag2text.py "<ht?+><borkt><h1>hello" xpath //h1
...
# hello
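The malformed input above still produces a heading because the markup is parsed leniently before it is converted. frag2text relies on html5lib and html2text for those steps; the toy below swaps in the standard library's `html.parser` to illustrate the conversion idea only. It handles headings, bold, italics, and links, and ignores every other tag (including the bogus `<ht?+>` and `<borkt>`). The names `ToyMarkdown` and `to_markdown` are hypothetical.

```python
from html.parser import HTMLParser


class ToyMarkdown(HTMLParser):
    """Tiny HTML-to-Markdown converter (illustration, not html2text)."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None  # href of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            # <hN> becomes N hash marks followed by a space
            self.out.append("#" * int(tag[1]) + " ")
        elif tag in ("b", "strong"):
            self.out.append("**")
        elif tag in ("i", "em"):
            self.out.append("_")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")
        # any other tag is silently ignored

    def handle_endtag(self, tag):
        if tag in ("b", "strong"):
            self.out.append("**")
        elif tag in ("i", "em"):
            self.out.append("_")
        elif tag == "a":
            self.out.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        self.out.append(data)

    def markdown(self):
        return "".join(self.out).strip()


def to_markdown(html):
    parser = ToyMarkdown()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.markdown()


if __name__ == "__main__":
    print(to_markdown("<ht?+><borkt><h1>hello"))
```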

CSS selector

$ frag2text.py http://wikipedia.org/wiki/Amanita css .infobox
_Amanita_
---
![Fliegenpilz-1.jpg](//upload.wikimedia.org/wikipedia/commons/thumb/d/d1/Fliegenpilz-1.jpg/230px-Fliegenpilz-1.jpg)
_[Amanita muscaria](/wiki/Amanita_muscaria)_
Albin Schmalfuß, 1897[Scientific classification](/wiki/Biological_classification)
Kingdom: |[Fungi](/wiki/Fungi)
Division: |[Basidiomycota](/wiki/Basidiomycota)
Class: |[Agaricomycetes](/wiki/Agaricomycetes)
Order: |[Agaricales](/wiki/Agaricales)
Family: |[Amanitaceae](/wiki/Amanitaceae)
Genus: | _**Amanita**_
[Pers.](/wiki/Christian_Hendrik_Persoon)(1794)[Type species](/wiki/Type_species)
_[Amanita muscaria](/wiki/Amanita_muscaria)_
([L.](/wiki/Linnaeus))[Lam.](/wiki/Lam.)(1783)[Diversity](/wiki/Biodiversity)[c.600 species](/wiki/List_of_Amanita_species)
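`.infobox` is a CSS class selector: it matches every element whose `class` attribute contains the token `infobox`, even when the element carries other classes too. frag2text hands this to a CSS selector engine; the standard-library sketch below only illustrates the matching rule on well-formed markup using `xml.etree.ElementTree`. `select_by_class` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def select_by_class(root, class_name):
    """Return elements whose class attribute contains class_name as a token.

    Mirrors the CSS rule for `.class_name`; ElementTree's own
    [@class='...'] predicate would require an exact string match.
    """
    return [
        el for el in root.iter()
        # in HTML, class is a whitespace-separated list of tokens
        if class_name in el.get("class", "").split()
    ]


if __name__ == "__main__":
    doc = ET.fromstring(
        '<div><table class="infobox biota"><tr><td>Amanita</td></tr></table>'
        '<p class="infoboxes">not matched</p></div>'
    )
    for el in select_by_class(doc, "infobox"):
        print(el.tag, el.get("class"))
```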

XPath expression

$ frag2text.py http://en.wikipedia.org/wiki/Amanita xpath '//p[1]'

The [genus](/wiki/Genus) _**Amanita**_ contains about 600[species](/wiki/Species)
of [agarics](/wiki/Agarics) including some of the most [toxic](/wiki/Toxic)
known [mushrooms](/wiki/Mushrooms) found worldwide, as well as some
well-regarded edible species. This genus is responsible for approximately
95% of the fatalities resulting from [mushroom poisoning](/wiki/Mushroom_poisoning),
with the [death cap](/wiki/Death_cap) accounting for about 50% on its own.
The most potent toxin present in these mushrooms is α[-amanitin](/wiki/%CE%91-amanitin).
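In XPath, `//p[1]` selects every `p` that is the first `p` child of its parent, not only the first `p` in the document; the document-wide first match is written `(//p)[1]`. On pages where paragraphs sit in several containers the two differ. ElementTree's limited XPath subset applies the same per-parent positional rule, so a small hand-written document can show the difference without third-party libraries:

```python
import xml.etree.ElementTree as ET

# Two <p> elements are first-of-parent: one in <body>, one in <div>.
doc = ET.fromstring(
    "<html><body>"
    "<p>first in body</p><p>second in body</p>"
    "<div><p>first in div</p></div>"
    "</body></html>"
)

# Equivalent of XPath //p[1]: the first <p> child of each parent.
per_parent = [p.text for p in doc.findall(".//p[1]")]

# Equivalent of XPath (//p)[1]: the first <p> in document order.
# ElementTree has no (...)[n] syntax, so use find(), which returns one match.
document_first = doc.find(".//p").text

print(per_parent)
print(document_first)
```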

Release history

0.0.5 (2015-02-18)

Handle XPathEvalError, SelectorSyntaxError, or no results found.
Join the list of selected fragments instead of returning only the first.
Don't exit prematurely on errors.

0.0.1 (2015-01-14)

Seems to work!

License: BSD License (BSD 3-Clause)
Maintainer: dolmantle
Version: 0.0.6