Python mwparserfromhell package - PyPI

mwparserfromhell is a parser for MediaWiki wikicode.

mwparserfromhell: detailed project description

mwparserfromhell (the MediaWiki Parser from Hell) is a Python package that provides an easy-to-use and powerful parser for MediaWiki wikicode. It supports Python 2 and Python 3.

Developed by Earwig with contributions from Σ, Legoktm, and others. Full documentation is available on ReadTheDocs. Development takes place on GitHub.

Installation

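The easiest way to install the parser is through the Python Package Index; you can install the latest release with pip install mwparserfromhell (install pip first if you do not have it). Make sure your pip is up to date, especially on Windows.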

Alternatively, get the latest development version:

git clone https://github.com/earwig/mwparserfromhell.git
cd mwparserfromhell
python setup.py install

You can run the unit test suite with python setup.py test -q.

Usage

Normal usage is fairly straightforward (where text is the page text):

>>> import mwparserfromhell
>>> wikicode = mwparserfromhell.parse(text)

wikicode is a mwparserfromhell.Wikicode object, which acts like an ordinary str object (or unicode in Python 2) with some extra methods. For example:

>>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
>>> wikicode = mwparserfromhell.parse(text)
>>> print(wikicode)
I has a template! {{foo|bar|baz|eggs=spam}} See it?
>>> templates = wikicode.filter_templates()
>>> print(templates)
['{{foo|bar|baz|eggs=spam}}']
>>> template = templates[0]
>>> print(template.name)
foo
>>> print(template.params)
['bar', 'baz', 'eggs=spam']
>>> print(template.get(1).value)
bar
>>> print(template.get("eggs").value)
spam

Since nodes can contain other nodes, getting nested templates is simple:

>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates()
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

You can also pass recursive=False to filter_templates() and explore templates manually. This is possible because nodes can contain additional Wikicode objects:

>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
>>> print(code.filter_templates(recursive=False))
['{{foo|this {{includes a|template}}}}']
>>> foo = code.filter_templates(recursive=False)[0]
>>> print(foo.get(1).value)
this {{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0])
{{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0].get(1).value)
template

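Templates can easily be modified to add, remove, or alter parameters. Wikicode objects can be treated like lists, with append(), insert(), remove(), replace(), and more. They also have a matches() method for comparing page or template names, which takes care of capitalization and whitespace: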

>>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
>>> code = mwparserfromhell.parse(text)
>>> for template in code.filter_templates():
...     if template.name.matches("Cleanup") and not template.has("date"):
...         template.add("date", "July 2012")
...
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
>>> code.replace("{{uncategorized}}", "{{bar-stub}}")
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> print(code.filter_templates())
['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert code back into a regular str object (for saving the page!) by calling str() on it:

>>> text = str(code)
>>> print(text)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> text == code
True

Likewise, use unicode(code) in Python 2.

Limitations

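While the MediaWiki parser generates HTML and has access to the contents of templates, among other things, mwparserfromhell acts as a direct interface to the source code only. This has several implications: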

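Syntax elements produced by a template transclusion cannot be detected. For example, imagine a hypothetical page "Template:End-bold" that contains the text </b>. While MediaWiki would correctly understand that <b>foobar{{end-bold}} translates to <b>foobar</b>, mwparserfromhell has no way of examining the contents of {{end-bold}}. Instead, it treats the bold tag as unfinished, possibly extending further down the page.
Templates adjacent to external links, as in http://example.com{{foo}}, are considered part of the link. In reality, this would depend on the contents of the template.
When different syntax elements cross over each other, as in {{echo|''Hello}}, world!'', the parser gets confused because this cannot be represented by an ordinary syntax tree. Instead, the parser treats the first syntax construct as plain text. In this case, only the italic tag would be parsed properly.
Workaround: since this commonly occurs with text formatting, and text formatting is often not of interest to users, you can pass skip_style_tags=True to mwparserfromhell.parse(). This treats '' and ''' as plain text.
A future version of mwparserfromhell may include multiple parsing modes to get around this restriction more sensibly.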

Additionally, the parser lacks awareness of certain wiki-specific settings:

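Word-ending links are not supported, since the linktrail rules are language-specific.
Localized namespace names are not recognized, so file links (such as [[File:...]]) are treated as regular wikilinks.
Anything that looks like an XML tag is treated as a tag, even if it is not a recognized tag name, since the list of valid tags depends on the loaded MediaWiki extensions.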

Integration

mwparserfromhell is used by and was originally developed for EarwigBot; its Page objects have a parse method that essentially calls mwparserfromhell.parse() on page.get().

If you are using Pywikibot, your code might look like this:

import mwparserfromhell
import pywikibot

def parse(title):
    site = pywikibot.Site()
    page = pywikibot.Page(site, title)
    text = page.get()
    return mwparserfromhell.parse(text)

If you are not using a library, you can parse any page with the following Python 3 code (via the API):

import json
from urllib.parse import urlencode
from urllib.request import urlopen
import mwparserfromhell

API_URL = "https://en.wikipedia.org/w/api.php"

def parse(title):
    data = {"action": "query", "prop": "revisions", "rvprop": "content",
            "rvslots": "main", "rvlimit": 1, "titles": title,
            "format": "json", "formatversion": "2"}
    raw = urlopen(API_URL, urlencode(data).encode()).read()
    res = json.loads(raw)
    revision = res["query"]["pages"][0]["revisions"][0]
    text = revision["slots"]["main"]["content"]
    return mwparserfromhell.parse(text)
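The API snippet above indexes straight into the response, which raises an exception when the requested page does not exist. A defensive variant of the extraction step might look like the sketch below. The function extract_wikitext is hypothetical, and both sample payloads are hand-written stand-ins for the formatversion=2 response shape, not real API output.

```python
import json

def extract_wikitext(res):
    """Hypothetical helper: return the main-slot wikitext, or None.

    Assumes a decoded formatversion=2 query response like the one
    requested in the snippet above.
    """
    pages = res.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return None
    revisions = pages[0].get("revisions") or []
    if not revisions:
        return None
    return revisions[0]["slots"]["main"]["content"]

# Hand-written sample payloads, for illustration only.
found = json.loads(
    '{"query": {"pages": [{"title": "Foo", "revisions": '
    '[{"slots": {"main": {"content": "{{foo}} text"}}}]}]}}'
)
missing = json.loads('{"query": {"pages": [{"title": "Nope", "missing": true}]}}')

print(extract_wikitext(found))
print(extract_wikitext(missing))
```

The returned string can then be passed to mwparserfromhell.parse() once the caller has handled the None case.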


mwparserfromhell 0.5.4
