mediawiki-parser: Python package on PyPI

A MediaWiki syntax parser based on pijnu.

Detailed description of the mediawiki-parser Python project

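Presentation

This is a parser for the MediaWiki (MW) syntax. Its goal is to transform wikitext into an abstract syntax tree (AST) and then render that AST into various formats such as plain text and HTML.

This is an original work by Peter Potrowl and his mentor Erik Rose for the Google Summer of Code 2011.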

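Requirements

This parser depends on pijnu. You must install the latest version of pijnu, available at: https://github.com/peter17/pijnu

Do not use the version from http://spir.wikidot.com; it is outdated.

For a basic and simple installation, try: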

pip install mediawiki-parser

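How it works

Two files, preprocessor.pijnu and mediawiki.pijnu, describe the MW syntax using patterns that together form a grammar. A separate Python tool, pijnu, interprets these grammars and uses them to match the wikitext content and build the AST.

Specific Python functions then render the leaves of the AST into the desired format.

We use two grammars because the templates in the wikitext are first replaced by the preprocessor, before the page content is actually parsed.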
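
The two-stage idea can be pictured with a toy sketch. The code below does not use pijnu or the real grammars: the regular expressions and the function names substitute_templates and render_bold are assumptions made up for this illustration. It only shows why templates are expanded first: a template body may itself contain markup that the second stage has to see.

```python
import re

# Toy stand-in for the preprocessor stage: replace {{name}} with its content.
# This regex-based approach is an assumption for illustration only; the real
# preprocessor is generated from preprocessor.pijnu.
def substitute_templates(wikitext, templates):
    def replace(match):
        name = match.group(1).strip()
        # Unknown templates are left untouched in this sketch.
        return templates.get(name, match.group(0))
    return re.sub(r"\{\{([^{}|]+)\}\}", replace, wikitext)

# Toy stand-in for the parsing/rendering stage: only handles '''bold'''.
def render_bold(wikitext):
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", wikitext)

templates = {"my template": "'''my template content'''"}
source = "Before {{my template}} after"

# Stage 1: expand templates. Stage 2: render the expanded wikitext.
preprocessed = substitute_templates(source, templates)
html = render_bold(preprocessed)
print(preprocessed)
print(html)
```

If the order were reversed, the bold markup inside the template body would never reach the rendering stage.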

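Building the parsers

The preprocessor and the MediaWiki parser must be built from the pijnu grammars before the parser can be used. You can do this with setup.py, setting PYTHONPATH to point to pijnu if necessary: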

cd /PATH/TO/mediawiki-parser/
env PYTHONPATH=/PATH/TO/pijnu python setup.py build_parsers

How to test

Currently, the easiest way to test the tool is to put your wikitext in the wikitext.txt file. Then run:

python parser.py

The wikitext will be rendered as HTML in the article.htm file.

Other approaches may be added in the future.

Unit tests

Install nose and run:

cd /PATH/TO/mediawiki-parser/
env PYTHONPATH=/PATH/TO/pijnu/ nosetests tests

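How to use in a program

HTML example

To render wikitext as HTML from a Python program, you can use the following lines, where source holds the wikitext to render: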

templates = {}
allowed_tags = []
allowed_self_closing_tags = []
allowed_attributes = []
interwiki = {}
namespaces = {}

from mediawiki_parser.preprocessor import make_parser
preprocessor = make_parser(templates)

from mediawiki_parser.html import make_parser
parser = make_parser(allowed_tags, allowed_self_closing_tags, allowed_attributes, interwiki, namespaces)

preprocessed_text = preprocessor.parse(source)
output = parser.parse(preprocessed_text.leaves())

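The output string will contain the rendered HTML. Describe the behavior you expect by filling in the variables on the first lines:

If your wikitext calls external templates, put their names and contents in the templates dict (e.g. {'my template': 'my template content'})
If the wiki allows some HTML tags, list them in the allowed_tags list (e.g. ['center', 'big', 'small', 'span']; for security reasons, avoid 'script' and some other tags)
If the wiki allows some self-closing HTML tags, list them in the allowed_self_closing_tags list (e.g. ['br', 'hr']; for security reasons, avoid 'script' and some other tags)
If the wiki allows some HTML tags, list the attributes they may use in the allowed_attributes list (e.g. ['style', 'class']; for security reasons, avoid 'onclick' and some other attributes)
If you want to use interwiki links, list the foreign wikis in the interwiki dict (e.g. {'fr': 'http://fr.wikipedia.org/wiki/'})
If you want to distinguish between standard links, file inclusions, and categories, list the namespaces of your wiki in the namespaces dict (e.g. {'Template': 10, 'Category': 14, 'File': 6}, where the numbers are the namespace codes used in MW)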
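
For reference, here is how the six variables might look once filled in with the example values from the list above. The check_whitelists helper and the two deny-lists are hypothetical and not part of mediawiki-parser; they only illustrate the security advice and are far from a complete policy.

```python
# Example values taken from the descriptions above; adjust them to your wiki.
templates = {'my template': 'my template content'}
allowed_tags = ['center', 'big', 'small', 'span']
allowed_self_closing_tags = ['br', 'hr']
allowed_attributes = ['style', 'class']
interwiki = {'fr': 'http://fr.wikipedia.org/wiki/'}
namespaces = {'Template': 10, 'Category': 14, 'File': 6}

# Hypothetical helper (not part of mediawiki-parser): reject a few entries
# that the text above warns against. The deny-lists are deliberately short.
UNSAFE_TAGS = {'script'}
UNSAFE_ATTRIBUTES = {'onclick'}

def check_whitelists(tags, self_closing_tags, attributes):
    bad_tags = UNSAFE_TAGS & (set(tags) | set(self_closing_tags))
    bad_attributes = UNSAFE_ATTRIBUTES & set(attributes)
    if bad_tags or bad_attributes:
        raise ValueError('unsafe entries: %s' % sorted(bad_tags | bad_attributes))

# Raises ValueError if any whitelist contains a known-unsafe entry.
check_whitelists(allowed_tags, allowed_self_closing_tags, allowed_attributes)
```

These variables can then be passed to the two make_parser calls exactly as in the example above.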

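Text example

To render wikitext as plain text from a Python program, you can use the following lines, where source holds the wikitext to render: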

templates = {}

from mediawiki_parser.preprocessor import make_parser
preprocessor = make_parser(templates)

from mediawiki_parser.text import make_parser
parser = make_parser()

preprocessed_text = preprocessor.parse(source)
output = parser.parse(preprocessed_text.leaves())

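The output string will contain the rendered text. If your wikitext calls external templates, put their names and contents in the templates dict (e.g. {'my template': 'my template content'})

Template substitution example

If you only want to substitute the templates in a given wikitext, call the preprocessor alone, without a rendering post-processor: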

templates = {}

from mediawiki_parser.preprocessor import make_parser
preprocessor = make_parser(templates)

output = preprocessor.parse(source)

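The output will contain the wikitext with its templates substituted. Put the template names and contents in the templates dict (e.g. {'my template': 'my template content'})

Post-processors

The parser produces an AST. Three post-processors are currently provided to turn it into readable output:

html.py, for HTML output
text.py, for text output
raw.py, for raw output

For now, we have mainly focused on the HTML post-processor. The text output may not be as clean as expected.

You can adapt them to your own needs.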
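
As a rough picture of what a post-processor does, the sketch below renders one tiny hand-built tree in two formats. The node layout (tuples of a tag name and a list of children) and the names render, HTML_RENDERERS, and TEXT_RENDERERS are assumptions for illustration; the actual AST is built by pijnu, and the actual rendering functions live in html.py, text.py, and raw.py.

```python
# A toy AST: each node is (tag, children); leaves are plain strings.
# This layout is an assumption, not pijnu's real node type.
tree = ('paragraph', ['Hello ', ('bold', ['world']), '!'])

# One table of rendering functions per output format, keyed by node tag.
HTML_RENDERERS = {
    'paragraph': lambda inner: '<p>%s</p>' % inner,
    'bold': lambda inner: '<b>%s</b>' % inner,
}
TEXT_RENDERERS = {
    'paragraph': lambda inner: inner + '\n',
    'bold': lambda inner: inner,
}

def render(node, renderers):
    """Render children first, then apply the function for this node's tag."""
    if isinstance(node, str):
        return node
    tag, children = node
    inner = ''.join(render(child, renderers) for child in children)
    return renderers[tag](inner)

html_output = render(tree, HTML_RENDERERS)
text_output = render(tree, TEXT_RENDERERS)
print(html_output)
print(text_output)
```

In this sketch, adapting the output to another format only means supplying a different table of functions; the tree itself stays the same.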

Known bugs

This tool should be able to render any wikitext page to text or HTML.

Please do not hesitate to report any bugs you find while using this tool.

Special thanks

To Nicholas Burlett for his directory restructure, performance improvements and other fixes

mediawiki-parser 0.4.1
