Extracting a block of text from a ReST document via :ref:?

Asked 2025-04-17 07:33

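I have some documentation written in reStructuredText, and I would like to reuse snippets of it in online help. One approach seems to be to "snip" out pieces of markup by reference, for example: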

.. _my_boring_section:

Introductory prose
------------------

blah blah blah

.. _my_interesting_section:

About this dialog
-----------------

talk about stuff which is relevant in contextual help

How can I use python/docutils/sphinx to extract the markup for the _my_interesting_section marker?

1 Answer

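I'm not sure how you could do this other than by subclassing and customising the Docutils parser. If you only need the relevant section of the reStructuredText and don't mind losing some of the formatting, you can try the approach below. Alternatively, the processed markup for a particular section (that is, the reStructuredText converted to HTML or LaTeX) is very easy to get. See my answer to this question for an example of extracting part of the processed XML. Let me know if that is what you want. Either way, here goes...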

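You can manipulate reStructuredText very easily with Docutils. First, use the Docutils publish_doctree function to publish the document tree (doctree) representation of the reStructuredText. This doctree is easy to traverse and to search for particular document elements, such as sections with particular attributes. The easiest way to find a particular section is to look at the ids attribute of the doctree itself: doctree.ids is simply a dictionary mapping every reference id to the corresponding part of the document.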

from docutils.core import publish_doctree

s = """.. _my_boring_section:

Introductory prose
------------------

blah blah blah

.. _my_interesting_section:

About this dialog
-----------------

talk about stuff which is relevant in contextual help
"""

# Parse the above string to a Docutils document tree:
doctree = publish_doctree(s)

# Get element in the document with the reference id `my-interesting-section`:
ids = 'my-interesting-section'

try:
    section = doctree.ids[ids]
except KeyError:
    # Do some exception handling here...
    raise KeyError('No section with ids {0}'.format(ids))

# We can also check that the element we found is in fact a section:
import docutils.nodes
assert isinstance(section, docutils.nodes.section)

# Finally, get the section text:
print(repr(section.astext()))

# This prints:
# 'About this dialog\n\ntalk about stuff which is relevant in contextual help'

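The formatting is now lost. If there is nothing too complicated, it is easy enough to insert a line of dashes under the first line of the result above to get back your section heading. I'm not sure what you would need to do for more complicated inline formatting. Hopefully the above is a good starting point, though.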
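Re-inserting the heading underline can be sketched roughly as follows. This is only an illustration: restore_heading is a hypothetical helper name, not part of Docutils, and it assumes the first line of the astext() output is the section title and that the heading used a single underline character.

```python
def restore_heading(section_text, underline_char="-"):
    """Re-insert an RST underline below the first line of astext() output.

    Hypothetical helper (not a Docutils function). Assumes the first line
    is the section title and the rest is the section body.
    """
    # Split off the title at the first newline only.
    title, sep, body = section_text.partition("\n")
    # An RST underline must be at least as long as the title text.
    underline = underline_char * len(title)
    return title + "\n" + underline + sep + body


text = "About this dialog\n\ntalk about stuff which is relevant in contextual help"
print(restore_heading(text))
```

Inline markup such as emphasis or links is not recovered this way; only the heading line is rebuilt.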

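Note: when querying doctree.ids, the ids value I pass differs slightly from the definition in the reStructuredText: the leading underscore is removed and all other underscores are replaced with -. This is how Docutils normalises references. It would be very simple to write a function that converts a reStructuredText reference into Docutils' internal representation. Otherwise, I'm sure that if you dig through Docutils you will find the routine that does this.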
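A conversion function along those lines might look like the sketch below. The name rst_target_to_id is made up for illustration, and the regular expressions only approximate the normalisation for plain ASCII names. Docutils itself ships docutils.nodes.make_id, which is probably the routine alluded to above and is the safer choice when Docutils is already imported.

```python
import re


def rst_target_to_id(target):
    """Approximate the id Docutils derives from an RST target name.

    Hypothetical helper, ASCII names only. Accepts either the bare name
    ("my_interesting_section") or the full target line
    (".. _my_interesting_section:").
    """
    name = target.strip().lower()
    # Collapse every run of characters outside a-z and 0-9 into one hyphen.
    name = re.sub(r"[^a-z0-9]+", "-", name)
    # Drop leading hyphens and digits, and trailing hyphens.
    return re.sub(r"^[-0-9]+|-+$", "", name)


print(rst_target_to_id(".. _my_interesting_section:"))
```

The result can then be used as the key in doctree.ids, in place of the hand-written 'my-interesting-section' string.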

Answered 2025-04-17 by Python大师
