python-markdown htmlStash placeholder is not being replaced

Asked 2025-04-16 23:32

I'm building a web application with Django and using python-markdown to convert Markdown text to HTML. There are a few cases that Markdown can't handle, so I wrote some basic extensions.

"""

Helps make paras for Less framework

@div large-column float-left

# This is an H1

this is a paragraph right here!

and a new one

## Heading 2

and yet another one

--> becomes -->

<div class="large-column float-left">
    <h1>This is an H1</h1>
    <p>this is a paragraph right here!</p>
    <p>and a new one</p>
    <h2>Heading 2</h2>
    <p>and yet another one</p>
</div>

"""

import re
import markdown

# Global vars

LESS_BLOCK_RE = re.compile( \
    r'@(?P<tag>div|span)[ ]*(?P<class>[a-zA-z0-9-\ ^\n]+)[ ]*\n(?P<inner>.*)(?=div|span)?',
    re.MULTILINE|re.DOTALL
    )

class LessFrameworkExtension(markdown.Extension):

    def extendMarkdown(self, md, md_globals):
        md.registerExtension(self)

        md.preprocessors.add('less_framework', LessBlockPreprocessor(md),'_begin')

    def reset(self):
        print 'resetting'

class LessBlockPreprocessor(markdown.preprocessors.Preprocessor):

    def __init__(self, md):
        markdown.preprocessors.Preprocessor.__init__(self, md)

    def getConfig(self, key):
        if key in self.config:
            return self.config[key][0]
        else:
            return None

    def run(self, lines):
        """ Match and store Less Framework Blocks in the HTML Stash """

        text = "\n".join(lines)

        while 1:
            m = LESS_BLOCK_RE.search(text)
            if m:
                less_tag = m.group('tag')
                less_class = m.group('class')
                less_inner = m.group('inner')

                print less_tag
                print less_class
                print less_inner

                placeholder = self.markdown.htmlStash.store(less_inner, safe=True)
                text = '<%s class="%s">\n%s\n</%s>' % (less_tag, less_class, placeholder, less_tag)
            else:
                break
        return text.split("\n")

    def _escape(self, txt):
        """ basic html escaping """
        txt = txt.replace('&', '&amp;')
        txt = txt.replace('<', '&lt;')
        txt = txt.replace('>', '&gt;')
        txt = txt.replace('"', '&quot;')
        return txt

def makeExtension(configs):
    return LessFrameworkExtension(configs)

The extension above partly works, but the output is:

<div class="large-column float-left
">
wzxhzdk:0
</div>'

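That looks like the placeholder that htmlStash stores. Am I perhaps missing a call into python-markdown? Looking at similar extensions in the python-markdown project, what I'm doing appears to follow the usual convention.

Any help would be much appreciated!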

Sample input and expected output

@div large-column float-left

# This is an H1

this is a paragraph right here!

and a new one

## Heading 2

and yet another one

Extended Markdown --> becomes --> HTML

<div class="large-column float-left">
    <h1>This is an H1</h1>
    <p>this is a paragraph right here!</p>
    <p>and a new one</p>
    <h2>Heading 2</h2>
    <p>and yet another one</p>
</div>

1 Answer


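I know this post is old, but in case anyone else runs into this problem (as I did) and finds this thread: you need to register the preprocessor so that it runs after at least the normalize_whitespace step, because that step strips the Unicode control characters that htmlStash uses as placeholder delimiters.

In this case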

md.preprocessors.add('less_framework', LessBlockPreprocessor(md),'_begin')

should be:

md.preprocessors.add('less_framework', LessBlockPreprocessor(md),'>normalize_whitespace')
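
Why the order matters can be illustrated without the library itself. The following is only a minimal sketch: STX, ETX and the placeholder format are assumed to mirror what Python-Markdown uses internally, while normalize_whitespace() and restore() are simplified, hypothetical stand-ins for the real preprocessor and postprocessor.

```python
# Assumption: these mirror the control characters Python-Markdown wraps
# around stash placeholders (STX = start of text, ETX = end of text).
STX = "\u0002"
ETX = "\u0003"
PLACEHOLDER = STX + "wzxhzdk:%d" + ETX


def normalize_whitespace(text):
    """Hypothetical stand-in: strip STX/ETX and unify line endings."""
    text = text.replace(STX, "").replace(ETX, "")
    return text.replace("\r\n", "\n").replace("\r", "\n")


def restore(text, stash):
    """Hypothetical stand-in: swap each intact placeholder for its stashed HTML."""
    for i, html in enumerate(stash):
        text = text.replace(PLACEHOLDER % i, html)
    return text


stash = ["<h1>This is an H1</h1>"]
wrapped = "<div>\n" + (PLACEHOLDER % 0) + "\n</div>"

# Preprocessor at '_begin': the placeholder exists before normalization,
# so its delimiters are stripped and it can no longer be found.
too_early = restore(normalize_whitespace(wrapped), stash)

# Preprocessor after normalize_whitespace: the delimiters stay intact.
in_order = restore(wrapped, stash)

print(repr(too_early))
print(repr(in_order))
```

In the first case the bare text wzxhzdk:0 is left behind, which matches the output shown in the question.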

More information here: https://github.com/Python-Markdown/markdown/issues/222
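
A separate detail in the question's output is the line break inside the class attribute. It appears to come from the character class in LESS_BLOCK_RE, which includes \n, so the class group also captures the end of the line. The pattern below is a hypothetical tightened variant, not the original author's regex: it assumes class names consist of letters, digits, underscores and hyphens separated by spaces, and it leaves out the trailing optional lookahead.

```python
import re

# Hypothetical variant of LESS_BLOCK_RE: class names are word-like tokens
# separated by spaces, so the class group cannot capture a newline.
LESS_BLOCK_RE = re.compile(
    r'@(?P<tag>div|span)[ ]+'
    r'(?P<class>[A-Za-z0-9_-]+(?:[ ]+[A-Za-z0-9_-]+)*)'
    r'[ ]*\n(?P<inner>.*)',
    re.DOTALL,
)

sample = (
    "@div large-column float-left\n"
    "\n"
    "# This is an H1\n"
    "\n"
    "this is a paragraph right here!"
)

m = LESS_BLOCK_RE.search(sample)
less_tag = m.group('tag')
less_class = m.group('class')
less_inner = m.group('inner')

print(repr(less_tag))
print(repr(less_class))
```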
