Removing nested BBCode quotes in Python?
I tried searching for this but only found answers for PHP. I'm using Python on Google App Engine and want to strip nested quotes.
For example:
[quote user2]
[quote user1]Hello[/quote]
World
[/quote]
I want to run some code that keeps only the outermost quote:
[quote user2]World[/quote]
2 Answers
I'm not sure whether you want to extract just the quotes, or to strip the nested quotes from the whole input. This pyparsing example does both:
stuff = """
Other stuff
[quote user2]
[quote user1]Hello[/quote]
World
[/quote]
Other stuff after the stuff
"""
from pyparsing import (Word, printables, originalTextFor, Literal, OneOrMore,
                       ZeroOrMore, Forward, Suppress)
# prototype username
username = Word(printables, excludeChars=']')
# BBCODE quote tags
openQuote = originalTextFor(Literal("[") + "quote" + username + "]")
closeQuote = Literal("[/quote]")
# use negative lookahead to not include BBCODE quote tags in the body of the quote
contentWord = ~(openQuote | closeQuote) + (Word(printables,excludeChars='[') | '[')
content = originalTextFor(OneOrMore(contentWord))
# define recursive definition of quote, suppressing any nested quotes
quotes = Forward()
quotes << ( openQuote + ZeroOrMore( Suppress(quotes) | content ) + closeQuote )
# put separate tokens back together
quotes.setParseAction(lambda t : '\n'.join(t))
# quote extractor
for q in quotes.searchString(stuff):
    print(q[0])
# nested quote stripper
print(quotes.transformString(stuff))
Output:
[quote user2]
World
[/quote]
Other stuff
[quote user2]
World
[/quote]
Other stuff after the stuff
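
If pulling in pyparsing isn't an option, the same idea can be sketched with only the standard library. This is a different technique from the grammar above: it scans for quote tags with `re` and tracks the nesting depth, keeping text only when it sits outside any quote or directly inside an outermost one. The function name `strip_nested_quotes` and the tag pattern are assumptions made for illustration (the pattern accepts both `[quote user]` and `[quote=user]`), and the sketch assumes the tags are reasonably well balanced.

```python
import re

# Matches an opening tag such as "[quote user2]" / "[quote=user2]" / "[quote]",
# or the closing tag "[/quote]". (Assumed tag syntax, adjust for your forum.)
TAG = re.compile(r"\[quote(?:[\s=][^\]]*)?\]|\[/quote\]")


def strip_nested_quotes(text):
    """Return text with every quote nested inside another quote removed."""
    out = []
    depth = 0  # current quote nesting level
    pos = 0    # index just past the previous tag
    for m in TAG.finditer(text):
        # Keep the text before this tag only outside quotes (depth 0)
        # or directly inside an outermost quote (depth 1).
        if depth <= 1:
            out.append(text[pos:m.start()])
        if m.group() == "[/quote]":
            if depth == 1:
                out.append(m.group())  # closes an outermost quote
            depth = max(depth - 1, 0)  # a stray closing tag is dropped
        else:
            depth += 1
            if depth == 1:
                out.append(m.group())  # opens an outermost quote
        pos = m.end()
    # Trailing text after the last tag
    if depth <= 1:
        out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    sample = "[quote user2]\n[quote user1]Hello[/quote]\nWorld\n[/quote]"
    print(strip_nested_quotes(sample))
```

Unlike the pyparsing version, this leaves the line break that preceded the inner quote in place, so the sample comes back with an empty line before "World". Collapse blank lines afterwards if that matters.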