Creating a list lexer/parser


Asked 2025-04-17 10:16

I need to create a lexer/parser that can handle input data whose length and structure are not fixed.

For example, say I have a list of reserved keywords:

keyWordList = ['command1', 'command2', 'command3']

and a string of user input:

userInput = 'The quick brown command1 fox jumped over command2 the lazy dog command3'
userInputList = userInput.split()

How would I write this function:

INPUT:

tokenize(userInputList, keyWordList)

OUTPUT:
[['The', 'quick', 'brown'], 'command1', ['fox', 'jumped', 'over'], 'command2', ['the', 'lazy', 'dog'], 'command3']

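I've already written a tokenizer that recognizes the keywords, but I haven't found an efficient way to group everything that isn't a keyword into a list one level deeper.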

Regex solutions are welcome, but I'd really like to see the underlying algorithm, since I may extend this to lists of other objects, not just strings.

Tags: regex, data-structures, parser, lexer, algorithm-design, keyword-recognition, tokenizer

4 Answers

This is easy with a bit of regex:

>>> import re
>>> keyWordList = ['command1', 'command2', 'command3']
>>> reg = r'(.+?)\s(%s)(?:\s|$)' % '|'.join(keyWordList)
>>> userInput = 'The quick brown command1 fox jumped over command2 the lazy dog command3'
>>> re.findall(reg, userInput)
[('The quick brown', 'command1'), ('fox jumped over', 'command2'), ('the lazy dog', 'command3')]

Now you just need to split the first element of each tuple into words.
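A minimal sketch of that last step, assuming Python 3: each captured text is split into words and interleaved with its keyword. `re.escape` is an addition not in the answer above, so that keywords containing regex metacharacters are matched literally. Note that this pattern only captures text that is followed by a keyword, so any words after the last keyword are dropped.

```python
import re

keyWordList = ['command1', 'command2', 'command3']
userInput = 'The quick brown command1 fox jumped over command2 the lazy dog command3'

# Same pattern as above, but each keyword is escaped before joining.
reg = r'(.+?)\s(%s)(?:\s|$)' % '|'.join(map(re.escape, keyWordList))

tokens = []
for text, keyword in re.findall(reg, userInput):
    tokens.append(text.split())  # the run of non-keywords becomes a sub-list
    tokens.append(keyword)       # the keyword stays a plain string
print(tokens)
# [['The', 'quick', 'brown'], 'command1', ['fox', 'jumped', 'over'], 'command2', ['the', 'lazy', 'dog'], 'command3']
```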

If the nesting gets deeper, regular expressions are probably no longer the right tool.

There are some good parsers to choose from on this page: http://wiki.python.org/moin/LanguageParsing

I think Lepl is a good choice.

Answered 2025-04-17 by Python大师


Something like this:

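def tokenize(lst, keywords):
    cur = []
    for x in lst:
        if x in keywords:
            yield cur      # the words collected before this keyword
            yield x        # the keyword itself
            cur = []
        else:
            cur.append(x)
    if cur:
        yield cur          # words after the last keyword would otherwise be lost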

This returns a generator, so wrap it in list() to get a list, e.g. list(tokenize(userInputList, keyWordList)).
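The same generator idea can also be written with `itertools.groupby` from the standard library. This is a swapped-in technique, not the answerer's code, and `tokenize_groupby` is a hypothetical name. Grouping by "is this item a keyword?" yields the runs directly, so no empty lists appear between consecutive keywords and the last run is never lost. As above, wrap the call in `list()`:

```python
from itertools import groupby


def tokenize_groupby(lst, keywords):
    """Generator: yield each keyword as-is and each run of non-keywords as a list."""
    keywords = set(keywords)  # set gives O(1) membership tests
    # groupby splits lst into maximal consecutive runs sharing the same key
    for is_keyword, run in groupby(lst, key=lambda x: x in keywords):
        if is_keyword:
            yield from run    # consecutive keywords are still emitted one by one
        else:
            yield list(run)   # a run of non-keywords becomes one sub-list


words = 'The quick brown command1 fox jumped over command2 the lazy dog command3'.split()
print(list(tokenize_groupby(words, ['command1', 'command2', 'command3'])))
# [['The', 'quick', 'brown'], 'command1', ['fox', 'jumped', 'over'], 'command2', ['the', 'lazy', 'dog'], 'command3']
```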

Answered 2025-04-17 by Python大师


Try this:

keyWordList = ['command1', 'command2', 'command3']
userInput = 'The quick brown command1 fox jumped over command2 the lazy dog command3'
inputList = userInput.split()

def tokenize(userInputList, keyWordList):
    keywords = set(keyWordList)
    tokens, acc = [], []
    for e in userInputList:
        if e in keywords:
            tokens.append(acc)
            tokens.append(e)
            acc = []
        else:
            acc.append(e)
    if acc:
        tokens.append(acc)
    return tokens

print(tokenize(inputList, keyWordList))
# [['The', 'quick', 'brown'], 'command1', ['fox', 'jumped', 'over'], 'command2', ['the', 'lazy', 'dog'], 'command3']
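The question also mentions applying this to lists of other objects. The accumulate-and-flush loop never relies on the items being strings, so a minimal sketch can take a predicate instead of a keyword list. `group_between` and `is_keyword` are hypothetical names, and unlike the code above this version skips empty groups (before a leading keyword or between consecutive keywords), which is an assumption about the desired output.

```python
from typing import Callable, Iterable, List, TypeVar, Union

T = TypeVar("T")


def group_between(items: Iterable[T],
                  is_keyword: Callable[[T], bool]) -> List[Union[T, List[T]]]:
    """Keep keyword items at the top level; gather each run of other items into a sub-list."""
    result: List[Union[T, List[T]]] = []
    run: List[T] = []
    for item in items:
        if is_keyword(item):
            if run:                 # flush the pending run; empty runs are skipped
                result.append(run)
                run = []
            result.append(item)     # the keyword itself stays at the top level
        else:
            run.append(item)
    if run:                         # flush anything that follows the last keyword
        result.append(run)
    return result


# Strings, with membership in a set as the predicate:
keywords = {'command1', 'command2', 'command3'}
words = 'The quick brown command1 fox jumped over command2 the lazy dog command3'.split()
print(group_between(words, lambda w: w in keywords))
# [['The', 'quick', 'brown'], 'command1', ['fox', 'jumped', 'over'], 'command2', ['the', 'lazy', 'dog'], 'command3']

# Any other objects, e.g. integers where negative numbers act as "keywords":
print(group_between([1, 2, -1, 3, 4, -2], lambda n: n < 0))
# [[1, 2], -1, [3, 4], -2]
```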

Answered 2025-04-17 by Python大师

