How to save regular expression user input values (Python)

0 votes

2 answers

1232 views

Asked 2025-04-17 12:30

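I'm making a simple chatbot in Python. It has a text file that stores regular expressions used to generate the bot's replies. On each line, the user input pattern and the bot's output are separated by a | symbol.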

my name is (?P<name>\w*) | Hi {name}!

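This works fine for a single input and output response, but I'd like the bot to remember the values the regular expressions capture from the user's input and reuse them later (in other words, give the bot a "memory"). For example, I'd like the bot to remember the value entered for 'name' so that I can use it in the rules: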

my name is (?P<word>\w*) | You said your name is {name} already!
my name is (?P<name>\w*) | Hi {name}!

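When there is no value for 'name' yet, the bot first outputs "Hi Steve!"; once the bot has that value, the 'word' rule takes effect. I'm not sure whether this is easy to do given how my program is currently structured. I turned the text file into a dictionary whose keys and values are the two parts on either side of the | symbol. When the user enters some text, the program checks whether the input matches one of the patterns stored in the dictionary and prints the corresponding bot response (there is also an "else" case for when nothing matches).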

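During the comparison, I need to do something to save the text captured from the user's input and then somehow put it back into the dictionary. All of my regex groups have different names (for example, there are no two instances of 'word'... there is 'word', 'word2', and so on); I did this because I thought it would make the process simpler. However, I may have the structure completely wrong, which would make this task hard to accomplish.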

Edit: code

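import re

# Load the rules: each line of rules.txt is "<regex> | <response template>".
rules = {}

with open("rules.txt") as brain:
    for line in brain:
        if '|' not in line:
            continue  # skip blank or malformed lines
        # Split at the first '|' only, and drop surrounding whitespace
        # (including the trailing newline of the response).
        pattern, response = line.split('|', 1)
        rules[pattern.strip()] = response.strip()

while True:
    text = input('> ').lower()
    for pattern, response in rules.items():
        match = re.match(pattern, text)
        if match:
            # Fill the {name}-style placeholders from the named groups.
            print(response.format(**match.groupdict()))
            break
    else:
        # for/else: reached only when no rule matched (no break happened).
        print('Sorry?')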


2 Answers

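OK, let me see if I understand what you mean:

You want a dictionary of key-value pairs. This will be the chatbot's "memory".
You want to apply rules, based on regular expressions, to the user input. But which rules apply depends on which keys already exist in the memory dictionary: if 'name' is not yet defined, the rule that defines 'name' applies; if it is already defined, the rule that mentions 'word' applies.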

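I think you need to attach more information to the rules. For example, the 'word' rule you mention above really shouldn't add 'word' to the dictionary, otherwise it would only apply once (think about what happens if the user tries to say "my name is x" several times).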
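
As a rough Python 3 sketch of what "more information" could look like (one possible layout, not a finished design): each rule carries an optional `requires` key that must already be in memory, plus a `remember` flag saying whether its captures get stored. The `Rule` tuple, both field names, and `respond` are made up for this illustration.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    pattern: str                    # regex with named groups
    response: str                   # str.format template
    requires: Optional[str] = None  # hypothetical: key that must be in memory
    remember: bool = False          # hypothetical: store the captures?


# Order matters: the "already known" rule is tried first, but it is
# skipped until 'name' has been remembered.
RULES = [
    Rule(r"my name is (?P<word>\w+)",
         "You said your name is {name} already!", requires="name"),
    Rule(r"my name is (?P<name>\w+)", "Hi {name}!", remember=True),
]


def respond(text, memory):
    """Return the reply for text, updating memory when a rule asks for it."""
    for rule in RULES:
        if rule.requires is not None and rule.requires not in memory:
            continue
        match = re.match(rule.pattern, text)
        if match:
            captured = match.groupdict()
            if rule.remember:
                memory.update(captured)
            # Remembered values take precedence over fresh captures.
            return rule.response.format(**{**captured, **memory})
    return "Sorry?"


demo_memory = {}
print(respond("my name is steve", demo_memory))  # Hi steve!
print(respond("my name is bob", demo_memory))    # You said your name is steve already!
```

Because the 'word' rule never sets `remember`, it leaves memory alone and keeps applying however many times the user repeats themselves.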

Does that give you more ideas about how to proceed?

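Oh, and by the way, I don't think "|" is a good choice of separator, because it can occur inside a regular expression. I'm not sure what to suggest instead: how about "||"?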
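
To illustrate the separator point, here is a small Python 3 sketch, assuming each rule line has the form `<regex> || <response>`; `parse_rule` is a hypothetical helper name. Splitting at the first "||" leaves a single "|" inside the pattern untouched.

```python
import re


def parse_rule(line, sep="||"):
    """Split one rule line into (pattern, response) at the first separator."""
    pattern, response = line.split(sep, 1)
    return pattern.strip(), response.strip()


# A pattern that uses "|" for alternation survives the split intact.
pattern, response = parse_rule(r"my name is (?P<name>bob|steve) || Hi {name}!")
match = re.match(pattern, "my name is steve")
print(response.format(**match.groupdict()))  # Hi steve!
```

A doubled "||" is still legal regex syntax (an empty alternative), so this only makes a clash unlikely rather than impossible.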

Answered 2025-04-17 by Python大师


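I'm a bit confused by the principle of your algorithm, because I'm not used to working with named groups.
The code below is my way of solving your problem; I hope it gives you some ideas.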

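I don't think using only one dictionary is a good idea: it complicates both the reasoning and the algorithm. So I based my code on two dictionaries: direg and memory.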

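The keys of both dictionaries are group indexes. Not all of the indexes, only particular ones: the index of the last group in each individual pattern.
That is because, for fun, I decided that a regex may have several groups.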

What I call an individual pattern in the code is one of the following strings:

"[mM]y name [Ii][sS] (\w*)"

"[Ii]n repertory (\w*) I [wW][aA][nN][tT] file (\w*)"

"[Ii] [wW][aA][nN][tT] to ([ \w]*)"

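You can see that the second individual pattern has two capturing groups: so there are three individual patterns in total, but four groups across all of them.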

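So building the dictionaries needs particular care, because the index of the last matched group (which I obtain from the lastindex attribute of the regex MatchObject) may not coincide with the position of the individual pattern within the combined regex: this is easier to understand than to explain. That is why, in the function distr(), I count the occurrences of the strings {0} {1} {2} {3} {4} etc.; their number must equal the number of groups defined in the corresponding individual pattern.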
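
The relationship between placeholder counts and lastindex can be checked in isolation. The following is a small Python 3 sketch using the three individual patterns above; `last_group_keys` is a hypothetical helper that mirrors the counting idea and is not part of the answer's code.

```python
import re

patterns = [
    r"[mM]y name [Ii][sS] (\w*)",
    r"[Ii]n repertory (\w*) I [wW][aA][nN][tT] file (\w*)",
    r"[Ii] [wW][aA][nN][tT] to ([ \w]*)",
]
first_responses = [
    "Hi {0}!",
    "OK here's your file {0}\\{1} :",
    "OK, I will do {0}",
]

# One combined regex: groups are numbered 1..4 across all alternatives.
combined = re.compile("|".join(patterns))


def last_group_keys(responses):
    """Running total of {n} placeholders = last group index of each pattern."""
    keys, total = [], 0
    for response in responses:
        total += len(re.findall(r"\{\d+\}", response))
        keys.append(total)
    return keys


print(last_group_keys(first_responses))  # [1, 3, 4]
for text in ("my name is Armano",
             "In repertory ONE I want file SPACE",
             "I want to record music"):
    print(combined.search(text).lastindex)  # 1, then 3, then 4
```

The second pattern is the second alternative, yet its key is 3: that is the mismatch between pattern position and lastindex described above.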

I find Laurence D'Oliveiro's suggestion of using '||' instead of '|' as the separator interesting.

My code simulates a session in which several inputs are entered:

import re

# Each entry: "<regex>||<first response>||<response once already seen>"
regi = (r"[mM]y name [Ii][sS] (\w*)"
        "||Hi {0}!"
        "||You said that your name was {0} !!!",

        r"[Ii]n repertory (\w*) I [wW][aA][nN][tT] file (\w*)"
        "||OK here's your file {0}\\{1} :"
        "||I already gave you the file {0}\\{1} !",

        r"[Ii] [wW][aA][nN][tT] to ([ \w]*)"
        "||OK, I will do {0}"
        "||You already did {0}. Do yo really want again ?")


direg  = {}  # last group index -> first-time response (None once used)
memory = {}  # last group index -> response for repeated inputs
def distr(regi, cnt=0, di=direg, mem=memory,
          regnb=re.compile(r'\{\d+\}')):
    """Fill direg and memory, yielding each individual pattern in turn."""
    for el in regi:
        sp = el.split('||')
        # The running count of {n} placeholders is the index of the last
        # group of this individual pattern within the combined regex.
        cnt += len(regnb.findall(sp[1]))
        di[cnt] = sp[1]
        mem[cnt] = sp[2]
        yield sp[0]

regx = re.compile('|'.join(distr(regi)))
print('direg :', direg, sep='\n')
print()
print('memory :', memory, sep='\n')
for inp in ('I say that my name is Armano the 1st',
            'In repertory ONE I want file SPACE',
            'I want to record music',
            'In repertory ONE I want file SPACE',
            'I say that my name is Armstrong',
            'But my name IS Armstrong now !!!',
            'In repertory TWO I want file EARTH',
            'Now my name is Helena'):

    print('\ninput  ==', inp)

    mat = regx.search(inp)
    if mat is None:
        # No individual pattern matched this input.
        print('output == Sorry?')
        continue
    # Keep only the groups of the alternative that actually matched.
    groups = [d for d in mat.groups() if d is not None]
    if direg[mat.lastindex]:
        # First time this pattern matches: answer, then freeze the
        # captured values into the "already seen" response.
        print('output ==', direg[mat.lastindex].format(*groups))
        direg[mat.lastindex] = None
        memory[mat.lastindex] = memory[mat.lastindex].format(*groups)
    else:
        print('output ==', memory[mat.lastindex].format(*groups))
        if not memory[mat.lastindex].startswith('Sorry'):
            memory[mat.lastindex] = ('Sorry, '
                                     + memory[mat.lastindex][0].lower()
                                     + memory[mat.lastindex][1:])

Result

direg :
{1: 'Hi {0}!', 3: "OK here's your file {0}\\{1} :", 4: 'OK, I will do {0}'}

memory :
{1: 'You said that your name was {0} !!!', 3: 'I already gave you the file {0}\\{1} !', 4: 'You already did {0}. Do yo really want again ?'}

input  == I say that my name is Armano the 1st
output == Hi Armano!

input  == In repertory ONE I want file SPACE
output == OK here's your file ONE\SPACE :

input  == I want to record music
output == OK, I will do record music

input  == In repertory ONE I want file SPACE
output == I already gave you the file ONE\SPACE !

input  == I say that my name is Armstrong
output == You said that your name was Armano !!!

input  == But my name IS Armstrong now !!!
output == Sorry, you said that your name was Armano !!!

input  == In repertory TWO I want file EARTH
output == Sorry, i already gave you the file ONE\SPACE !

input  == Now my name is Helena
output == Sorry, you said that your name was Armano !!!

Answered 2025-04-17 by Python大师
