Parsing user input in Python

0 votes

2 answers

1688 views

Asked 2025-04-18 00:31

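I'm trying to process user input in which every word, name, or number is separated by whitespace (except for strings enclosed in double quotes) and put the pieces into a list. The list is printed as it is processed. I wrote a version of this code before, but this time I want to use tokens to make the code cleaner. Below is what I have so far, but it doesn't print anything.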

    #!/util/bin/python
import re


def main ():


    for i in tokenizer('abcd xvc  23432 "exampe" 366'):
        print (i);



    tokens = (
  ('STRING', re.compile('"[^"]+"')),  # longest match
  ('NAME', re.compile('[a-zA-Z_]+')),
  ('SPACE', re.compile('\s+')),
  ('NUMBER', re.compile('\d+')),
)


def tokenizer(s):
  i = 0
  lexeme = []
  while i < len(s):
    match = False
    for token, regex in tokens:
      result = regex.match(s, i)
      if result:
        lexeme.append((token, result.group(0)))
        i = result.end()
        match = True
        break
    if not match:
      raise Exception('lexical error at {0}'.format(i))
  return lexeme




  main()

Tags: list manipulation, string processing, data processing, tokenization, user input parsing

2 answers

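I think the problem is your indentation. As posted, the main() call is indented under tokenizer, after its return statement, so it never runs, and tokens is assigned inside main, after the loop that needs it. With both moved to module level it looks like this: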

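#!/util/bin/python
import re

# Patterns are tried in order; the first one that matches wins.
# Raw strings keep backslashes such as \s and \d intact for the regex engine.
tokens = (
    ('STRING', re.compile(r'"[^"]+"')),  # double-quoted string
    ('NAME', re.compile(r'[a-zA-Z_]+')),
    ('SPACE', re.compile(r'\s+')),
    ('NUMBER', re.compile(r'\d+')),
)


def main():
    for i in tokenizer('abcd xvc  23432 "exampe" 366'):
        print(i)


def tokenizer(s):
    i = 0  # current position in the input string
    lexeme = []
    while i < len(s):
        match = False
        for token, regex in tokens:
            # Try to match this pattern starting exactly at position i.
            result = regex.match(s, i)
            if result:
                lexeme.append((token, result.group(0)))
                i = result.end()
                match = True
                break
        if not match:
            raise Exception('lexical error at {0}'.format(i))
    return lexeme


# Module level, so it actually runs when the script is executed.
main()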

This prints:

('NAME', 'abcd')
('SPACE', ' ')
('NAME', 'xvc')
('SPACE', '  ')
('NUMBER', '23432')
('SPACE', ' ')
('STRING', '"exampe"')
('SPACE', ' ')
('NUMBER', '366')
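
If the SPACE tokens are not wanted in the final list, they can be filtered out after tokenizing. The following is only a minimal sketch and not part of the original answer: significant_tokens is a hypothetical helper name, the pattern table is renamed TOKENS, and the error is raised as ValueError from a for/else loop instead of using the match flag.

```python
import re

# Patterns are tried in order; the first one that matches at position i wins.
TOKENS = (
    ('STRING', re.compile(r'"[^"]+"')),
    ('NAME', re.compile(r'[a-zA-Z_]+')),
    ('SPACE', re.compile(r'\s+')),
    ('NUMBER', re.compile(r'\d+')),
)


def tokenizer(s):
    """Scan s from left to right and return a list of (kind, text) pairs."""
    i = 0
    lexemes = []
    while i < len(s):
        for kind, regex in TOKENS:
            result = regex.match(s, i)
            if result:
                lexemes.append((kind, result.group(0)))
                i = result.end()
                break
        else:
            # The for loop finished without a break: nothing matched at i.
            raise ValueError('lexical error at {0}'.format(i))
    return lexemes


def significant_tokens(s):
    """Hypothetical helper: drop SPACE tokens and keep only the matched text."""
    return [text for kind, text in tokenizer(s) if kind != 'SPACE']


print(significant_tokens('abcd xvc  23432 "exampe" 366'))
# ['abcd', 'xvc', '23432', '"exampe"', '366']
```

Keeping SPACE as a token kind and filtering afterwards leaves the scanning loop unchanged; skipping whitespace inside the loop would work just as well.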

Answered 2025-04-18 by Python大师


I would suggest using the shlex module to split strings that contain quoted parts:

>>> import shlex
>>> s = 'hello "quoted string" 123   \'More quoted string\' end'
>>> s
'hello "quoted string" 123   \'More quoted string\' end'
>>> shlex.split(s)
['hello', 'quoted string', '123', 'More quoted string', 'end']

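After that, you can classify the tokens (strings, numbers, and so on) however you need. The one thing to keep in mind is that shlex does not preserve whitespace: it is used only as a separator and then discarded.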

Here is a simple example:

import shlex

if __name__ == '__main__':
    line = 'abcd xvc  23432 "exampe" 366'
    # Split on whitespace; quoted parts stay together and lose their quotes.
    tokens = shlex.split(line)
    for token in tokens:
        print('>{}<'.format(token))

Output:

>abcd<
>xvc<
>23432<
>exampe<
>366<
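
The classification step mentioned above could look roughly like the sketch below. It is an illustration only: classify and the NUMBER/WORD labels are hypothetical names, not something shlex provides. Because the default mode strips the quotes, a quoted "123" would also be tagged as a number here.

```python
import shlex


def classify(token):
    """Hypothetical helper: tag a token as NUMBER (converted to int) or WORD."""
    # str.isdecimal() is True only for non-empty strings of decimal digits,
    # which int() can always convert.
    if token.isdecimal():
        return ('NUMBER', int(token))
    return ('WORD', token)


line = 'abcd xvc  23432 "exampe" 366'
classified = [classify(t) for t in shlex.split(line)]
for item in classified:
    print(item)
# ('WORD', 'abcd')
# ('WORD', 'xvc')
# ('NUMBER', 23432)
# ('WORD', 'exampe')
# ('NUMBER', 366)
```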

Update

If you insist on keeping the quotes, pass posix=False when calling split():

    tokens = shlex.split(line, posix=False)

Output:

>abcd<
>xvc<
>23432<
>"exampe"<
>366<
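
With the quotes preserved, quoted tokens can be told apart from the rest, so the STRING/NAME/NUMBER kinds from the question can be recovered on top of shlex. This is a rough sketch under that assumption: classify is a hypothetical helper, and its NAME branch is a catch-all that is looser than the question's [a-zA-Z_]+ pattern.

```python
import shlex


def classify(token):
    """Hypothetical helper mirroring the STRING/NUMBER/NAME kinds."""
    # In non-POSIX mode shlex keeps the surrounding double quotes.
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return ('STRING', token)
    if token.isdecimal():
        return ('NUMBER', token)
    # Catch-all: anything else is treated as a name.
    return ('NAME', token)


line = 'abcd xvc  23432 "exampe" 366'
result = [classify(t) for t in shlex.split(line, posix=False)]
for item in result:
    print(item)
# ('NAME', 'abcd')
# ('NAME', 'xvc')
# ('NUMBER', '23432')
# ('STRING', '"exampe"')
# ('NUMBER', '366')
```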

Answered 2025-04-18 by Python大师

