Strange regular expression behavior

import re

def tokenize(code):
    tokens = []
    tokens_re = {
        'comentarios' : '(//.*)',                 # comments
        'linhas'      : '(\n)',                   # lines
        'instrucoes'  : '(add)',                  # instructions
        'numeros_hex' : '([-+]?0x[0-9a-fA-F]+)',  # hex numbers
        'numeros_bin' : '([-+]?0b[0-1]+)',        # binary numbers
        'numeros_dec' : '([-+]?[0-9]+)'}          # decimal numbers
        #'reg32' : 'eax|ebx|ecx|edx|esp|ebp|eip|esi',
        #'reg16' : 'ax|bx|cx|dx|sp|bp|ip|si',
        #'reg8'  : 'ah|al|bh|bl|ch|cl|dh|dl'}
    pattern = re.compile('|'.join(list(tokens_re.values())))
    scan = pattern.scanner(code)
    while 1:
        m = scan.search()
        if not m:
            break
        tipo = list(tokens_re.keys())[m.lastindex-1]  # type
        valor = repr(m.group(m.lastindex))            # value
        if tipo == 'linhas':
            print('')
        else:
            print(tipo, valor)
    return tokens

code = '''
add eax, 5 //haha
add ebx, -5
add eax, 1234
add ebx, 1234
add ax, 0b101
add bx, -0b101
add al, -0x5
add ah, 0x5
'''
print(tokenize(code))

Sometimes the output is what I expect:

instrucoes 'add'
numeros_dec '5'
comentarios '//haha'

instrucoes 'add'
numeros_dec '-5'

instrucoes 'add'
numeros_dec '1234'

instrucoes 'add'
numeros_dec '1234'

instrucoes 'add'
numeros_bin '0b101'

instrucoes 'add'
numeros_bin '-0b101'

instrucoes 'add'
numeros_hex '-0x5'

instrucoes 'add'
numeros_hex '0x5'

But other times the same code gives this, with the binary and hex literals split into decimal pieces:

instrucoes 'add'
numeros_dec '5'
comentarios '//haha'

instrucoes 'add'
numeros_dec '-5'

instrucoes 'add'
numeros_dec '1234'

instrucoes 'add'
numeros_dec '1234'

instrucoes 'add'
numeros_dec '0'
numeros_dec '101'

instrucoes 'add'
numeros_dec '-0'
numeros_dec '101'

instrucoes 'add'
numeros_dec '-0'
numeros_dec '5'

instrucoes 'add'
numeros_dec '0'
numeros_dec '5'

1 answer

User

#1 · Posted on 2024-06-02 05:56:15

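You're building your regular expression from a dictionary. Before Python 3.6, dictionaries have no defined order (since Python 3.7 they are guaranteed to keep insertion order), and because string hashing is randomized per process by default since Python 3.3, that order can change from one run to the next. So the pattern can come out different each time, which gives you different results. The order matters because the alternatives joined with | are tried from left to right and the first one that matches wins, even if a later one would match more text: when ([-+]?[0-9]+) ends up before ([-+]?0x[0-9a-fA-F]+), the input 0x5 is matched as the decimal 0 followed by the decimal 5.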
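To see the first-match-wins rule in isolation, here is a small sketch using just two of the question's alternatives (hex and decimal), compiled in both orders. The helper name matches is made up for illustration.

```python
import re

# The same two alternatives in different orders.
# re tries alternatives left to right and keeps the first one that matches,
# not the longest one.
hex_first = re.compile(r'([-+]?0x[0-9a-fA-F]+)|([-+]?[0-9]+)')
dec_first = re.compile(r'([-+]?[0-9]+)|([-+]?0x[0-9a-fA-F]+)')

def matches(pattern, text):
    """Return every matched substring, in order (hypothetical helper)."""
    return [m.group(0) for m in pattern.finditer(text)]

print(matches(hex_first, 'add al, -0x5'))  # ['-0x5']
print(matches(dec_first, 'add al, -0x5'))  # ['-0', '5']
```

The second result is exactly the numeros_dec '-0' / numeros_dec '5' split seen in the unexpected output.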

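If you want "stable" results, I'd suggest either using sorted(tokens_re.values()) or specifying them in a list/tuple instead of a dict. If you sort, sort the (name, pattern) pairs rather than only the values, so that the name you look up through m.lastindex still belongs to the group that matched. Sorting only makes the order deterministic, so it is still worth checking that the hex and binary patterns end up before the decimal one.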
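A minimal sketch of the sorting route, assuming the question's dictionary: the (name, pattern) pairs are sorted by pattern, so each name stays aligned with its group number. The names pares and tipos, and this simplified tokenize that collects tuples instead of printing, are illustrative rather than taken from the question.

```python
import re

tokens_re = {
    'comentarios': '(//.*)',                 # comments
    'linhas':      '(\n)',                   # lines
    'instrucoes':  '(add)',                  # instructions
    'numeros_hex': '([-+]?0x[0-9a-fA-F]+)',  # hex numbers
    'numeros_bin': '([-+]?0b[0-1]+)',        # binary numbers
    'numeros_dec': '([-+]?[0-9]+)',          # decimal numbers
}

# Sort the (name, pattern) pairs by pattern text, so the pattern order
# is the same on every run and names stay aligned with group numbers.
pares = sorted(tokens_re.items(), key=lambda par: par[1])
pattern = re.compile('|'.join(p for _, p in pares))
tipos = [nome for nome, _ in pares]  # group n -> tipos[n - 1]

def tokenize(code):
    """Collect (type, value) pairs, skipping line breaks."""
    tokens = []
    for m in pattern.finditer(code):
        tipo = tipos[m.lastindex - 1]
        if tipo != 'linhas':
            tokens.append((tipo, m.group(m.lastindex)))
    return tokens

print(tipos)
print(tokenize('add al, -0x5\nadd bx, -0b101\n'))
```

With these particular strings, plain string ordering happens to place the binary and hex patterns ahead of the decimal one (the character 0 sorts before [), so -0x5 comes out as a single token. That is a coincidence of the pattern text, not something sorting guarantees.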

For example, you can specify them as a list of pairs and then use that list to build both the pattern and the dictionary:

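import re

tokens_re = [
    ('comentarios', '(//.*)'),                         # comments
    ('linhas',      '(\n)'),                           # lines
    ('instrucoes',  '(add)'),                          # instructions
    ('numeros_hex', '([-+]?0x[0-9a-fA-F]+)'),          # hex numbers
    ('numeros_bin', '([-+]?0b[0-1]+)'),                # binary numbers
    ('numeros_dec', '([-+]?[0-9]+)'),                  # decimal numbers (after hex/bin)
]
pattern = re.compile('|'.join(p for _, p in tokens_re))
# Keep the names in pattern order; on Python < 3.7 dict(...) may reorder its keys,
# so in the loop use: tipo = tipos[m.lastindex - 1]
tipos = [nome for nome, _ in tokens_re]
tokens_re = dict(tokens_re)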
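A different technique from the one in this answer, borrowed from the "Writing a Tokenizer" example in the Python re documentation: wrap each rule in a named group (?P<name>...) and read the matching rule back from m.lastgroup, so no index-to-name lookup is needed at all. This sketch also actually collects the tokens into the list that tokenize returns, which the question's version never fills. The name TOKEN_SPEC is illustrative.

```python
import re

# Ordered (name, regex) pairs; the hex and binary rules must precede the decimal one.
TOKEN_SPEC = [
    ('comentarios', r'//.*'),                 # comments
    ('linhas',      r'\n'),                   # line breaks
    ('instrucoes',  r'add'),                  # instructions
    ('numeros_hex', r'[-+]?0x[0-9a-fA-F]+'),  # hex numbers
    ('numeros_bin', r'[-+]?0b[01]+'),         # binary numbers
    ('numeros_dec', r'[-+]?[0-9]+'),          # decimal numbers
]
# Each rule becomes a named group, e.g. (?P<numeros_hex>...)
PATTERN = re.compile('|'.join(f'(?P<{name}>{regex})' for name, regex in TOKEN_SPEC))

def tokenize(code):
    """Return (type, value) pairs, skipping line breaks."""
    tokens = []
    for m in PATTERN.finditer(code):
        tipo = m.lastgroup  # name of the group that matched
        if tipo != 'linhas':
            tokens.append((tipo, m.group()))
    return tokens

code = '''
add eax, 5 //haha
add ax, 0b101
add al, -0x5
'''
for tipo, valor in tokenize(code):
    print(tipo, repr(valor))
```

Because the rule order lives in a list, the pattern is identical on every run and on every Python version, and the type name comes straight from the match object.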
