How can I match a Python function definition (and nothing else) with a regular expression?

3 votes
2 answers
1984 views
Asked 2025-04-17 17:34

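I'm trying to use a regular expression in Python to extract the definition of a function and nothing else, but I keep running into problems. Is a regular expression the right tool for this?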

For example,

def foo():
  print bar
-- Matches --

a = 2
def foo():
  print bar
-- Doesn't match as there's code above the def --

def foo():
  print bar
a = 2
-- Doesn't match as there's code below the def --

An example of the string I want to parse is "def isPalindrome(x):\n return x == x[::-1]". In practice, though, the code may have other lines above or below the function definition.

What regular expression do I need to do this?

2 Answers

2
import re

reg = re.compile(r'((^ *)def \w+\(.*?\): *\r?\n'
                 r'(?: *\r?\n)*'
                 r'\2( +)[^ ].*\r?\n'
                 r'(?: *\r?\n)*'
                 r'(\2\3.*\r?\n(?: *\r?\n)*)*)',
                 re.MULTILINE)
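To relate this pattern back to the three cases in the question, here is a minimal sketch of one way it might be used. The helper name is_only_a_def is hypothetical, and the sample strings use print(bar) only so the snippet reads the same under Python 2 and 3. It assumes space indentation, and it will not accept a one-line definition such as "def f(x): return x", because this pattern requires a newline after the colon.

```python
import re

# The pattern from the answer above, written with raw strings.
reg = re.compile(r'((^ *)def \w+\(.*?\): *\r?\n'
                 r'(?: *\r?\n)*'
                 r'\2( +)[^ ].*\r?\n'
                 r'(?: *\r?\n)*'
                 r'(\2\3.*\r?\n(?: *\r?\n)*)*)',
                 re.MULTILINE)

def is_only_a_def(text):
    """Hypothetical helper: True if a single match covers the entire text."""
    if not text.endswith('\n'):
        text += '\n'  # the pattern expects every body line to end with a newline
    m = reg.match(text)  # match() only tries at position 0
    return m is not None and m.end() == len(text)

results = [
    is_only_a_def('def foo():\n  print(bar)'),            # only a def
    is_only_a_def('a = 2\ndef foo():\n  print(bar)'),     # code above the def
    is_only_a_def('def foo():\n  print(bar)\na = 2'),     # code below the def
]
print(results)
```

Comparing m.end() with the length of the text is what rejects trailing code: the pattern stops at the first line that is not indented under the def.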

Edit

import re
script = '''
def foo():
  print bar

a = 2
def foot():
  print bar

b = 10
"""
opopo =457
def foor(x):


  print bar
  print x + 10
  def g(u):
    print

  def h(rt,o):
    assert(rt==12)
a = 2
class AZERT(object):
   pass
"""


b = 10
def tabulae(x):


\tprint bar
\tprint x + 10
\tdef g(u):
\t\tprint

\tdef h(rt,o):
\t\tassert(rt==12)
a = 2


class Z:
    def inzide(x):


      print baracuda
      print x + 10
      def gululu(u):
        print

      def hortense(rt,o):
        assert(rt==12)



def oneline(x): return 2*x


def scroutchibi(h%,n():245sqfg srot b#

'''


# First alternative: one-line definitions; second: multi-line definitions
reg = re.compile(r'((?:^[ \t]*)def \w+\(.*\): *(?=.*?[^ \t\n]).*\r?\n)'
                 r'|'
                 r'((^[ \t]*)def \w+\(.*\): *\r?\n'
                 r'(?:[ \t]*\r?\n)*'
                 r'\3([ \t]+)[^ \t].*\r?\n'
                 r'(?:[ \t]*\r?\n)*'
                 r'(\3\4.*\r?\n(?: *\r?\n)*)*)',
                 re.MULTILINE)

regcom = re.compile('("""|\'\'\')(.+?)\\1',re.DOTALL)


avoided_spans = [ma.span(2) for ma in regcom.finditer(script)]

print 'eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee'
for ma in reg.finditer(script):
    print ma.group(),
    print '--------------------'
    print repr(ma.group())
    print
    # A match is only a candidate: executing it shows whether it is valid code
    try:
        exec(ma.group().strip())
    except Exception:
        print "   isn't a valid definition of a function"
    # Flag matches that lie entirely inside a triple-quoted string
    am,bm = ma.span()
    if any(a<=am<=bm<=b for a,b in avoided_spans):
        print '   is a commented definition function'

    print 'eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee'

Result

eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def foo():
  print bar

--------------------
'def foo():\n  print bar\n\n'

eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def foot():
  print bar

--------------------
'def foot():\n  print bar\n\n'

eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def foor(x):


  print bar
  print x + 10
  def g(u):
    print

  def h(rt,o):
    assert(rt==12)
--------------------
'def foor(x):\n\n\n  print bar\n  print x + 10\n  def g(u):\n    print\n\n  def h(rt,o):\n    assert(rt==12)\n'

   is a commented definition function
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def tabulae(x):


    print bar
    print x + 10
    def g(u):
        print

    def h(rt,o):
        assert(rt==12)
--------------------
'def tabulae(x):\n\n\n\tprint bar\n\tprint x + 10\n\tdef g(u):\n\t\tprint\n\n\tdef h(rt,o):\n\t\tassert(rt==12)\n'

eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    def inzide(x):


      print baracuda
      print x + 10
      def gululu(u):
        print

      def hortense(rt,o):
        assert(rt==12)



--------------------
'    def inzide(x):\n\n\n      print baracuda\n      print x + 10\n      def gululu(u):\n        print\n\n      def hortense(rt,o):\n        assert(rt==12)\n\n\n\n'

eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def oneline(x): return 2*x
--------------------
'def oneline(x): return 2*x\n'

eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def scroutchibi(h%,n():245sqfg srot b#
--------------------
'def scroutchibi(h%,n():245sqfg srot b#\n'

   isn't a valid definition of a function
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
9

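No, regular expressions are not the right tool for this job. It is like people desperately trying to parse HTML with regular expressions. These languages are not regular, so no regular expression can handle every odd case you will run into.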
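One such odd case, as a minimal sketch: a def that appears only inside a string literal still looks like a function to a line-based pattern. The sample source and the deliberately naive pattern below are illustrative assumptions, not taken from either answer.

```python
import re

# Hypothetical sample: the only real function is `real`; the other
# "def" sits inside a triple-quoted string literal.
source = '''text = """
def fake():
    pass
"""

def real():
    return 1
'''

# A deliberately naive pattern: "def name(" at the start of a line.
naive = re.compile(r'^[ \t]*def (\w+)\(', re.MULTILINE)

names = naive.findall(source)
print(names)  # the list includes 'fake', which is not a function at all
```

Patching the pattern for this case (as the other answer does by tracking triple-quoted spans) still leaves others, such as decorators, multi-line signatures, or a ")" inside a default argument.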

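Use the built-in parser module to build a parse tree, then inspect the definition nodes and work with those. Better still, use the ast module, which is far more convenient (the parser module was also removed in Python 3.10). Here is an example: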

import ast

mdef = 'def foo(x): return 2*x'
a = ast.parse(mdef)
# Collect every function definition node anywhere in the tree
definitions = [n for n in ast.walk(a) if isinstance(n, ast.FunctionDef)]
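To answer the original question directly (is this string a function definition and nothing else?), the same module can be used as in the sketch below. The helper name is_only_a_function_def is hypothetical. It assumes the input is valid Python 3, so a Python 2 style "print bar" body would be rejected as a syntax error, and it does not count "async def", which parses to ast.AsyncFunctionDef.

```python
import ast

def is_only_a_function_def(source):
    """Hypothetical helper: True if `source` is exactly one top-level def."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False  # not valid Python at all
    # tree.body holds the top-level statements of the module
    return len(tree.body) == 1 and isinstance(tree.body[0], ast.FunctionDef)

checks = [
    is_only_a_function_def("def isPalindrome(x):\n return x == x[::-1]"),
    is_only_a_function_def("a = 2\ndef foo():\n  return a"),   # code above
    is_only_a_function_def("def foo():\n  return 1\na = 2"),   # code below
]
print(checks)
```

Checking tree.body rather than walking the whole tree is what makes the test strict: nested functions inside the single def are allowed, but any sibling statement is not.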
