Parsing class and function dependencies from a project

Question

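I'm trying to analyze the dependencies between classes and functions in a Python code base. My first step was to create a .csv file that can be imported into Excel, using Python's csv module and regular expressions.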

My current code looks roughly like this:

import re
import os
import csv
from os.path import join


class ClassParser(object):
   class_expr = re.compile(r'class (.+?)(?:\((.+?)\))?:')
   python_file_expr = re.compile(r'^\w+[.]py$')

   def findAllClasses(self, python_file):
      """ Read in a python file and return all the class names
      """
      with open(python_file) as infile:
         everything = infile.read()
         class_names = ClassParser.class_expr.findall(everything)
         return class_names

   def findAllPythonFiles(self, directory):
      """ Find all the python files starting from a top level directory
      """
      python_files = []
      for root, dirs, files in os.walk(directory):
         for file in files:
            if ClassParser.python_file_expr.match(file):
               python_files.append(join(root,file))
      return python_files

   def parse(self, directory, output_directory="classes.csv"):
      """ Parse the directory and spit out a csv file
      """
      with open(output_directory, 'w', newline='') as csv_file:
         writer = csv.writer(csv_file)
         python_files = self.findAllPythonFiles(directory)
         for file in python_files:
            classes = self.findAllClasses(file)
            for classname in classes:
               writer.writerow([classname[0], classname[1], file])

if __name__=="__main__":
   parser = ClassParser()
   parser.parse("/path/to/my/project/main/directory")

This code produces output in .csv format:

class name, inherited classes (also comma separated), file
class name, inherited classes (also comma separated), file
... etc. ...

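Now I'd like to start parsing function declarations and definitions in addition to class names. My question: is there a better way to get class names, inherited class names, function names, parameter names, and so on?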

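Note: I have considered using Python's ast module, but I have no experience with it and don't know how to use it to get the information I want, or whether it can even do that.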
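For reference, here is a minimal sketch of how the ast module might be used to collect the same information without regular expressions. The names `dotted_name` and `summarize_source` are hypothetical helpers made up for this illustration, and the sketch only records plain positional parameters (not `*args`, `**kwargs`, or keyword-only ones).

```python
import ast


def dotted_name(node):
    """Render a base-class expression such as `Base` or `pkg.Base` as text."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return dotted_name(node.value) + "." + node.attr
    # Anything more exotic (calls, subscripts) is shown as a raw AST dump.
    return ast.dump(node)


def summarize_source(source):
    """Return (classes, functions) found in a string of Python source.

    classes:   list of (qualified class name, [base class names])
    functions: list of (qualified function name, [positional parameter names])
    """
    classes = []
    functions = []

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                name = prefix + child.name
                classes.append((name, [dotted_name(b) for b in child.bases]))
                visit(child, name + ".")
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                name = prefix + child.name
                # args.args holds only the ordinary positional parameters.
                functions.append((name, [a.arg for a in child.args.args]))
                visit(child, name + ".")
            else:
                # Descend into if/try/with blocks without changing the prefix.
                visit(child, prefix)

    visit(ast.parse(source), "")
    return classes, functions
```

Calling `summarize_source(open(path).read())` for each file found by `findAllPythonFiles` would give rows that could be written out with `csv.writer` in the same way as before. Because the source is parsed rather than pattern-matched, the word "class" inside comments or strings is not picked up.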

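Edit: In response to Martin Thurau's request for more information: the reason I want to solve this is that I've inherited a rather large project (over 100,000 lines) whose modules, classes, and functions have no discernible structure; they all exist as a pile of files in a single source directory.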

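Some of the source files contain dozens of loosely related classes and run to more than 10,000 lines each, which makes maintenance difficult. I've started an analysis to assess the relative difficulty of packaging each class into a more cohesive structure, using the "Packaging Guide" as a basis. Part of what I care about for this analysis is how closely a class is coupled to the other classes in its file, and which imported or inherited classes a particular class depends on.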
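To make the kind of coupling I mean concrete, here is a rough sketch of how it might be approximated with the ast module. The function name `class_dependencies` and the result layout are assumptions made up for this illustration. It is purely name-based, so it will miss dynamic references and can be fooled by local variables that shadow an import or a class name.

```python
import ast


def class_dependencies(source):
    """Map each top-level class to the sibling classes and imports it names."""
    tree = ast.parse(source)

    # Collect every name that an import statement binds in this file.
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import os.path` binds `os`; `import a.b as c` binds `c`.
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)

    top_classes = {n.name for n in tree.body if isinstance(n, ast.ClassDef)}

    result = {}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            # Every bare identifier used in the class, including its bases.
            used = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            result[node.name] = {
                "siblings": sorted((used & top_classes) - {node.name}),
                "imports": sorted(used & imported),
            }
    return result
```

A class whose "siblings" list is empty would presumably be an easy candidate to move into its own module, while classes that reference each other heavily may need to move together.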
