Deduplicating code in functions that differ only slightly

Posted 2024-05-23 14:44:26


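I have two very similar loops, and both of them contain an inner loop that is very similar to a third loop (uh... :) ). Illustrated in code, it looks roughly like this: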

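# KFold(n, n_folds) is the old scikit-learn API (sklearn.cross_validation, removed in 0.20).
# analyze, get_tests and _sum_tests are project helpers not shown here.
from sklearn.cross_validation import KFold

# First function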
def fmeasure_kfold1(array, nfolds):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        for build in array[test_index]:  # <- All functions have this loop

            # In kfold1, retrieved tests are calculated inside the build loop
            retrieved_tests = get_tests(set(build['modules']), correlation)

            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)

    return ret

# Second function
def fmeasure_kfold2(array, nfolds):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        # In kfold2, retrieved tests are calculated outside the build loop
        retrieved_tests = _sum_tests(correlation)

        for build in array[test_index]:  # <- All functions have this loop

            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)

    return ret

# Third function
def fmeasure_all(array):
    ret = []
    for build in array:  # <- All functions have this loop

        relevant = set(build['tests'])
        fval = calc_f2(relevant)  # <- Instead of calc_f, I call calc_f2
        if fval is not None:
            ret.append(fval)

    return ret

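The first two functions differ only in how and when they compute retrieved_tests. The third function differs from the inner loop of the first two in that it calls calc_f2 and does not use retrieved_tests.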

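The real code is more complex. The duplication annoyed me, but I thought I could live with it. Lately, though, I have been changing it a lot, and having to make every change in two or three places at once is really irritating.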

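Is there a good way to merge the duplicated code? The only approach I can think of is introducing classes, which adds a lot of boilerplate, and I would like to keep these as pure functions if possible.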


Edit

Here are calc_f and calc_f2:

def calc_f(relevant, retrieved):
    """Calculate the F-measure given relevant and retrieved tests."""
    recall = len(relevant & retrieved)/len(relevant)
    prec = len(relevant & retrieved)/len(retrieved)
    fmeasure = f_measure(recall, prec)

    return (fmeasure, recall, prec)


def calc_f2(relevant, nbr_tests=1000):
    """Calculate the F-measure given relevant tests."""
    recall = 1
    prec = len(relevant) / nbr_tests
    fmeasure = f_measure(recall, prec)

    return (fmeasure, recall, prec)

f_measure computes the harmonic mean of precision and recall.
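For reference, a minimal sketch of such an f_measure, assuming it is the plain (unweighted) F1 harmonic mean. The real f_measure is not shown in the question, so this is a hypothetical stand-in, and the zero guard is an added assumption:

```python
def f_measure(recall, prec):
    """Harmonic mean of precision and recall (F1).

    Hypothetical stand-in for the question's f_measure.
    Returns 0.0 when both inputs are 0 instead of dividing by zero.
    """
    if recall + prec == 0:
        return 0.0
    return 2 * recall * prec / (recall + prec)


# recall 1.0, precision 0.5: 2 * 0.5 / 1.5, i.e. about 0.667
score = f_measure(1.0, 0.5)
```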

Basically, calc_f2 takes a lot of shortcuts because it does not need the retrieved tests.
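One consequence worth noting: if every relevant test belongs to a suite of nbr_tests tests, calc_f2 gives the same result as calc_f with the whole suite as the retrieved set, since every relevant test is then retrieved (recall is 1) and precision becomes len(relevant) / nbr_tests. The sketch below checks this under that assumption and assumes Python 3 true division; the suite and test names are made up, and f_measure is a hypothetical F1 stand-in.

```python
def f_measure(recall, prec):
    # Hypothetical F1 stand-in for the question's f_measure.
    return 0.0 if recall + prec == 0 else 2 * recall * prec / (recall + prec)


def calc_f(relevant, retrieved):
    # Same computation as the question's calc_f.
    recall = len(relevant & retrieved) / len(relevant)
    prec = len(relevant & retrieved) / len(retrieved)
    return (f_measure(recall, prec), recall, prec)


def calc_f2(relevant, nbr_tests=1000):
    # Same computation as the question's calc_f2.
    recall = 1
    prec = len(relevant) / nbr_tests
    return (f_measure(recall, prec), recall, prec)


# Assumption: a made-up suite of 1000 tests that contains every relevant test.
suite = {"test_%d" % i for i in range(1000)}
relevant = {"test_1", "test_2", "test_3"}

via_calc_f2 = calc_f2(relevant)
via_calc_f = calc_f(relevant, suite)
print(via_calc_f2 == via_calc_f)  # True
```

If that assumption holds in the real code, fmeasure_all could reuse calc_f and the shared loop, at the cost of the set intersection that calc_f2 skips.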


3 answers

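The typical solution is to identify the individual stages of the algorithm and use the Template method design pattern to implement the stages that differ in subclasses. I don't fully understand your code, but I assume you would end up with methods like makeGlobalRetrievedTests() and makeIndividualRetrievedTests()?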
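A minimal sketch of the Template method idea in Python, under assumptions that are not in the question: the KFold split and analyze() are assumed to have already produced (correlation, test_builds) pairs, and correlation is modeled as a dict mapping a module name to a set of tests. FMeasureTemplate, PerBuild and PerFold are hypothetical names; PerBuild plays the role of makeIndividualRetrievedTests() and PerFold that of makeGlobalRetrievedTests(), with set comprehensions standing in for get_tests and _sum_tests.

```python
from abc import ABC, abstractmethod


def f_measure(recall, prec):
    # Hypothetical F1 stand-in for the question's f_measure.
    return 0.0 if recall + prec == 0 else 2 * recall * prec / (recall + prec)


def calc_f(relevant, retrieved):
    # Same computation as the question's calc_f.
    hits = len(relevant & retrieved)
    recall, prec = hits / len(relevant), hits / len(retrieved)
    return (f_measure(recall, prec), recall, prec)


class FMeasureTemplate(ABC):
    """evaluate() fixes the loop skeleton; subclasses override only the steps that differ."""

    def evaluate(self, folds):
        ret = []
        for correlation, test_builds in folds:
            self.start_fold(correlation)   # hook: per-fold setup
            for build in test_builds:      # the loop every variant shares
                retrieved = self.retrieved_tests(build, correlation)
                fval = calc_f(set(build["tests"]), retrieved)
                if fval is not None:
                    ret.append(fval)
        return ret

    def start_fold(self, correlation):
        """Optional hook; does nothing by default."""

    @abstractmethod
    def retrieved_tests(self, build, correlation):
        """Return the set of tests retrieved for one build."""


class PerBuild(FMeasureTemplate):
    # Like fmeasure_kfold1: retrieved tests are computed for each build.
    def retrieved_tests(self, build, correlation):
        return {t for m in build["modules"] for t in correlation.get(m, ())}


class PerFold(FMeasureTemplate):
    # Like fmeasure_kfold2: retrieved tests are computed once per fold.
    def start_fold(self, correlation):
        self._fold_tests = {t for tests in correlation.values() for t in tests}

    def retrieved_tests(self, build, correlation):
        return self._fold_tests


# Made-up data: one fold with two test builds.
folds = [({"core": {"t1", "t2"}, "ui": {"t3"}},
          [{"modules": ["core"], "tests": ["t1"]},
           {"modules": ["ui"], "tests": ["t3", "t4"]}])]

per_build = PerBuild().evaluate(folds)
per_fold = PerFold().evaluate(folds)
```

The per-fold state lives on the instance here, which is exactly the kind of class boilerplate the question hoped to avoid; that is the main trade-off of this pattern.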

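One approach is to write each inner loop as a function, and then write the outer loop as a separate function that receives the other loops as arguments. This is very close to what a sort function does when it receives a function for comparing two elements.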

Of course, the hard part is working out exactly what all the functions have in common, which is not always straightforward.
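A sketch of the function-passing approach, under assumptions not in the question: folds are pre-split into (correlation, test_builds) pairs, correlation is a dict from module name to a set of tests, and per_build and per_fold are hypothetical stand-ins for get_tests and _sum_tests. The "once per fold vs once per build" difference is handled by passing a factory that is called once per fold and returns a per-build retrieval function. All names here are illustrative:

```python
def f_measure(recall, prec):
    # Hypothetical F1 stand-in for the question's f_measure.
    return 0.0 if recall + prec == 0 else 2 * recall * prec / (recall + prec)


def calc_f(relevant, retrieved):
    hits = len(relevant & retrieved)
    recall, prec = hits / len(relevant), hits / len(retrieved)
    return (f_measure(recall, prec), recall, prec)


def calc_f2(relevant, nbr_tests=1000):
    recall, prec = 1, len(relevant) / nbr_tests
    return (f_measure(recall, prec), recall, prec)


def evaluate_builds(builds, score):
    """The loop all three functions share: score each build, keep non-None results."""
    ret = []
    for build in builds:
        fval = score(build)
        if fval is not None:
            ret.append(fval)
    return ret


def evaluate_folds(folds, make_retriever):
    """The outer loop shared by kfold1 and kfold2."""
    ret = []
    for correlation, test_builds in folds:
        retrieve = make_retriever(correlation)  # called once per fold
        ret.extend(evaluate_builds(
            test_builds,
            lambda build: calc_f(set(build["tests"]), retrieve(build))))
    return ret


def per_build(correlation):
    # Like fmeasure_kfold1: the returned function runs for every build.
    return lambda build: {t for m in build["modules"] for t in correlation.get(m, ())}


def per_fold(correlation):
    # Like fmeasure_kfold2: the set is built once per fold and reused.
    tests = {t for ts in correlation.values() for t in ts}
    return lambda build: tests


# Made-up data: one fold with two test builds.
builds = [{"modules": ["core"], "tests": ["t1"]},
          {"modules": ["ui"], "tests": ["t3", "t4"]}]
folds = [({"core": {"t1", "t2"}, "ui": {"t3"}}, builds)]

kfold1 = evaluate_folds(folds, per_build)
kfold2 = evaluate_folds(folds, per_fold)
# fmeasure_all only swaps the scoring function passed to the shared loop.
all_builds = evaluate_builds(builds, lambda build: calc_f2(set(build["tests"])))
```

This keeps everything as plain functions, in line with the wish to avoid classes; the cost is that each varying step has to fit an agreed callable signature.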

A common function that takes an extra parameter controlling where retrieved_tests is computed would also work.

For example:

def fmeasure_kfold_generic(array, nfolds, mode):
    """mode=1 behaves like fmeasure_kfold1, mode=2 like fmeasure_kfold2."""
    if mode not in (1, 2):
        raise ValueError("mode must be 1 or 2")
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        # In kfold2, retrieved tests are calculated once per fold, outside the build loop
        if mode == 2:
            retrieved_tests = _sum_tests(correlation)

        for build in array[test_index]:  # <- All functions have this loop
            # In kfold1, retrieved tests are calculated per build, inside the build loop
            if mode == 1:
                retrieved_tests = get_tests(set(build['modules']), correlation)

            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)

    return ret
