Design pattern for chaining data transformation methods with pandas

t1 generates columns C8, C9, C12, C22 using C1, C2, C3, C4
t2 generates columns C10, C11, C17 using C3, C6, C7, C8
t3 generates columns C13, C14, C15, C16 using C5, C8, C10, C11, C22
t4 generates columns C18, C19, C20, C21, C23, C24, C25 using C13, C15
t5 generates columns C26, C27, C28, C29, C30 using C5, C19, C20, C21
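One way to make these dependencies explicit is to record them as data. The sketch below is an illustration, not part of the original question: the `STEPS` table and the `check_order` helper are hypothetical names, and the raw file is assumed to hold C1 to C7. It checks that every step reads only raw columns or columns produced by an earlier step.

```python
# Hypothetical table: step name -> (required input columns, generated output columns),
# transcribed from the description above.
STEPS = {
    "t1": (["C1", "C2", "C3", "C4"], ["C8", "C9", "C12", "C22"]),
    "t2": (["C3", "C6", "C7", "C8"], ["C10", "C11", "C17"]),
    "t3": (["C5", "C8", "C10", "C11", "C22"], ["C13", "C14", "C15", "C16"]),
    "t4": (["C13", "C15"], ["C18", "C19", "C20", "C21", "C23", "C24", "C25"]),
    "t5": (["C5", "C19", "C20", "C21"], ["C26", "C27", "C28", "C29", "C30"]),
}


def check_order(steps, raw_cols):
    """Return {step: missing inputs} for steps run in dict order."""
    available = set(raw_cols)
    problems = {}
    for name, (inputs, outputs) in steps.items():
        missing = [col for col in inputs if col not in available]
        if missing:
            problems[name] = missing
        # The outputs exist afterwards either way (real values or NaN filler).
        available.update(outputs)
    return problems


# Assumption: the raw file holds C1..C7.
raw = [f"C{i}" for i in range(1, 8)]
print(check_order(STEPS, raw))  # an empty dict means the t1..t5 order is consistent
```

Keeping the column lists in one table also gives each step a single place to look up its inputs and outputs.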

import inspect
import numpy as np

def ti(df):
    output_cols = get_output_cols()
    if output_cols_already_exist(df, output_cols):
        return df, "{} skipped, the output cols {} already exist".format(inspect.stack()[0][3], output_cols)
    else:
        input_cols = get_required_input_cols()
        missing_cols = get_missing_cols(df, input_cols)
        if missing_cols == []:
            # do stuff
            log = "Performed {} transformation. Created {} columns".format(inspect.stack()[0][3], output_cols)
        else:
            # Inputs are missing, so fill the output columns with NaN
            for col in output_cols:
                df[col] = np.nan
            log = "Cannot perform {} transformation because {} columns are missing. {} are filled with NaN values".format(inspect.stack()[0][3], missing_cols, output_cols)
        return df, log

import logging
import pandas as pd

text = ""
df = pd.read_csv(input_path)
df, log_text = t1(df)
text = text + log_text + "\n"
df, log_text = t2(df)
text = text + log_text + "\n"
df, log_text = t3(df)
text = text + log_text + "\n"
df, log_text = t4(df)
text = text + log_text + "\n"
df, log_text = t5(df)
text = text + log_text + "\n"
df.to_csv("output_data.csv", index=False)
logging.info(text)

1 Answer

User

#1 · Posted on 2024-05-23 23:51:54

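Because functions are first-class objects in Python, you can refactor the code by extracting what differs between the t[i] functions (the "do stuff" part) into helper functions and passing each helper as a parameter, which generalizes the t[i] functions into one.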

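When calling the functions (t1, t2, and so on, or the refactored helper versions below), you can also avoid repetition by iterating over a list.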
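As a related option, pandas offers `DataFrame.pipe`, which passes the frame to a function and returns the result, so steps can be chained or applied from a list. A minimal sketch; `add_total` and `add_ratio` are hypothetical stand-ins for the real transformations, not the asker's code:

```python
import pandas as pd


def add_total(df):
    # Hypothetical step: derive a column from two existing ones.
    return df.assign(total=df["a"] + df["b"])


def add_ratio(df):
    # Hypothetical step that depends on the output of add_total.
    return df.assign(ratio=df["a"] / df["total"])


df = pd.DataFrame({"a": [1, 2], "b": [3, 6]})

# Chained form: each call receives the result of the previous one.
chained = df.pipe(add_total).pipe(add_ratio)

# Loop form: the same steps kept in a list.
looped = df
for step in [add_total, add_ratio]:
    looped = looped.pipe(step)

print(chained.equals(looped))
```

Both forms give the same frame; the loop form is easier to extend when the list of steps is built elsewhere.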

Finally, using f-strings helps make the code more readable.
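For comparison, the two assignments below build the same message, once with `str.format` as in the question and once with an f-string. The variable values are made up for illustration:

```python
name = "t1"
output_cols = ["C8", "C9"]

# str.format: placeholders and values are written separately.
old = "{} skipped, the output cols {} already exist".format(name, output_cols)

# f-string: each value appears at the place where it is used.
new = f"{name} skipped, the output cols {output_cols} already exist"

print(old == new)
```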

Something like this:

# t takes a dataframe and a helper function as parameters
def t(df, do_stuff_func):
    # inspect.stack()[0][3] would always report "t" here,
    # so the helper's own name is used in the log messages
    name = do_stuff_func.__name__
    # get_output_cols and get_required_input_cols are the question's helpers;
    # they are assumed to return the columns for the current step
    output_cols = get_output_cols()
    if output_cols_already_exist(df, output_cols):
        return (
            df,
            f"{name} skipped, the output cols {output_cols} already exist",
        )
    input_cols = get_required_input_cols()
    missing_cols = get_missing_cols(df, input_cols)
    if missing_cols == []:
        # Call the helper function, which returns the transformed dataframe
        df = do_stuff_func(df)
        log = (
            f"Performed {name} transformation. "
            f"Created {output_cols} columns"
        )
    else:
        # Inputs are missing, so fill the output columns with NaN
        for col in output_cols:
            df[col] = np.nan
        log = (
            f"Cannot perform {name} transformation "
            f"because {missing_cols} columns are missing. "
            f"{output_cols} are filled with NaN values"
        )
    return df, log

# Define the five new 'do_stuff' functions
def do_stuff1(df):
    return df
...
def do_stuff5(df):
    return df

# Store the functions
do_stuff_funcs = [do_stuff1, do_stuff2, do_stuff3, do_stuff4, do_stuff5]

# Call t function in combination with df and do_stuff_funcs helpers
text = ""
df = pd.read_csv(input_path)
for do_stuff_func in do_stuff_funcs:
    df, log_text = t(df, do_stuff_func)
    text = text + log_text + "\n"

# Save the results
df.to_csv("output_data.csv", index=False)
logging.info(text)
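
To try the pattern end to end, here is a self-contained sketch under stated assumptions. The question's helpers (`get_output_cols`, `get_required_input_cols`, and so on) are not shown, so this version passes the column lists explicitly to a hypothetical `run_step` function, and `make_sum` and `make_double` are invented toy transformations on a small in-memory frame rather than the real t1 to t5.

```python
import numpy as np
import pandas as pd


def run_step(df, func, input_cols, output_cols):
    """Apply func if its inputs exist; otherwise fill its outputs with NaN."""
    name = func.__name__
    if all(col in df.columns for col in output_cols):
        return df, f"{name} skipped, the output cols {output_cols} already exist"
    missing = [col for col in input_cols if col not in df.columns]
    if not missing:
        return func(df), f"Performed {name} transformation. Created {output_cols} columns"
    df = df.copy()
    for col in output_cols:
        df[col] = np.nan
    return df, (
        f"Cannot perform {name} transformation because {missing} columns "
        f"are missing. {output_cols} are filled with NaN values"
    )


def make_sum(df):
    # Toy step: C3 = C1 + C2
    return df.assign(C3=df["C1"] + df["C2"])


def make_double(df):
    # Toy step that needs a column (C9) the sample frame does not have.
    return df.assign(C4=df["C9"] * 2)


# Each entry: (function, required inputs, generated outputs)
steps = [
    (make_sum, ["C1", "C2"], ["C3"]),
    (make_double, ["C9"], ["C4"]),
]

df = pd.DataFrame({"C1": [1, 2], "C2": [10, 20]})
logs = []
for func, input_cols, output_cols in steps:
    df, log_text = run_step(df, func, input_cols, output_cols)
    logs.append(log_text)

print(df)
print("\n".join(logs))
```

Collecting the messages in a list and joining them once avoids the repeated string concatenation, and keeping the column lists next to each function means the runner does not need to guess which step it is executing.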
