Pandas split-apply-combine: why does returning pd.concat([df]) work when the frame is sorted?

Posted 2024-04-24 05:51:59

Male | A programmer who likes writing Python code.

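I'm using pandas for a split-apply-combine style workflow, where the "apply" step returns a DataFrame. When the DataFrame I run groupby on has been sorted first, simply returning the DataFrame from apply raises ValueError: cannot reindex from a duplicate axis. However, if I return pd.concat([df]) instead of just return df, it works. If the DataFrame is not sorted, both ways of returning the result work. I assume sorting must do something to the index, but I don't understand what. Can anyone explain?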
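A minimal sketch of what sorting does to the index in this example, reusing the rows from test_fill_out_ids below. It rests on an assumption about pandas internals that may vary by version: when the function passed to apply returns a frame whose index looks like its group's index, groupby concatenates the pieces and then reindexes the result back to the original row order. The helpers concat_groups and try_reindex are hypothetical and only imitate that step.

```python
import numpy as np
import pandas as pd

# Same rows as input_df in test_fill_out_ids.
input_df = pd.DataFrame(
    [
        ['a', None, 1.0, 1],
        ['a', None, 1.0, 3],
        ['a', 'name1', np.nan, 2],
        ['b', None, 2.0, 3],
        ['b', 'name1', np.nan, 2],
        ['b', 'name2', np.nan, 1],
    ],
    columns=['group_col', 'id1', 'id2', 'sort_col'],
)

indexed = input_df.set_index('group_col')
# kind='stable' only makes the order of tied sort_col values deterministic here.
sorted_df = indexed.sort_values(by='sort_col', kind='stable')


def concat_groups(frame):
    """Hypothetical helper: split by index label and glue the groups back together."""
    return pd.concat([group for _, group in frame.groupby(level=0, sort=False)])


def try_reindex(frame, target):
    """Hypothetical helper: return the ValueError message if reindexing fails, else None."""
    try:
        frame.reindex(target)
    except ValueError as exc:
        return str(exc)
    return None


unsorted_combined = concat_groups(indexed)
sorted_combined = concat_groups(sorted_df)

print(list(indexed.index))          # ['a', 'a', 'a', 'b', 'b', 'b']
print(list(sorted_df.index))        # ['a', 'b', 'a', 'b', 'a', 'b']
print(list(sorted_combined.index))  # ['a', 'a', 'a', 'b', 'b', 'b']

# Assumption: this mimics the "restore original row order" step of groupby.apply.
print(try_reindex(unsorted_combined, indexed.index))  # None: the index already matches
print(try_reindex(sorted_combined, sorted_df.index))  # error text varies by pandas version
```

Without sorting, the regrouped rows already line up with the original index, so the reindex is a no-op. After sorting, the labels 'a' and 'b' are duplicated and interleaved, and reindexing onto a non-unique axis raises ValueError. A likely reason pd.concat([df]) avoids this is that the returned frame carries a new Index object, which some pandas versions treat as a changed index, so the pieces are simply concatenated without the reindex step; this behavior depends on the pandas version.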

```python
import pandas as pd
import numpy as np


def fill_out_ids(df, filling_function, sort=False, sort_col='sort_col',
                 group_by='group_col', to_fill=['id1', 'id2']):

    df = df.copy()
    # Use the grouping column as the index; its labels are duplicated.
    df.set_index(group_by, inplace=True)
    if sort:
        # After sorting, rows from different groups may be interleaved.
        df.sort_values(by=sort_col, inplace=True)
    g = df.groupby(df.index, sort=False, group_keys=False)
    df = g.apply(filling_function, to_fill)
    df.reset_index(inplace=True)
    return df


def _fill_ids_concat(df, to_fill):
    # Fill missing ids within the group: forward first, then backward.
    df[to_fill] = df[to_fill].ffill()
    df[to_fill] = df[to_fill].bfill()
    # Wrapping in pd.concat returns a new DataFrame object.
    return pd.concat([df])


def _fill_ids_plain(df, to_fill):
    # Same filling, but returns the group frame itself.
    df[to_fill] = df[to_fill].ffill()
    df[to_fill] = df[to_fill].bfill()
    return df


def test_fill_out_ids():
    input_df = pd.DataFrame(
        [
            ['a',       None,       1.0,    1],
            ['a',       None,       1.0,    3],
            ['a',       'name1',    np.nan, 2],

            ['b',       None,       2.0,    3],
            ['b',       'name1',    np.nan, 2],
            ['b',       'name2',    np.nan, 1],
        ],
        columns=['group_col', 'id1', 'id2', 'sort_col']
    )

    # this works
    fill_out_ids(input_df, _fill_ids_plain, sort=False)

    # this raises: ValueError: cannot reindex from a duplicate axis
    fill_out_ids(input_df, _fill_ids_plain, sort=True)

    # this works
    fill_out_ids(input_df, _fill_ids_concat, sort=True)

    # this works
    fill_out_ids(input_df, _fill_ids_concat, sort=False)


if __name__ == "__main__":
    test_fill_out_ids()
```
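One way to sidestep the reindex step entirely, sketched under the assumption that the goal is only to forward- and backward-fill id1 and id2 within each group: keep the grouping column as an ordinary column and use GroupBy.ffill and GroupBy.bfill, which return results aligned to the frame's own index. The name fill_out_ids_grouped is hypothetical, and this swaps groupby.apply for built-in group-wise fills rather than reproducing the original function.

```python
import numpy as np
import pandas as pd


def fill_out_ids_grouped(df, sort=False, sort_col='sort_col',
                         group_by='group_col', to_fill=('id1', 'id2')):
    """Hypothetical alternative to fill_out_ids that avoids groupby.apply.

    Forward- then backward-fills the to_fill columns within each group.
    The grouping column stays a regular column, so the frame keeps its
    unique RangeIndex and nothing is reindexed onto duplicate labels.
    """
    to_fill = list(to_fill)
    df = df.copy()
    if sort:
        df = df.sort_values(by=sort_col)
    # GroupBy.ffill returns values aligned to df's own index and row order.
    df[to_fill] = df.groupby(group_by, sort=False)[to_fill].ffill()
    # Regroup for bfill so that filling never crosses a group boundary.
    df[to_fill] = df.groupby(group_by, sort=False)[to_fill].bfill()
    return df


input_df = pd.DataFrame(
    [
        ['a', None, 1.0, 1],
        ['a', None, 1.0, 3],
        ['a', 'name1', np.nan, 2],
        ['b', None, 2.0, 3],
        ['b', 'name1', np.nan, 2],
        ['b', 'name2', np.nan, 1],
    ],
    columns=['group_col', 'id1', 'id2', 'sort_col'],
)

print(fill_out_ids_grouped(input_df, sort=True))
print(fill_out_ids_grouped(input_df, sort=False))
```

Because the frame keeps its default RangeIndex, sorting only changes the row order, and each filled column is assigned back by index alignment.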


