Pandas: progressive value collection within groups

import pandas as pd

# Simulate some data
d = {
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "action_order": [1, 2, 3, 4, 5, 1, 2, 3, 4],
    "n_actions": [5, 5, 5, 5, 5, 4, 4, 4, 4],
    "seed": ['1', '2', '3', '4', '5', '10', '11', '12', '13'],
    "time_spent": [0.3, 0.4, 0.5, 0.6, 0.7, 10.1, 11.1, 12.1, 13.1],
}
data = pd.DataFrame(d)

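# Column names below are adapted to the simulated data above
# (the original post used profile_id, artist_seed and tlh).

# One dict per group mapping seed -> time_spent, returned as a list
data \
    .groupby(["id"])[["seed", "time_spent"]] \
    .apply(lambda x: dict(zip(x["seed"], x["time_spent"]))) \
    .tolist()

# Same mapping, with action_order also selected; still one dict per group
data \
    .groupby("id")[["seed", "time_spent", "action_order"]] \
    .apply(lambda x: dict(zip(list(x["seed"]), list(x["time_spent"]))))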

2 answers

User

#1 · edited 2024-04-20 13:18:19

How about this? (It assumes numpy is imported as np.)

In [15]: data.groupby(['id']).apply(lambda d: pd.Series(np.arange(len(d))).apply(lambda x: d[['seed', 'time_spent']].iloc[:x+1].to_dict()))
Out[15]:
id
1   0           {'seed': {0: '1'}, 'time_spent': {0: 0.3}}
    1    {'seed': {0: '1', 1: '2'}, 'time_spent': {0: 0...
    2    {'seed': {0: '1', 1: '2', 2: '3'}, 'time_spent...
    3    {'seed': {0: '1', 1: '2', 2: '3', 3: '4'}, 'ti...
    4    {'seed': {0: '1', 1: '2', 2: '3', 3: '4', 4: '...
2   0         {'seed': {5: '10'}, 'time_spent': {5: 10.1}}
    1    {'seed': {5: '10', 6: '11'}, 'time_spent': {5:...
    2    {'seed': {5: '10', 6: '11', 7: '12'}, 'time_sp...
    3    {'seed': {5: '10', 6: '11', 7: '12', 8: '13'},...
dtype: object

You can also change the arguments passed to .to_dict() to get a different output dict style; see: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html
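As a rough illustration of the .to_dict() point, the sketch below runs on the question's simulated data and compares three values of the orient parameter of DataFrame.to_dict. The variable names (first_two, as_dict, as_list, as_records) are only illustrative.

```python
import pandas as pd

# Simulated data from the question
data = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "action_order": [1, 2, 3, 4, 5, 1, 2, 3, 4],
    "n_actions": [5, 5, 5, 5, 5, 4, 4, 4, 4],
    "seed": ['1', '2', '3', '4', '5', '10', '11', '12', '13'],
    "time_spent": [0.3, 0.4, 0.5, 0.6, 0.7, 10.1, 11.1, 12.1, 13.1],
})

# First two rows of group 1, restricted to the two columns of interest
first_two = data.loc[data["id"] == 1, ["seed", "time_spent"]].iloc[:2]

# Default orient="dict": {column: {index: value}}, as in the output above
as_dict = first_two.to_dict()

# orient="list": {column: [values]}
as_list = first_two.to_dict(orient="list")

# orient="records": one {column: value} dict per row
as_records = first_two.to_dict(orient="records")

print(as_dict)
print(as_list)
print(as_records)
```

Which style fits best depends on how the collected values are consumed later; "list" drops the row index, while "records" keeps each row's values together.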

Or maybe this is what you want:

In [18]: data.groupby(['id']).apply(lambda d: pd.Series(np.arange(len(d))).apply(lambda x: dict(zip(d['seed'].iloc[:x+1], d['time_spent'].iloc[:x+1]))))
Out[18]:
id
1   0                                           {'1': 0.3}
    1                                 {'1': 0.3, '2': 0.4}
    2                       {'1': 0.3, '2': 0.4, '3': 0.5}
    3             {'1': 0.3, '2': 0.4, '3': 0.5, '4': 0.6}
    4    {'1': 0.3, '2': 0.4, '3': 0.5, '4': 0.6, '5': ...
2   0                                         {'10': 10.1}
    1                             {'10': 10.1, '11': 11.1}
    2                 {'10': 10.1, '11': 11.1, '12': 12.1}
    3     {'10': 10.1, '11': 11.1, '12': 12.1, '13': 13.1}
dtype: object

User

#2 · edited 2024-04-20 13:18:19

You can keep a running dict for each group and return a copy of its latest state on every apply iteration:

def wrapper(g):
    cumdict = {}
    return g.apply(update_cumdict, args=(cumdict,), axis=1)

def update_cumdict(row, cd):
    cd[row.seed] = row.time_spent
    return cd.copy()

data["new_col"] = data.groupby("id").apply(wrapper).reset_index()[0]

data.new_col
0                                           {'1': 0.3}
1                                 {'1': 0.3, '2': 0.4}
2                       {'1': 0.3, '2': 0.4, '3': 0.5}
3             {'1': 0.3, '2': 0.4, '3': 0.5, '4': 0.6}
4    {'1': 0.3, '2': 0.4, '3': 0.5, '4': 0.6, '5': ...
5                                         {'10': 10.1}
6                             {'10': 10.1, '11': 11.1}
7                 {'10': 10.1, '11': 11.1, '12': 12.1}
8     {'10': 10.1, '11': 11.1, '12': 12.1, '13': 13.1}
Name: new_col, dtype: object
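A possible variant of the same running-dict idea, sketched under the assumption that an explicit loop over the groups is acceptable: it builds the same column without a row-wise apply. The helper name running_dicts is hypothetical and not part of pandas.

```python
import pandas as pd

# Simulated data from the question
data = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "action_order": [1, 2, 3, 4, 5, 1, 2, 3, 4],
    "n_actions": [5, 5, 5, 5, 5, 4, 4, 4, 4],
    "seed": ['1', '2', '3', '4', '5', '10', '11', '12', '13'],
    "time_spent": [0.3, 0.4, 0.5, 0.6, 0.7, 10.1, 11.1, 12.1, 13.1],
})


def running_dicts(group):
    """Return one dict per row; each dict extends the previous one with seed -> time_spent."""
    current = {}
    snapshots = []
    for seed, spent in zip(group["seed"], group["time_spent"]):
        current[seed] = spent
        # Store a copy so later updates do not change earlier rows
        snapshots.append(dict(current))
    # Reuse the group's row labels so the result aligns with the original frame
    return pd.Series(snapshots, index=group.index)


# Iterate over the groups directly and stitch the per-group results back together
parts = [running_dicts(g) for _, g in data.groupby("id")]
data["new_col"] = pd.concat(parts)

print(data["new_col"])
```

Because each per-group Series keeps the original row labels, the assignment aligns on the index and does not need reset_index.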
