Array operations on pandas cells

citations_dict = {}
for index, row in data_ref.iterrows():
    if len(row['reference_list']) > 0:
        for reference in row['reference_list']:
            if reference not in citations_dict:
                citations_dict[reference] = {}
                d = data_ref.loc[data_ref['id'] == reference]
                citations_dict[reference]['venue'] = d['venue']
                citations_dict[reference]['reference'] = d['reference']
                citations_dict[reference]['citation'] = 1
            else:
                citations_dict[reference]['citation'] += 1

3 answers

User

Answer 1 · edited 2024-06-09 06:52:26

Data (shown with the expected referred_count column):

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'refers': [[1, 2, 3], [1, 3], []]})
    id  refers     referred_count
0   1   [1, 2, 3]   1
1   2   [1, 3]      1
2   3   []          2

Build a dictionary of how many times each reference occurs:

refer_count = df.refers.apply(pd.Series).stack()\
                .reset_index(drop=True)\
                .astype(int)\
                .value_counts()\
                .to_dict()

For each id, subtract the number of times the id appears in its own refers list (self-references) from its refer_count:

# .get(..., 0) avoids a KeyError for ids that are never referenced
df['referred_count'] = df.apply(lambda x: refer_count.get(x['id'], 0) - x['refers'].count(x['id']), axis=1)

Output:

    id  refers    referred_count
0   1   [1, 2, 3]  1
1   2   [1, 3]     1
2   3   []         2
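The same counting can also be sketched with Series.explode instead of apply(pd.Series).stack(). This is only an illustration under two assumptions: the pandas version is 0.25 or later (where explode exists), and self-references should not count, as in the subtraction step above.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'refers': [[1, 2, 3], [1, 3], []]})

# One row per (citing id, referenced id); empty lists become NaN and are dropped.
pairs = df.explode('refers').dropna(subset=['refers']).reset_index(drop=True)

# Remove self-references, then count how often each id is referenced.
pairs = pairs[pairs['id'] != pairs['refers']]
counts = pairs['refers'].astype(int).value_counts()

# Ids that are never referenced map to NaN, which is replaced by 0.
df['referred_count'] = df['id'].map(counts).fillna(0).astype(int)
print(df)
```

Using map with fillna(0) means an id that no row refers to gets a count of 0 rather than raising an error.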

User

Answer 2 · edited 2024-06-09 06:52:26

Step 1: Count how often each ID occurs in the refers column, store the counts in a dictionary, and look them up when creating the new column.

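import pandas as pd
from collections import Counter

df = pd.DataFrame({'id': [1, 2, 3], 'refers': [[2, 3], [1, 3], []]})
# Flatten all reference lists and count how often each id occurs.
counter = Counter(item for sublist in df['refers'] for item in sublist)
# A Counter returns 0 for missing keys, so unreferenced ids do not raise KeyError.
df['refer_counts'] = df['id'].apply(lambda x: counter[x])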

Output:

   id  refers  refer_counts
0   1  [2, 3]             1
1   2  [1, 3]             1
2   3      []             2

I think this is exactly what you need!
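Applied to the layout in the question, the Counter approach might look like the sketch below. Only the names data_ref, id, venue, reference_list and citations_dict come from the question; the sample rows are hypothetical, and the sketch assumes each id appears in exactly one row.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
data_ref = pd.DataFrame({
    'id': ['a', 'b', 'c'],
    'venue': ['V1', 'V2', 'V3'],
    'reference_list': [['b', 'c'], ['c'], []],
})

# Count every occurrence of an id across all reference lists.
citations = Counter(ref for refs in data_ref['reference_list'] for ref in refs)

# A Counter returns 0 for ids that were never cited.
data_ref['citation'] = data_ref['id'].map(lambda i: citations[i])

# Keep only cited ids and build a nested dict keyed by id.
citations_dict = (
    data_ref[data_ref['citation'] > 0]
    .set_index('id')[['venue', 'citation']]
    .to_dict(orient='index')
)
print(citations_dict)
```

This replaces the row-by-row iterrows loop with a single pass over the lists, and it stores each venue as a plain value rather than as a one-element Series.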

User

Answer 3 · edited 2024-06-09 06:52:26

First, create a helper Series, as in the code below, which uses np.hstack and value_counts.

This Series holds the values for the referred_count column, with id as its index.

You can then set_index df on id so that this Series aligns with it on assignment, and finally reset_index to restore the DataFrame to its original shape.

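import numpy as np
import pandas as pd

# Flatten all reference lists into one array and count each id.
s = pd.Series(np.hstack(df['refers'])).value_counts()
# Align the counts on id, then restore id as a regular column.
df.set_index('id').assign(referred_count=s).reset_index()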

[Output]

   id  refers  referred_count
0   1  [2, 3]               1
1   2  [1, 3]               1
2   3      []               2
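One detail of the index alignment is worth illustrating: an id that never appears in any refers list has no entry in the helper Series, so assign leaves NaN for it. The sketch below adds a hypothetical fourth row (id 4) to show this and fills the gap with 0; the fillna step is an addition, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Row with id 4 is hypothetical; nothing refers to it.
df = pd.DataFrame({'id': [1, 2, 3, 4], 'refers': [[2, 3], [1, 3], [], [3]]})

# Flatten all reference lists; cast to int because empty lists make hstack return floats.
flat = np.hstack(df['refers'].tolist()).astype(int)
s = pd.Series(flat).value_counts()

# Align the counts with df on id; id 4 has no entry in s and becomes NaN.
out = df.set_index('id').assign(referred_count=s).reset_index()

# Replace the missing count with 0 and restore an integer dtype.
out['referred_count'] = out['referred_count'].fillna(0).astype(int)
print(out)
```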
