Find frequent elements in lists in one column, grouped by another column

User

Answer 1 · edited 2024-04-25 07:13:14

I hope I have understood your question correctly: you want the three most common neighbours of A.

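from collections import Counter

import pandas as pd

def fn(x):
    c = Counter()
    for row in x:
        s = pd.Series(row)
        m = s == 'A'  # True where the element is 'A'
        # keep the elements whose previous or next element is 'A'
        c.update(s[m.shift(fill_value=False) | m.shift(-1, fill_value=False)])
    return c.most_common(3)

# df is the question's DataFrame with columns col1 and col2
print(df.groupby('col1').col2.apply(fn))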

Prints:

col1
type1    [(C, 2), (F, 1)]
type2            [(E, 2)]
Name: col2, dtype: object

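In type1, C is a neighbour of A twice and F only once.

In type2, E is a neighbour of A twice.

If you only want the most common element, you can do the following in fn():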

return list(dict(c.most_common(1)).keys())

This prints:

col1
type1    [C]
type2    [E]
Name: col2, dtype: object
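
The neighbour counting above can also be sketched without building a Series per row. This is only an illustration, not part of the original answer: the helper name neighbour_counts and its target parameter are assumptions, and the rows are copied from the sample DataFrame used elsewhere in this thread. Like the boolean mask in fn(), it counts a position once if the element before or after it is the target.

```python
from collections import Counter


def neighbour_counts(rows, target="A"):
    """Count elements whose previous or next element in the row is `target`."""
    counts = Counter()
    for row in rows:
        for i, item in enumerate(row):
            before = i > 0 and row[i - 1] == target
            after = i + 1 < len(row) and row[i + 1] == target
            if before or after:
                counts[item] += 1
    return counts


# Rows of each group, taken from the sample data in this thread
type1_rows = [["A", "C", "B", "D"], ["C", "A", "F", "E"], ["F", "E", "G", "H"]]
type2_rows = [["A", "E", "F", "G"], ["A", "E", "J", "K"]]

print(neighbour_counts(type1_rows).most_common(3))  # [('C', 2), ('F', 1)]
print(neighbour_counts(type2_rows).most_common(3))  # [('E', 2)]
```

Because the helper only needs an iterable of lists, it should also work per group, for example df.groupby('col1').col2.apply(lambda s: neighbour_counts(s).most_common(3)).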

User

Answer 2 · edited 2024-04-25 07:13:14

import numpy as np
import pandas as pd

def func(_list):
    a = _list
    b = [a.count(i) for i in a]  # occurrences of each element in the merged list
    c = pd.DataFrame({'Letter': a,
                      'Count': b})
    d = c[c['Count'] == c['Count'].max()]  # keep the rows tied for the highest count
    e = d['Letter'].unique()
    f = np.array(e, dtype=object)
    return f

df = pd.DataFrame({'col1': ['type1', 'type1', 'type1', 'type2', 'type2'],
                   'col2': [['A', 'C', 'B', 'D'], ['C', 'A', 'F', 'E'], ['F', 'E', 'G', 'H'], ['A', 'E', 'F', 'G'], ['A', 'E', 'J', 'K']]
                   })

# summing a column of lists is meant to concatenate the lists within each group
df = df.groupby('col1').sum()

df['col3'] = df['col2'].apply(func)

df

User

Answer 3 · edited 2024-04-25 07:13:14

from collections import Counter

def most_freq(series, input_):
    cnt = Counter()
    for row in series:
        if input_ in row:
            for i in row:
                cnt[i] += 1
    return [k for (k,v) in cnt.most_common(2)]

query = 'A'
df.groupby('col1').agg({'col2': lambda x: most_freq(x, query)})

Output:

        col2
col1    
type1   [A, C]
type2   [A, E]

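Explanation:

One possible way to solve this is to pass a custom function to agg.

For each group of col1, the function looks only at rows that contain the user input (query). It uses a Counter to count every element in those rows and returns the 2 most frequent ones. Note that the query itself is counted too, which is why A appears in the output. To get the top 3 instead, the OP can change the argument of cnt.most_common(2) from 2 to 3.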
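
The change from top 2 to top 3 can be sketched by making the cut-off a parameter. This is a variation on most_freq, not the answer's exact code: the parameter name n is an assumption, and the rows are copied from the type1 group of the sample data in this thread.

```python
from collections import Counter


def most_freq(rows, query, n=2):
    """Return the n most common elements across the rows that contain `query`."""
    counts = Counter()
    for row in rows:
        if query in row:
            counts.update(row)  # count every element of a matching row
    return [item for item, _ in counts.most_common(n)]


# Rows of the type1 group from the sample data in this thread
type1_rows = [["A", "C", "B", "D"], ["C", "A", "F", "E"], ["F", "E", "G", "H"]]

print(most_freq(type1_rows, "A"))       # ['A', 'C']
print(most_freq(type1_rows, "A", n=3))  # ['A', 'C', 'B']
```

Ties are returned in the order the elements were first seen, so B comes before D, F and E here. On the grouped DataFrame this could be called as df.groupby('col1').col2.apply(lambda s: most_freq(s, 'A', n=3)).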
