Counting differences from the per-row consensus with Pandas

Posted 2024-04-25 18:51:14

I have a DataFrame like this:

import pandas as pd
df = pd.DataFrame({'A':['a','b','c','d'],'B':['a','b','c','x'],'C':['y','b','c','d']})
df

   A  B  C
0  a  a  y
1  b  b  b
2  c  c  c
3  d  x  d

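I want to find the most common character in each row (the consensus), and for each column the total number of cells that differ from that consensus: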

       A  B  C Consensus
    0  a  a  y         a
    1  b  b  b         b
    2  c  c  c         c
    3  d  x  d         d
Total  0  1  1         0

Looping over the rows is one way to do it, but it seems inefficient:

consensus = []
for idx in df.index:
    consensus.append(df.loc[idx].value_counts().index[0])
df['Consensus'] = consensus

(and so on)

Is there a direct way to get the consensus and count the differences?


1 answer
Answer #1 · posted 2024-04-25 18:51:14

You can use mode to get the consensus value:

>>> df.mode(axis=1)
   0
0  a
1  b
2  c
3  d

Note the caveats in the documentation:

Gets the mode(s) of each element along the axis selected. Empty if nothing has 2+ occurrences. Adds a row for each mode per label, fills in gaps with nan.

Note that there could be multiple values returned for the selected axis (when more than one item share the maximum frequency), which is the reason why a dataframe is returned. If you want to impute missing values with the mode in a dataframe df, you can just do this: df.fillna(df.mode().iloc[0])
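The caveat about multiple modes is easier to see on a small example. The sketch below uses a hypothetical frame, df_ties (not part of the question), whose second row has no repeated value. In recent pandas versions every value in such a row is reported as a mode, so the result gains extra columns padded with NaN; the quoted "Empty if nothing has 2+ occurrences" wording appears to describe older behaviour. Taking the first column is just one possible tie-breaking rule, assumed here for illustration.

```python
import pandas as pd

# Hypothetical data: row 0 has a clear mode, row 1 has three distinct values.
df_ties = pd.DataFrame({'A': ['a', 'b'], 'B': ['a', 'c'], 'C': ['y', 'd']})

# One column per tied mode; rows with fewer modes are padded with NaN.
modes = df_ties.mode(axis=1)

# Assumed tie-breaking rule: keep the first mode (modes are returned sorted).
consensus = modes.iloc[:, 0]

# Number of values that tie for the mode in each row; >1 flags an ambiguous row.
n_modes = modes.notna().sum(axis=1)

print(modes)
print(n_modes)
```

Rows where n_modes is greater than 1 have no unique consensus, so you may prefer to flag or drop them rather than silently take the first value.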

To count the differences from the consensus in each column, compare the columns with the consensus using ne and then sum:

>>> df['consensus'] = df.mode(axis=1)[0]  # first mode column, in case a row has ties
>>> df.loc[:, 'A':'C'].ne(df['consensus'], axis=0).sum(axis=0)
A    0
B    1
C    1
dtype: int64
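Putting the two steps together, the table from the question can be rebuilt roughly as follows. This is a sketch under a few assumptions: ties are resolved by taking the first mode, the Total row is appended with pd.concat, and the names result and total_row are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd'],
                   'B': ['a', 'b', 'c', 'x'],
                   'C': ['y', 'b', 'c', 'd']})

# Row-wise consensus: first mode of each row (assumed tie-breaking rule).
consensus = df.mode(axis=1).iloc[:, 0]

# Attach the consensus as a new column without modifying df itself.
result = df.assign(Consensus=consensus)

# Per-column count of cells that differ from their row's consensus.
# The Consensus column is compared with itself, so its count is 0.
totals = result.ne(consensus, axis=0).sum(axis=0)

# Append the counts as a final row labelled 'Total'.
total_row = totals.to_frame('Total').T
result = pd.concat([result, total_row])

print(result)
```

Note that appending the Total row mixes strings and integers in the same columns, so they end up with object dtype; if the counts are needed for further computation, keeping totals as a separate Series is probably cleaner.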
