Pandas: keep rows that are unique to one category

import pandas as pd

largedf = pd.DataFrame({'arow': ['row1', 'row2', 'row3', 'row4'], 'green': ['a', 'b', 'b', 'a'], 'red': ['a', 'b', 'b', 'a'], 'cat': ['b', 'a', 'b', 'a'], 'dog': ['b', 'a', 'b', 'a']})

   arow cat dog green red
0  row1   b   b     a   a
1  row2   a   a     b   b
2  row3   b   b     b   b
3  row4   a   a     a   a

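Desired result: keep only the rows in which the 'a' values all fall into a single category (colors: green, red; animals: cat, dog), and add that category together with the fraction of its columns that equal 'a':

shorterdf = pd.DataFrame({'arow': ['row1', 'row2'], 'green': ['a', 'b'], 'red': ['a', 'b'], 'cat': ['b', 'a'], 'dog': ['b', 'a']})

   arow cat dog green red category  percent
0  row1   b   b     a   a   colors      1.0
1  row2   a   a     b   b  animals      1.0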

2 answers

User

Answer 1 · edited 2024-06-16 10:52:16

First, use nunique to keep only the rows whose values are not all the same (this drops row3 and row4):

t = largedf[largedf.iloc[:, 1:].nunique(axis=1).gt(1)]

t = t.set_index('arow')
s = t.copy()
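nunique(axis=1) counts the distinct values in each row, so .gt(1) only drops rows whose values are all identical. That is enough for the sample data, but it does not look at the categories. A minimal sketch, using a hypothetical extra row `row5` with an 'a' in both a color and an animal column, shows such a row passing the filter:

```python
import pandas as pd

largedf = pd.DataFrame({'arow': ['row1', 'row2', 'row3', 'row4'],
                        'green': ['a', 'b', 'b', 'a'], 'red': ['a', 'b', 'b', 'a'],
                        'cat': ['b', 'a', 'b', 'a'], 'dog': ['b', 'a', 'b', 'a']})

# hypothetical row5: 'a' appears in one color column and one animal column
extra = pd.DataFrame({'arow': ['row5'], 'green': ['a'], 'red': ['b'],
                      'cat': ['a'], 'dog': ['b']})
df = pd.concat([largedf, extra], ignore_index=True)

# number of distinct values per row, ignoring the 'arow' label column
counts = df.iloc[:, 1:].nunique(axis=1)
mask = counts.gt(1)

print(counts.tolist())                # [2, 2, 1, 1, 2]
print(df.loc[mask, 'arow'].tolist())  # ['row1', 'row2', 'row5']
```

row5 survives even though its 'a' values span two categories, so on other data the nunique step would need a category-aware check instead.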

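Then use map to replace each column name with its category:

# map every column explicitly; a positional version such as
# np.repeat(['animals', 'color'], 2) assumes the column order cat, dog, green, red
s.columns = s.columns.map({'green': 'color', 'red': 'color', 'cat': 'animals', 'dog': 'animals'})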
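Labelling columns by position, as np.repeat(['animals', 'color'], 2) does, is only correct when the columns come out as cat, dog, green, red (older pandas versions sorted dict keys alphabetically when building a DataFrame). Current pandas keeps insertion order, i.e. green, red, cat, dog. A small sketch comparing positional labelling with an explicit dict, with that column order hard-coded as an assumption:

```python
import numpy as np
import pandas as pd

cols = pd.Index(['green', 'red', 'cat', 'dog'])  # insertion order in current pandas

# positional labelling: silently assumes the order cat, dog, green, red
positional = cols.map(dict(zip(cols, np.repeat(['animals', 'color'], 2))).get)

# explicit labelling: independent of column order
explicit = cols.map({'green': 'color', 'red': 'color', 'cat': 'animals', 'dog': 'animals'})

print([str(x) for x in positional])  # ['animals', 'animals', 'color', 'color'] (wrong)
print([str(x) for x in explicit])    # ['color', 'color', 'animals', 'animals']
```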

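# get the fraction of 'a' values per category; transposing replaces
# groupby(level=0, axis=1), whose axis argument is deprecated since pandas 2.1
s1 = s.eq('a').T.groupby(level=0).mean().T.stack()
# keep the non-zero category of each row and concat it onto t
pd.concat([t, s1[s1 != 0].reset_index(level=1)], axis=1).rename(columns={'level_1': 'category', 0: 'percent'})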
     cat dog green red category  percent
arow                                    
row1   b   b     a   a    color      1.0
row2   a   a     b   b  animals      1.0
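Grouping columns with groupby(..., axis=1) is deprecated since pandas 2.1. Another way to get the per-category fraction, sketched here on the two filtered rows, is to transpose and group by a dict that maps each column name to its category (category_of is a hypothetical name). groupby accepts such a mapping directly, so the columns do not even need to be renamed:

```python
import pandas as pd

# the two rows that survive the nunique filter, indexed by 'arow'
t = pd.DataFrame({'green': ['a', 'b'], 'red': ['a', 'b'],
                  'cat': ['b', 'a'], 'dog': ['b', 'a']},
                 index=pd.Index(['row1', 'row2'], name='arow'))

# hypothetical mapping from column name to category
category_of = {'green': 'color', 'red': 'color', 'cat': 'animals', 'dog': 'animals'}

# transpose so the columns become rows, group those rows by category, transpose back;
# the mean of a boolean group is the fraction of 'a' values in that category
frac = t.eq('a').T.groupby(category_of).mean().T
print(frac)
```

row1 ends up with color 1.0 and row2 with animals 1.0, the same percentages as in the output above.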

User

Answer 2 · edited 2024-06-16 10:52:16

Create a handy dictionary that maps each column of the existing DataFrame to a (category, column) pair:

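colors = ['green', 'red']
animals = ['cat', 'dog']

# map each column name to a (category, column) tuple
m = {k: (v, k) for k, v in {
        **dict.fromkeys(colors, 'colors'),
        **dict.fromkeys(animals, 'animals')
    }.items()}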
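To see what m contains, and how its tuples turn into the two-level (category, column) header shown in the Details section, a quick check assuming the colors and animals lists match the question's columns:

```python
import pandas as pd

colors = ['green', 'red']    # assumed from the question's column names
animals = ['cat', 'dog']

m = {k: (v, k) for k, v in {
        **dict.fromkeys(colors, 'colors'),
        **dict.fromkeys(animals, 'animals')
    }.items()}
print(m)
# {'green': ('colors', 'green'), 'red': ('colors', 'red'), 'cat': ('animals', 'cat'), 'dog': ('animals', 'dog')}

# the tuples become a two-level column index
cols = pd.MultiIndex.from_tuples([m[c] for c in ['green', 'red', 'cat', 'dog']])
print(cols.get_level_values(0).tolist())  # ['colors', 'colors', 'animals', 'animals']
```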

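# drop(..., 1) with a positional axis and .any(axis=1, level=0) no longer work in pandas 2.0+,
# so build the (category, column) header explicitly and group the transposed frame instead
df = largedf.drop(columns='arow')
df.columns = pd.MultiIndex.from_tuples([m[c] for c in df.columns])

largedf[df.eq('a').T.groupby(level=0).any().T.sum(axis=1).eq(1)]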

   arow cat dog green red
0  row1   b   b     a   a
1  row2   a   a     b   b
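The same filter can be written without a MultiIndex by keeping the categories in a plain dict and computing one any() column per category. A sketch, where the categories dict is an assumption mirroring the colors and animals lists the answer relies on:

```python
import pandas as pd

largedf = pd.DataFrame({'arow': ['row1', 'row2', 'row3', 'row4'],
                        'green': ['a', 'b', 'b', 'a'], 'red': ['a', 'b', 'b', 'a'],
                        'cat': ['b', 'a', 'b', 'a'], 'dog': ['b', 'a', 'b', 'a']})

# assumed category membership
categories = {'colors': ['green', 'red'], 'animals': ['cat', 'dog']}

# one boolean column per category: does the row contain an 'a' in that category?
has_a = pd.DataFrame({name: largedf[cols].eq('a').any(axis=1)
                      for name, cols in categories.items()})

# keep rows where exactly one category contains an 'a'
result = largedf[has_a.sum(axis=1).eq(1)]
print(result)
```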

Details

df = largedf.drop(columns='arow')
df.columns = pd.MultiIndex.from_tuples([m[c] for c in df.columns])
df

  animals     colors    
      cat dog  green red
0       b   b      a   a
1       a   a      b   b
2       b   b      b   b
3       a   a      a   a

df.eq('a')

  animals        colors       
      cat    dog  green    red
0   False  False   True   True
1    True   True  False  False
2   False  False  False  False
3    True   True   True   True

df.eq('a').T.groupby(level=0).any().T

   animals  colors
0    False    True
1     True   False
2    False   False
3     True    True

df.eq('a').T.groupby(level=0).any().T.sum(axis=1).eq(1)

0     True
1     True
2    False
3    False
dtype: bool
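This answer stops at the filter, while the desired output also has category and percent columns. A minimal end-to-end sketch combining the per-category fraction with the "exactly one category" test; the function name rows_unique_to_one_category and the categories dict are hypothetical, not from either answer:

```python
import pandas as pd

largedf = pd.DataFrame({'arow': ['row1', 'row2', 'row3', 'row4'],
                        'green': ['a', 'b', 'b', 'a'], 'red': ['a', 'b', 'b', 'a'],
                        'cat': ['b', 'a', 'b', 'a'], 'dog': ['b', 'a', 'b', 'a']})

# hypothetical category definition, inferred from the question's column names
categories = {'colors': ['green', 'red'], 'animals': ['cat', 'dog']}


def rows_unique_to_one_category(df, categories, value='a'):
    """Keep rows whose `value` entries fall in exactly one category, and add
    that category plus the fraction of its columns equal to `value`."""
    # fraction of each category's columns equal to `value`, one column per category
    frac = pd.DataFrame({name: df[cols].eq(value).mean(axis=1)
                         for name, cols in categories.items()})
    keep = frac.gt(0).sum(axis=1).eq(1)
    out = df[keep].copy()
    # only one category is non-zero in the kept rows, so idxmax/max pick that one
    out['category'] = frac[keep].idxmax(axis=1)
    out['percent'] = frac[keep].max(axis=1)
    return out.reset_index(drop=True)


shorterdf = rows_unique_to_one_category(largedf, categories)
print(shorterdf)
```

idxmax is unambiguous here only because rows with an 'a' in more than one category have already been filtered out by `keep`.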
