Hierarchical sorting by 2 columns [Pandas]

Posted 2024-06-02 08:59:39


I have a dataframe such as:

COL1 COL2
1 pupa male
2 pupa female
3 pupae female
4 larva female
5 larvae female & male
6 pupe female
10 adult female 
12 NA female
7 pupa male
8 pupae male
9 adult male
11 pupae NA
13 NA male 

The idea is to sort by COL2 first, putting every value that contains female ahead of the rest:

str.contains("female") > !str.contains("female")

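COL2 takes priority over COL1, so within each COL2 group the rows are then sorted by COL1, placing values that contain pup first, then larv, then everything else: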

str.contains('pup') > str.contains("larv") > other 

Here is the expected output:

COL1 COL2
2 pupa female
3 pupae female
6 pupe female
4 larva female
5 larvae female & male
10 adult female 
12 NA female
1 pupa male
7 pupa male
8 pupae male
9 adult male
11 pupae NA
13 NA male 

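So far I have only managed to sort by a single column (the Sex column of my real data, which corresponds to COL2 here), using: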

df['Sex'] = pd.Categorical(df['Sex'], ['female','pooled male and female', 'male and female','male'])

df = df.sort_values("Sex")

But as you can see, this solution needs an explicit list of every value, whereas I am looking for a more general approach based on .str.contains.


1 answer
User
#1 · Posted 2024-06-02 08:59:39

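The best approach I have found is to convert these labels to numbers and then sort on the numbers.

For example: female (or anything that starts with female) = 1, male = 0, and everything else = -1. The same idea applies to col1.

That makes the sort straightforward. Here is something to get you started: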

data = [['pupa', 'male'],
['pupa', 'female'],
['pupae', 'female'],
['larva', 'female'],
['larvae', 'female & male'],
['pupe', 'female'],
['pupa', 'male'],
['pupae', 'male'],
['adult', 'male'],
['adult', 'female'], 
['pupae', 'NA'],
['NA', 'female'],
['NA', 'male']] 

import pandas as pd

data_df = pd.DataFrame(data, columns=['col1', 'col2'])
data_df
    col1    col2
0   pupa    male
1   pupa    female
2   pupae   female
3   larva   female
4   larvae  female & male
5   pupe    female
6   pupa    male
7   pupae   male
8   adult   male
9   adult   female
10  pupae   NA
11  NA      female
12  NA      male

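# Rank col2: values starting with 'female' get 1, the string 'NA' gets -1, the rest 0
data_df['col2_cond'] = data_df['col2'].apply(
    lambda x: 1 if x.startswith('female') else (-1 if x == 'NA' else 0)
)
# Rank col1: 'pup' gets 2, 'larv' gets 1, the string 'NA' gets -1, the rest 0
data_df['col1_cond'] = data_df['col1'].apply(
    lambda x: 2 if 'pup' in x else (1 if 'larv' in x else (-1 if x == 'NA' else 0))
)

# Sort on both helper columns, highest rank first; col2_cond takes priority
data_df.sort_values(['col2_cond', 'col1_cond'], ascending=[False, False])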
    col1    col2      col2_cond col1_cond
1   pupa    female          1   2
2   pupae   female          1   2
5   pupe    female          1   2
3   larva   female          1   1
4   larvae  female & male   1   1
9   adult   female          1   0
11  NA      female          1   -1
0   pupa    male            0   2
6   pupa    male            0   2
7   pupae   male            0   2
8   adult   male            0   0
12  NA      male            0   -1
10  pupae   NA             -1   2
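
Since the question asks for a .str.contains-based solution, the same ranking can also be sketched with vectorized string methods and numpy.select instead of apply with a lambda. This is only one possible variant, not the only way to do it: the names col2_rank, col1_rank and sorted_df are made up for illustration, and NA is still assumed to be the literal string 'NA'. Note that 'male' is a substring of 'female', so the test has to look for 'female' rather than 'male'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['pupa', 'male'], ['pupa', 'female'], ['pupae', 'female'],
     ['larva', 'female'], ['larvae', 'female & male'], ['pupe', 'female'],
     ['pupa', 'male'], ['pupae', 'male'], ['adult', 'male'],
     ['adult', 'female'], ['pupae', 'NA'], ['NA', 'female'], ['NA', 'male']],
    columns=['col1', 'col2'],
)

# Rank col2: contains 'female' -> 1, the literal string 'NA' -> -1, else 0.
# np.select picks the value of the first condition that is True.
col2_rank = np.select(
    [df['col2'].str.contains('female'), df['col2'].eq('NA')],
    [1, -1],
    default=0,
)

# Rank col1: contains 'pup' -> 2, contains 'larv' -> 1, 'NA' -> -1, else 0.
col1_rank = np.select(
    [df['col1'].str.contains('pup'),
     df['col1'].str.contains('larv'),
     df['col1'].eq('NA')],
    [2, 1, -1],
    default=0,
)

# Attach the helper ranks, sort on them (col2 first), then drop them again.
sorted_df = (
    df.assign(col2_rank=col2_rank, col1_rank=col1_rank)
      .sort_values(['col2_rank', 'col1_rank'], ascending=[False, False])
      .drop(columns=['col2_rank', 'col1_rank'])
)
print(sorted_df)
```

Dropping the helper columns at the end leaves only col1 and col2 in the result, so the original dataframe is not modified.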

Note: I assumed NA is a string element. If it is a real missing value in your data, you can check for None instead.
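
If NA turns out to be a real missing value rather than the string 'NA', a rough sketch of the same idea could look like the following. The small dataframe here is hypothetical and exists only to show the missing-value handling, and the variable names are again illustrative. Passing na=False to str.contains makes missing entries count as "no match", and isna() takes the place of the == 'NA' comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical data where NA is a real missing value (None), not a string.
df = pd.DataFrame({
    'col1': ['pupa', None, 'larva', 'pupae', 'adult'],
    'col2': ['male', 'female', 'female', None, 'female & male'],
})

# na=False: a missing entry yields False instead of a missing result.
is_female = df['col2'].str.contains('female', na=False)
col2_rank = np.select([is_female, df['col2'].isna()], [1, -1], default=0)

is_pup = df['col1'].str.contains('pup', na=False)
is_larv = df['col1'].str.contains('larv', na=False)
col1_rank = np.select([is_pup, is_larv, df['col1'].isna()], [2, 1, -1], default=0)

# Sort a small helper frame of ranks, then reorder df by the resulting index.
order = (
    pd.DataFrame({'col2_rank': col2_rank, 'col1_rank': col1_rank}, index=df.index)
      .sort_values(['col2_rank', 'col1_rank'], ascending=[False, False])
      .index
)
sorted_df = df.loc[order]
print(sorted_df)
```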
