I have a DataFrame such as:
COL1 COL2
1 pupa male
2 pupa female
3 pupae female
4 larva female
5 larvae female & male
6 pupe female
10 adult female
12 NA female
7 pupa male
8 pupae male
9 adult male
11 pupae NA
13 NA male
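The idea is to sort by COL2 first, putting any value that contains female ahead of the rest:
str.contains("female") > !str.contains("female")
COL2 takes priority over COL1:
COL2 > COL1
Then sort by COL1, putting values that contain pup first, then values that contain larv, then everything else:
str.contains('pup') > str.contains("larv") > other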
Here is the expected output:
COL1 COL2
2 pupa female
3 pupae female
6 pupe female
4 larva female
5 larvae female & male
10 adult female
12 NA female
1 pupa male
7 pupa male
8 pupae male
9 adult male
11 pupae NA
13 NA male
So far I have only managed to sort by a single column, using:
df['Sex'] = pd.Categorical(df['Sex'], ['female','pooled male and female', 'male and female','male'])
df = df.sort_values("Sex")
But as you can see, this approach needs an explicit list of categories, whereas I am looking for a more general solution based on .str.contains.
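The best approach I found is to convert these labels into numbers and then sort on those numbers.
For example:
contains female (including values that start with female) = 1
male = 0
others = -1
Do the same for COL1, and the sorting becomes easy. Here is something to get you started: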
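A minimal sketch of the label-to-number idea, assuming the leading numbers in the question's table are the DataFrame index and that NA is stored as the string "NA". The names sex_rank, stage_rank and result are made up for illustration. It uses np.select to build the ranks and np.lexsort (a stable sort) so that ties keep their original order.

```python
import numpy as np
import pandas as pd

# Sample data from the question (index values assumed to be the leading numbers).
df = pd.DataFrame(
    {
        "COL1": ["pupa", "pupa", "pupae", "larva", "larvae", "pupe", "adult",
                 "NA", "pupa", "pupae", "adult", "pupae", "NA"],
        "COL2": ["male", "female", "female", "female", "female & male", "female",
                 "female", "female", "male", "male", "male", "NA", "male"],
    },
    index=[1, 2, 3, 4, 5, 6, 10, 12, 7, 8, 9, 11, 13],
)

# COL2: contains "female" -> 1, contains "male" -> 0, anything else -> -1.
# "female" is tested first because "male" is a substring of "female".
sex = df["COL2"].astype(str)
sex_rank = np.select(
    [sex.str.contains("female"), sex.str.contains("male")], [1, 0], default=-1
)

# COL1: contains "pup" -> 1, contains "larv" -> 0, anything else -> -1.
stage = df["COL1"].astype(str)
stage_rank = np.select(
    [stage.str.contains("pup"), stage.str.contains("larv")], [1, 0], default=-1
)

# np.lexsort sorts ascending and treats the LAST key as the primary one,
# so the ranks are negated to put the highest rank first.
order = np.lexsort((-stage_rank, -sex_rank))
result = df.iloc[order]
print(result)
```

With the -1 bucket for "others" in COL2, row 11 (COL2 is NA) ends up after row 13, so the last two rows come out in a different order from the question's expected table. If NA should be grouped with male instead, give both the same rank.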
Note: I assumed NA is a string element, but you can also check for the None type.
类型相关问题 更多 >
编程相关推荐