Sum specific columns in a pandas DataFrame

import pandas as pd

df = pd.DataFrame({'a': [1, 'produces', 'produces', 'understands', 'produces'],
                   'b': [2, '', 'produces', 'understands', 'understands'],
                   'c': [3, '', '', 'understands', '']})
transposed_df = df.transpose()
transposed_df

   0         1         2            3            4
a  1  produces  produces  understands     produces
b  2            produces  understands  understands
c  3                      understands

from functools import reduce

measure1 = transposed_df.iloc[:, [0, 1, 2]].replace('produces', 1)
measure2 = transposed_df.iloc[:, [0, 3]].replace('understands', 1)
measure3 = transposed_df.iloc[:, [0, 4]].replace('produces', 1)
measures = [measure1, measure2, measure3]
counter = reduce(lambda left, right: pd.merge(left, right), measures)
counter

   0         1         2            3            4  first
a  1  produces  produces  understands     produces    NaN
b  2            produces  understands  understands    NaN
c  3                      understands                 NaN

1 answer

User

#1 · Posted on 2024-04-25 20:13:08

There are two problems: the summation, and inserting a column that has a different index.

1) Summation

The columns of your df are of object dtype (they hold strings, including empty strings). The DataFrame counter is also of mixed type (int and string):

counter.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 5 columns):
0    3 non-null int64
1    3 non-null object
2    3 non-null object
3    3 non-null int64
4    3 non-null object
dtypes: int64(2), object(3)

Remember:

"Columns with mixed types are stored with the object dtype." (see dtypes)
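A minimal sketch of the rule quoted above, using a small hypothetical frame (the names `demo`, `nums` and `mixed` are illustrative only): a column that mixes ints and strings falls back to `object`.

```python
import pandas as pd

# Hypothetical frame: one all-int column, one int/str column.
demo = pd.DataFrame({"nums": [1, 2, 3], "mixed": [1, "", "produces"]})

# The all-int column gets an integer dtype; the mixed column falls back
# to the generic object dtype, even though its first value is an int.
print(demo.dtypes)
```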

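So although the first row of counter contains two integers, they live in object-dtype Series (columns), and pandas won't sum them. (Apparently you are using a pandas version older than 0.22.0; from 0.22.0 on the result is 0.0, because the default is min_count=0, see sum.) You can see this with: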
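The `min_count` part of that claim can be checked on an empty or all-NaN Series, used here as a simplified stand-in for "nothing numeric left to add" (an assumption; this does not reproduce `counter` itself). It assumes pandas 0.22.0 or later:

```python
import math

import pandas as pd

# An empty Series: there is nothing to add up.
empty = pd.Series([], dtype="float64")
# An all-NaN Series behaves the same way, since NaN is skipped by default.
all_nan = pd.Series([math.nan, math.nan])

# With the default min_count=0 both sums are 0.0 ...
print(empty.sum(), all_nan.sum())
# ... while min_count=1 requires at least one valid value, giving NaN.
print(empty.sum(min_count=1), all_nan.sum(min_count=1))
```

Passing `min_count=1` is the way to get the pre-0.22.0 NaN behaviour back on a newer pandas.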

counter.iloc[:,[1,2]].applymap(type)  # pandas >= 2.1: use .map(type), applymap is deprecated

               1              2
0  <class 'int'>  <class 'int'>
1  <class 'str'>  <class 'int'>
2  <class 'str'>  <class 'str'>

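So the solution is to explicitly convert the objects to numbers where possible (i.e. where the whole row consists of integers rather than a mix of empty strings and integers):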

counter.iloc[:,[1,2]].apply(lambda x: sum(pd.to_numeric(x)), axis=1)

Result:

0    2.0
1    NaN
2    NaN
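A sketch of why the row sum above uses Python's built-in `sum` rather than the pandas method, on a hypothetical frame shaped like `counter.iloc[:,[1,2]]`. It adds `errors="coerce"` (not in the original answer), which turns any non-numeric string into NaN instead of raising:

```python
import pandas as pd

# Hypothetical rows shaped like counter.iloc[:, [1, 2]]: ints mixed with ''.
rows = pd.DataFrame({1: [1, "", ""], 2: [1, 1, ""]}, dtype=object)


def strict_total(row):
    # errors="coerce" maps '' (and any other non-numeric string) to NaN.
    nums = pd.to_numeric(row, errors="coerce")
    # Python's built-in sum propagates NaN, so a row with a gap stays NaN.
    return sum(nums)


strict = rows.apply(strict_total, axis=1)
# Series.sum skips NaN by default (skipna=True), so gaps count as nothing.
lenient = rows.apply(lambda r: pd.to_numeric(r, errors="coerce").sum(), axis=1)

print(strict.tolist())   # [2.0, nan, nan]
print(lenient.tolist())  # [2.0, 1.0, 0.0]
```

The strict version matches the answer's "only where the whole row is numeric" intent; the lenient one treats empty cells as contributing nothing.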

2) Column insertion

The indexes differ:

counter.index
# Int64Index([0, 1, 2], dtype='int64')
transposed_df.index
# Index(['a', 'b', 'c'], dtype='object')
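The mismatch comes from `pd.merge`: when no `on`, `left_index` or `right_index` is given, it joins on the shared column labels and returns a new default integer index. A minimal sketch with two hypothetical label-indexed frames standing in for the `measure` frames:

```python
from functools import reduce

import pandas as pd

# Hypothetical label-indexed frames sharing column 0, like the measures.
left = pd.DataFrame({0: [1, 2, 3], 1: ["x", "y", "z"]}, index=["a", "b", "c"])
right = pd.DataFrame({0: [1, 2, 3], 3: ["p", "q", "r"]}, index=["a", "b", "c"])

# merge joins on the shared column 0 and builds a fresh 0..n-1 index,
# dropping the 'a', 'b', 'c' labels of both inputs.
merged = reduce(lambda lhs, rhs: pd.merge(lhs, rhs), [left, right])
print(merged.index.tolist())  # [0, 1, 2]
print(list(merged.columns))   # [0, 1, 3]
```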

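That is why your approach gives all NaN. The simplest fix is to insert only the values of the Series rather than the Series itself (pandas aligns on the index):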

transposed_df['first'] = counter.iloc[:,[1,2]].apply(lambda x: sum(pd.to_numeric(x)), axis=1).to_list()

Result:

   0         1         2            3            4  first
a  1  produces  produces  understands     produces    2.0
b  2            produces  understands  understands    NaN
c  3                      understands                 NaN
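A minimal sketch contrasting index alignment with positional assignment, on a hypothetical `target` frame and `totals` Series (not the question's data). `set_axis` is shown as an alternative to `.to_list()`:

```python
import pandas as pd

# Hypothetical label-indexed frame, like transposed_df.
target = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])
# Hypothetical result computed on a frame with a 0..2 index, like counter.
totals = pd.Series([1.0, 2.0, 3.0])

# Assigning the Series aligns on labels: 0, 1, 2 are not in target's
# index, so every value in the new column is NaN.
target["aligned"] = totals

# Plain values (or a Series relabelled with target's index) go in by position.
target["by_values"] = totals.to_list()
target["relabelled"] = totals.set_axis(target.index)
print(target)
```

Both positional variants assume the rows of `totals` are already in the same order as `target`; nothing checks that for you.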
