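I am reading multiple .csv files into pandas DataFrames that all have the same shape. For some indexes some of the values are zero. For each index I want to put a zero in the same positions in every frame, then drop the zeros, so that the values selected for that index have the same length in every frame: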
import numpy as np
import pandas as pd

# read_csv already returns a DataFrame, so a pd.DataFrame(...) wrapper is not needed
a = pd.read_csv("path_a", index_col=0)
b = pd.read_csv("path_b", index_col=0)
c = pd.read_csv("path_c", index_col=0)
print(a, "\n", b, "\n", c)
X = a.shape[0]          # number of rows (countries)
d = a.index.values      # ISO3 codes, used as labels when printing
a = np.array(a)
b = np.array(b)
c = np.array(c)
for i in range(0, X):
    xdata = a[i]
    xdata1 = b[i]
    xdata2 = c[i]
    # copy the zeros of each row into the other two rows
    xdata = np.where(xdata2 == 0, 0, xdata)
    xdata1 = np.where(xdata2 == 0, 0, xdata1)
    xdata1 = np.where(xdata == 0, 0, xdata1)
    xdata2 = np.where(xdata == 0, 0, xdata2)
    xdata = np.where(xdata1 == 0, 0, xdata)
    xdata2 = np.where(xdata1 == 0, 0, xdata2)
    # find the zero positions and delete them from each row
    indexX = np.argwhere(xdata == 0)
    index1X = np.argwhere(xdata1 == 0)
    index2X = np.argwhere(xdata2 == 0)
    xdata = np.delete(xdata, indexX)
    xdata1 = np.delete(xdata1, index1X)
    xdata2 = np.delete(xdata2, index2X)
    print(d[i], "\n", xdata, "\n", xdata1, "\n", xdata2)
Printed a, b and c:

      1980  1985  1990  1995  2000  2005  2010
ISO3
AFG    0.0   0.0   3.8   0.0   0.0   9.8   0.0
AGO    2.0   0.0   3.0   4.0   0.0   0.0   0.0
ALB    0.0   0.2   0.5   0.2   1.3   1.6   2.7
AND    0.0   0.0   0.0   0.0   0.0   0.0   0.0
ARE    0.7   0.8   0.9   1.7   2.3   2.7   3.0
ARG    3.1   6.7   5.3  15.1  17.2  18.2  18.7
ARM    0.4   0.5   0.5   0.5   0.4   1.2   1.3

      1980  1985  1990  1995  2000  2005  2010
ISO3
AFG    2.5   0.0   0.0   4.7   0.0   0.0   0.0
AGO   13.1  14.9  15.8  16.4  16.9  17.6  18.1
ALB    1.4   1.5   1.6   1.6   1.6   1.6   1.7
AND    0.2   0.2   0.2   0.2   0.1   0.4   0.6
ARE    0.0   0.0   0.0   0.0   0.0   0.0   0.0
ARG    1.8   1.8   1.7   1.8   1.8   1.9   1.9
ARM    1.8   1.8   1.7   0.0   1.8   1.9   1.5

      1980  1985  1990  1995  2000  2005  2010
ISO3
AFG    0.0   0.0   0.0   0.0   0.0   0.0   0.0
AGO    0.0   0.0   4.7   5.8   6.0   0.0   0.0
ALB    0.0   0.2   0.5   0.2   1.3   1.6   2.7
AND    1.4   1.8   2.3   3.7   0.0   0.0   5.4
ARE    0.7   0.8   0.9   1.7   2.3   2.7   3.0
ARG    3.1   6.7   5.3  15.1  17.2  18.2  18.7
ARM    0.4   0.5   0.5   0.5   0.4   1.2   1.3
Loop output:

AFG
[]
[]
[]

AGO
[ 3. 4.]
[ 15.8 16.4]
[ 4.7 5.8]

ALB
[ 0.2 0.5 0.2 1.3 1.6 2.7]
[ 1.5 1.6 1.6 1.6 1.6 1.7]
[ 0.2 0.5 0.2 1.3 1.6 2.7]

AND
[]
[]
[]

ARE
[]
[]
[]

ARG
[ 3.1 6.7 5.3 15.1 17.2 18.2 18.7]
[ 1.8 1.8 1.7 1.8 1.8 1.9 1.9]
[ 3.1 6.7 5.3 15.1 17.2 18.2 18.7]

ARM
[ 0.4 0.5 0.5 0.4 1.2 1.3]
[ 1.8 1.8 1.7 1.8 1.9 1.5]
[ 0.4 0.5 0.5 0.4 1.2 1.3]
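This code works, but it is a trial-and-error approach and it is not efficient when the data gets large. Can you suggest a more efficient approach, and show how to select the data by the minimum-length index? Any ideas?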
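The files path_a, path_b and path_c are not available, so to experiment with this question it can help to rebuild the three frames directly from the printed output above. This is a sketch that assumes the column labels are the year strings read from the CSV header and that the index is named ISO3, as the printout shows:

```python
import pandas as pd

# Assumption: columns come from the CSV header as strings, index is named ISO3.
years = ["1980", "1985", "1990", "1995", "2000", "2005", "2010"]
iso3 = pd.Index(["AFG", "AGO", "ALB", "AND", "ARE", "ARG", "ARM"], name="ISO3")

# Values copied from the printed a, b and c in the question.
a = pd.DataFrame([
    [0.0, 0.0, 3.8, 0.0, 0.0, 9.8, 0.0],
    [2.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.5, 0.2, 1.3, 1.6, 2.7],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.7, 0.8, 0.9, 1.7, 2.3, 2.7, 3.0],
    [3.1, 6.7, 5.3, 15.1, 17.2, 18.2, 18.7],
    [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3],
], index=iso3, columns=years)

b = pd.DataFrame([
    [2.5, 0.0, 0.0, 4.7, 0.0, 0.0, 0.0],
    [13.1, 14.9, 15.8, 16.4, 16.9, 17.6, 18.1],
    [1.4, 1.5, 1.6, 1.6, 1.6, 1.6, 1.7],
    [0.2, 0.2, 0.2, 0.2, 0.1, 0.4, 0.6],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [1.8, 1.8, 1.7, 1.8, 1.8, 1.9, 1.9],
    [1.8, 1.8, 1.7, 0.0, 1.8, 1.9, 1.5],
], index=iso3, columns=years)

c = pd.DataFrame([
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.7, 5.8, 6.0, 0.0, 0.0],
    [0.0, 0.2, 0.5, 0.2, 1.3, 1.6, 2.7],
    [1.4, 1.8, 2.3, 3.7, 0.0, 0.0, 5.4],
    [0.7, 0.8, 0.9, 1.7, 2.3, 2.7, 3.0],
    [3.1, 6.7, 5.3, 15.1, 17.2, 18.2, 18.7],
    [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3],
], index=iso3, columns=years)

print(a, b, c, sep="\n\n")
```

Writing each frame out with to_csv and reading it back with read_csv(..., index_col=0) should give the same layout the question starts from.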
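The idea is to multiply all 3 arrays together and then test whether the product is not 0. It is also possible to loop over the 3 arrays collected in a list L1. The logic is changed as well: instead of np.argwhere and np.delete, the non-zero values are selected directly with the resulting boolean mask: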
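A minimal sketch of the idea just described, assuming a, b and c have already been converted to NumPy arrays and d holds the index labels, as in the question. To stay self-contained it uses only the AGO and ARM rows copied from the question's data; the results and results_l1 dictionaries are illustrative names, not part of the original answer.

```python
import numpy as np

# Sample rows (AGO, ARM) copied from the question's a, b and c.
d = np.array(["AGO", "ARM"])
a = np.array([[2.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0],
              [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]])
b = np.array([[13.1, 14.9, 15.8, 16.4, 16.9, 17.6, 18.1],
              [1.8, 1.8, 1.7, 0.0, 1.8, 1.9, 1.5]])
c = np.array([[0.0, 0.0, 4.7, 5.8, 6.0, 0.0, 0.0],
              [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]])

# Variant 1: the product of the three rows is 0 wherever any row has a 0,
# so "product != 0" is the mask of columns to keep.
results = {}
for i in range(a.shape[0]):
    m = (a[i] * b[i] * c[i]) != 0
    results[d[i]] = (a[i][m], b[i][m], c[i][m])  # boolean indexing, no np.delete
    print(d[i], a[i][m], b[i][m], c[i][m], sep="\n")

# Variant 2: the same idea for any number of arrays kept in a list L1.
L1 = [a, b, c]
results_l1 = {}
for i in range(a.shape[0]):
    # np.prod over axis 0 multiplies the i-th rows of all arrays element-wise
    m = np.prod([arr[i] for arr in L1], axis=0) != 0
    results_l1[d[i]] = [arr[i][m] for arr in L1]
```

Multiplying is compact, but it has two edge cases: a NaN in any row makes the product NaN, and NaN != 0 is True, so that column is kept; and a product of very small floats can underflow to 0.0 and drop a column that contains no real zero. Comparing each array with 0 and combining the comparisons, for example (a[i] != 0) & (b[i] != 0) & (c[i] != 0), avoids both.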
If you use pandas 0.24+, a better way to convert to a numpy array is to use ^{}:
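The method name after "use" was lost from the page. This sketch uses DataFrame.to_numpy(), which was added in pandas 0.24, but whether that is the method the answer pointed to is an assumption. It also builds the mask for all rows in one vectorized step (stacking the arrays and checking that every one is non-zero), so only the final per-row selection stays in Python. The sample frames contain only the AGO and ARM rows from the question; out and a_masked are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample rows (AGO, ARM) copied from the question's a, b and c.
cols = ["1980", "1985", "1990", "1995", "2000", "2005", "2010"]
idx = pd.Index(["AGO", "ARM"], name="ISO3")
a = pd.DataFrame([[2.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0],
                  [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]], index=idx, columns=cols)
b = pd.DataFrame([[13.1, 14.9, 15.8, 16.4, 16.9, 17.6, 18.1],
                  [1.8, 1.8, 1.7, 0.0, 1.8, 1.9, 1.5]], index=idx, columns=cols)
c = pd.DataFrame([[0.0, 0.0, 4.7, 5.8, 6.0, 0.0, 0.0],
                  [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]], index=idx, columns=cols)

# DataFrame.to_numpy() (pandas >= 0.24) instead of np.array(df)
L1 = [df.to_numpy() for df in (a, b, c)]
d = a.index.to_numpy()

# np.stack gives shape (3, n_rows, n_cols); .all(axis=0) leaves one
# (n_rows, n_cols) mask that is True where no frame has a zero.
mask = (np.stack(L1) != 0).all(axis=0)

out = {}
for i, label in enumerate(d):
    # boolean indexing replaces np.argwhere + np.delete
    out[label] = [arr[i][mask[i]] for arr in L1]
    print(label, *out[label], sep="\n")

# If the original shape should be kept, DataFrame.where puts NaN
# wherever the mask is False instead of deleting the values.
a_masked = a.where(mask)
```

Keeping the NaN version (a_masked) is a design choice: every frame keeps the same shape and index, and dropna() on a single row gives the same values as the deleted-zeros arrays.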
Edit: