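I'm new to Python and pandas, so please bear with me. I think I have a fairly simple problem to solve, but I can't seem to get it right. I have a CSV file that I want to edit with a pandas DataFrame. The data shows flows from home to work locations, with the ID and latitude/longitude coordinates of each location and a value for each flow.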
id_home,name_home,lat_home,lon_home,id_work,work,lat_work,lon_work,value
1001,"Flensburg",54.78879007,9.4459971,1002,"Kiel",54.34189351,10.13048288,695
1001,"Flensburg",54.78879007,9.4459971,1003,"Lübeck, Hansestadt",53.88132436,10.72749774,106
1001,"Flensburg",54.78879007,9.4459971,1004,"Neumünster, Stadt",54.07797524,9.974475148,124
1001,"Flensburg",54.78879007,9.4459971,1051,"Dithmarschen",54.12904835,9.120139194,39
1001,"Flensburg",54.78879007,9.4459971,10,"Schleswig-Holstein",54.212,9.959,7618
1001,"Flensburg",54.78879007,9.4459971,1,"Schleswig-Holstein",54.20896049,9.957114419,7618
1001,"Flensburg",54.78879007,9.4459971,2000,"Hamburg, Freie und Hansestadt",53.57071859,9.943770215,567
1001,"Flensburg",54.78879007,9.4459971,20,"Hamburg",53.575,9.941,567
1001,"Flensburg",54.78879007,9.4459971,2,"Hamburg",53.57071859,9.943770215,567
1003,"Lübeck",53.88132436,10.72749774,100,"Saarland",49.379,6.979,25
1003,"Lübeck",53.88132436,10.72749774,10,"Saarland",54.212,9.959,25
1003,"Lübeck",53.88132436,10.72749774,11000,"Berlin, Stadt",52.50395948,13.39337765,274
1003,"Lübeck",53.88132436,10.72749774,110,"Berlin",52.507,13.405,274
1003,"Lübeck",53.88132436,10.72749774,11,"Berlin",52.50395948,13.39337765,274
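I want to remove adjacent duplicate rows that share the same value and keep only the last row of each run, which is the one whose id_work has one or two digits. All other rows of such a run should be removed. How can I do that?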
Thanks a lot for your help!
The pandas method for dropping duplicate rows has a `keep` parameter; set it to `last`.
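A minimal sketch of that suggestion, assuming the method meant is `DataFrame.drop_duplicates` and using only three columns of the sample above, read from an inline string instead of a file:

```python
import io

import pandas as pd

# Subset of the sample data (id_work, work, value), in file order.
csv_text = """id_work,work,value
1002,Kiel,695
1003,"Lübeck, Hansestadt",106
1004,"Neumünster, Stadt",124
1051,Dithmarschen,39
10,Schleswig-Holstein,7618
1,Schleswig-Holstein,7618
2000,"Hamburg, Freie und Hansestadt",567
20,Hamburg,567
2,Hamburg,567
100,Saarland,25
10,Saarland,25
11000,"Berlin, Stadt",274
110,Berlin,274
11,Berlin,274
"""
df = pd.read_csv(io.StringIO(csv_text))

# For each distinct value in 'value', keep only the row where it occurs last.
result = df.drop_duplicates(subset="value", keep="last")
print(result)
```

`drop_duplicates` looks at the whole column, not just neighbouring rows: if a value reappeared later in a non-adjacent row, only its very last occurrence would survive. In this sample every repeated value forms a single run, so the result is the last row of each run.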
Actually, I think the following is what you want.
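A minimal sketch of such a filter on the same inline sample. It uses `id_work` in place of the answer's `'id'` column (an assumption), and `duplicated`, `unique`, `isin` and `str.len` are my choice of pandas calls for the steps in the breakdown below:

```python
import io

import pandas as pd

csv_text = """id_work,work,value
1002,Kiel,695
1003,"Lübeck, Hansestadt",106
1004,"Neumünster, Stadt",124
1051,Dithmarschen,39
10,Schleswig-Holstein,7618
1,Schleswig-Holstein,7618
2000,"Hamburg, Freie und Hansestadt",567
20,Hamburg,567
2,Hamburg,567
100,Saarland,25
10,Saarland,25
11000,"Berlin, Stadt",274
110,Berlin,274
11,Berlin,274
"""
df = pd.read_csv(io.StringIO(csv_text))

# Values that occur more than once, each listed a single time.
dup_values = df.loc[df["value"].duplicated(), "value"].unique()

# Rows belonging to one of those duplicated values ...
in_dup_group = df["value"].isin(dup_values)
# ... whose id, written as a string, is not exactly one character long.
long_id = df["id_work"].astype(str).str.len() != 1

# Drop the index labels selected by the combined boolean mask.
result = df.drop(df.index[in_dup_group & long_id])
print(result)
```

On this sample the result keeps ids 1002, 1003, 1004, 1051, 1 and 2. The Saarland (25) and Berlin (274) runs disappear entirely because their last ids (10 and 11) have two digits. Changing the test to `> 2` does not help either, since it would keep both the `10` and the `1` row of the 7618 run.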
Here we drop the row labels that have a duplicated value and whose 'id' length is not 1. A breakdown:
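In short, the approach uses one method to return the duplicated values, a second to return only the unique duplicated values, and a third to test membership. We cast the 'id' column to `str` so that its length can be tested, and use the resulting boolean mask to select the index labels to drop.

Let's simplify this to the case of a single array.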
Now, let's generate a boolean array that shows where the values change, by comparing each element with the next one.
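A sketch of that comparison, assuming the single array is the `value` column of the sample above. Comparing `arr[1:]` with `arr[:-1]` lines each element up with its successor:

```python
import numpy as np

# Assumed example array: the 'value' column of the sample, in file order.
arr = np.array([695, 106, 124, 39, 7618, 7618, 567, 567, 567, 25, 25, 274, 274, 274])

# changes[i] is True where arr[i] differs from arr[i + 1], i.e. arr[i] ends a run.
changes = arr[1:] != arr[:-1]
print(changes)
```

The result has one element fewer than `arr`, because the last element has no successor to compare with.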
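This tells us which values we want to keep: those that differ from the next value. But it leaves out the last value, which should always be included, so a `True` has to be appended to the end of the mask.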
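Appending the final `True` could look like this; `np.append` is just one option (`np.hstack` or `np.r_` would also work), and the array is the same assumed example:

```python
import numpy as np

arr = np.array([695, 106, 124, 39, 7618, 7618, 567, 567, 567, 25, 25, 274, 274, 274])

# Element i ends a run if it differs from element i + 1 ...
changes = arr[1:] != arr[:-1]
# ... and the final element always ends the last run, so append True.
mask = np.append(changes, True)
print(mask)
```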
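Now, `arr[mask]` gives the last element of each run. If you don't believe that the last occurrence of each element was selected, you can check `mask.nonzero()` to get the indices numerically.

Now that you know how to generate the mask for a single column, you can simply apply it to your entire DataFrame as `df[mask]`.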
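Putting the steps together on the sample data: a sketch, assuming the CSV has been read into `df` in file order (again with only three columns), that shows `arr[mask]`, the positions returned by `mask.nonzero()`, and the filtered DataFrame:

```python
import io

import numpy as np
import pandas as pd

csv_text = """id_work,work,value
1002,Kiel,695
1003,"Lübeck, Hansestadt",106
1004,"Neumünster, Stadt",124
1051,Dithmarschen,39
10,Schleswig-Holstein,7618
1,Schleswig-Holstein,7618
2000,"Hamburg, Freie und Hansestadt",567
20,Hamburg,567
2,Hamburg,567
100,Saarland,25
10,Saarland,25
11000,"Berlin, Stadt",274
110,Berlin,274
11,Berlin,274
"""
df = pd.read_csv(io.StringIO(csv_text))

# Run-end mask built from the 'value' column.
arr = df["value"].to_numpy()
mask = np.append(arr[1:] != arr[:-1], True)

print(arr[mask])          # last value of each run
print(mask.nonzero()[0])  # positions of those values
print(df[mask])           # the same mask applied to the whole DataFrame
```

Because `mask` is a plain NumPy boolean array, `df[mask]` selects rows by position, so this relies on `df` still being in file order. It collapses only adjacent runs of equal values, which matches the wording of the question.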