Counting duplicate values in a Pandas DataFrame

print(df)
     Month  LSOA code  Longitude   Latitude             Crime type
0  2015-01  E01000916  -0.106453  51.518207          Bicycle theft
1  2015-01  E01000914  -0.111497  51.518226               Burglary
2  2015-01  E01000914  -0.111497  51.518226               Burglary
3  2015-01  E01000914  -0.111497  51.518226            Other theft
4  2015-01  E01000914  -0.113767  51.517372  Theft from the person

counts = dict()
for i, row in df.iterrows():
    key = (row['Longitude'], row['Latitude'], row['Crime type'])
    # dict.has_key() was removed in Python 3; use the "in" operator instead
    if key in counts:
        counts[key] = counts[key] + 1
    else:
        counts[key] = 1

{(-0.11376700000000001, 51.517371999999995, 'Theft from the person'): 1,
 (-0.111497, 51.518226, 'Burglary'): 2,
 (-0.111497, 51.518226, 'Other theft'): 1,
 (-0.10645299999999999, 51.518207000000004, 'Bicycle theft'): 1}

3 answers

User

Answer 1 · edited 2024-06-16 11:56:53

You can use groupby together with size. Then reset the index, naming the resulting column (0 by default) count.

print(df)
     Month  LSOA code  Longitude   Latitude             Crime type
0  2015-01  E01000916  -0.106453  51.518207          Bicycle theft
1  2015-01  E01000914  -0.111497  51.518226               Burglary
2  2015-01  E01000914  -0.111497  51.518226               Burglary
3  2015-01  E01000914  -0.111497  51.518226            Other theft
4  2015-01  E01000914  -0.113767  51.517372  Theft from the person

df = df.groupby(['Longitude', 'Latitude', 'Crime type']).size().reset_index(name='count')
print(df)
   Longitude   Latitude             Crime type  count
0  -0.113767  51.517372  Theft from the person      1
1  -0.111497  51.518226               Burglary      2
2  -0.111497  51.518226            Other theft      1
3  -0.106453  51.518207          Bicycle theft      1

print(df['count'])
0    1
1    2
2    1
3    1
Name: count, dtype: int64
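For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the five rows in the question, and the variable name counts is only illustrative (the snippet above reassigns df instead). The last line, which filters for combinations seen more than once, is an addition of mine and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the five rows shown in the question.
df = pd.DataFrame({
    "Month": ["2015-01"] * 5,
    "LSOA code": ["E01000916", "E01000914", "E01000914",
                  "E01000914", "E01000914"],
    "Longitude": [-0.106453, -0.111497, -0.111497, -0.111497, -0.113767],
    "Latitude": [51.518207, 51.518226, 51.518226, 51.518226, 51.517372],
    "Crime type": ["Bicycle theft", "Burglary", "Burglary",
                   "Other theft", "Theft from the person"],
})

# size() counts the rows in each group; reset_index(name=...) turns the
# resulting Series into a flat DataFrame with the counts in "count".
counts = (
    df.groupby(["Longitude", "Latitude", "Crime type"])
      .size()
      .reset_index(name="count")
)
print(counts)

# Keep only the combinations that occur more than once.
print(counts[counts["count"] > 1])
```

Assigning to a new name instead of overwriting df keeps the original rows available for further analysis.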

User

Answer 2 · edited 2024-06-16 11:56:53

You can group by Longitude and Latitude, then use value_counts on the Crime type column.

df.groupby(['Longitude', 'Latitude'])['Crime type'].value_counts().to_frame('count')

                                           count
Longitude Latitude  Crime type                  
-0.113767 51.517372 Theft from the person      1
-0.111497 51.518226 Burglary                   2
                    Other theft                1
-0.106453 51.518207 Bicycle theft              1
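If a flat table is easier to work with than the MultiIndex result above, a related option is DataFrame.value_counts with a subset of columns. This is a sketch of an alternative, not what the answer itself proposes, and it assumes pandas 1.1 or later, where DataFrame.value_counts is available. The sample frame is rebuilt from the question's rows and the name counts is illustrative.

```python
import pandas as pd

# Only the three columns used for counting, copied from the question.
df = pd.DataFrame({
    "Longitude": [-0.106453, -0.111497, -0.111497, -0.111497, -0.113767],
    "Latitude": [51.518207, 51.518226, 51.518226, 51.518226, 51.517372],
    "Crime type": ["Bicycle theft", "Burglary", "Burglary",
                   "Other theft", "Theft from the person"],
})

# The approach from the answer: counts of each crime type per location,
# returned with a three-level MultiIndex.
nested = df.groupby(["Longitude", "Latitude"])["Crime type"].value_counts()

# Alternative (assumes pandas >= 1.1): count unique rows over the chosen
# columns, most frequent first, then flatten into ordinary columns.
counts = (
    df.value_counts(subset=["Longitude", "Latitude", "Crime type"])
      .reset_index(name="count")
)
print(counts)
```

Both variants count the same combinations; the second sorts by frequency by default, which puts the most duplicated rows at the top.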

User

Answer 3 · edited 2024-06-16 11:56:53

collections.Counter gives an O(n) solution:

from collections import Counter

# 'Crime type' contains a space, so it needs bracket access rather than df.Crime_type
c = Counter(list(zip(df['Longitude'], df['Latitude'], df['Crime type'])))

Result:

Counter({(-0.113767, 51.517372, 'Theft from the person'): 1,
         (-0.111497, 51.518226, 'Burglary'): 2,
         (-0.111497, 51.518226, 'Other theft'): 1,
         (-0.106453, 51.518207, 'Bicycle theft'): 1})
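A runnable version of the Counter approach might look like the sketch below. The sample frame is again rebuilt from the question's rows; the names top_key, top_count and duplicates are hypothetical helpers added for illustration and do not appear in the original answer.

```python
from collections import Counter

import pandas as pd

# Only the three columns used for counting, copied from the question.
df = pd.DataFrame({
    "Longitude": [-0.106453, -0.111497, -0.111497, -0.111497, -0.113767],
    "Latitude": [51.518207, 51.518226, 51.518226, 51.518226, 51.517372],
    "Crime type": ["Bicycle theft", "Burglary", "Burglary",
                   "Other theft", "Theft from the person"],
})

# zip walks the three columns in lockstep, yielding one tuple per row;
# Counter tallies identical tuples in a single pass over the data.
c = Counter(zip(df["Longitude"], df["Latitude"], df["Crime type"]))

# most_common(1) returns a list holding the most frequent key and its count.
top_key, top_count = c.most_common(1)[0]
print(top_key, top_count)

# Keep only the keys that were seen more than once.
duplicates = {key: n for key, n in c.items() if n > 1}
print(duplicates)
```

Counter accepts any iterable, so wrapping zip in list, as the answer does, is optional.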
