Pandas: finding unique entries in a daily census

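2 answers

User

#1 · edited 2024-05-14 10:16:37

I think the trick here is to group by as many columns as possible, and then check how those (small) groups change from day to day over the month: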

import pandas as pd

inmates = pd.read_csv('inmates.csv')

# group by everything except _id and count number of entries
grouped = inmates.groupby(
    ['Gender', 'Race', 'Age at Booking', 'Current Age', 'Date']).count()

# pivot the dates out and transpose - this gives us the number of each
# combination for each day (one row per date, one column per combination)
grouped = grouped.unstack().T.fillna(0)

# get the difference between each day of the month - the assumption here
# being that a negative number means someone left, 0 means that nothing
# has changed and positive means that someone new has come in. As you
# mentioned yourself, that isn't necessarily true
diffed = grouped.diff()

# replace the first day of the month with the grouped numbers to give
# the number in each group at the start of the month
diffed.iloc[0, :] = grouped.iloc[0, :]

# sum only the positive numbers in each row to count those that have
# arrived but ignore those that have left
diffed['total'] = diffed.apply(lambda x: x[x > 0].sum(), axis=1)

# sum total column
diffed['total'].sum()  # 3393
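
To make the counting logic easier to check by hand, here is a minimal sketch of the same idea on a made-up three-day census. The data and the helper name count_arrivals are hypothetical and not part of the original answer; it also uses size() and clip() instead of count() and apply(), which is a stylistic swap rather than the answer's exact code.

```python
import pandas as pd

# Hypothetical toy census: one row per inmate per day (values are made up).
toy = pd.DataFrame({
    'Gender':         ['M', 'F', 'M', 'M', 'F', 'M', 'F'],
    'Race':           ['White', 'Black', 'White', 'White', 'Black', 'White', 'Black'],
    'Age at Booking': [30, 25, 30, 30, 25, 30, 25],
    'Current Age':    [30, 25, 30, 30, 25, 30, 25],
    'Date':           ['2016-06-01', '2016-06-01',
                       '2016-06-02', '2016-06-02', '2016-06-02',
                       '2016-06-03', '2016-06-03'],
})


def count_arrivals(df):
    """Estimate arrivals as the sum of positive day-to-day group changes."""
    keys = ['Gender', 'Race', 'Age at Booking', 'Current Age', 'Date']
    # rows per combination per day: one row per date, one column per combination
    counts = df.groupby(keys).size().unstack('Date', fill_value=0).T
    diffed = counts.diff()
    # the first day has no previous day, so seed it with the starting head count
    diffed.iloc[0, :] = counts.iloc[0, :].to_numpy()
    # keep only increases (arrivals) and ignore decreases (departures)
    return int(diffed.clip(lower=0).to_numpy().sum())


total = count_arrivals(toy)
print(total)
```

Here two people are present on the first day, one more M/White/30 arrives on the second day, and one leaves on the third, so the estimate is 3. The same caveat applies as above: if one person leaves and a different person with identical attributes arrives on the same day, the group count does not change and the arrival is missed.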

User

#2 · edited 2024-05-14 10:16:37

You can use df.drop_duplicates() to return a DataFrame containing only the unique rows, and then count the entries.

Something like this should work:

import pandas as pd
df = pd.read_csv('inmates_062016.csv', index_col=0, parse_dates=True)

uniqueDF = df.drop_duplicates()
countUniques = len(uniqueDF.index)
print(countUniques)

Result:

>> 11845

Pandas drop_duplicates Documentation

Inmates June 2016 CSV

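The problem with this approach (and this data) is that many inmates who share the same age, gender, and race may be filtered out as duplicates.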
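
A small sketch of that caveat, using a made-up single-day snapshot (the values are hypothetical, not taken from the CSV): two different people who share every recorded attribute collapse into one row, so the unique count comes out lower than the real head count.

```python
import pandas as pd

# Hypothetical snapshot: rows 0 and 1 are two different people who
# happen to share every recorded attribute.
snapshot = pd.DataFrame({
    'Gender':         ['M', 'M', 'F'],
    'Race':           ['White', 'White', 'Black'],
    'Age at Booking': [30, 30, 25],
    'Current Age':    [30, 30, 25],
})

n_rows = len(snapshot)                          # actual head count
n_unique = len(snapshot.drop_duplicates())      # what drop_duplicates reports
n_collapsed = int(snapshot.duplicated().sum())  # rows silently merged away

print(n_rows, n_unique, n_collapsed)
```

Checking duplicated().sum() on each day's snapshot gives a rough idea of how many people the unique count could be hiding.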
