Fill in missing dates pysp

import pandas as pd

idx = pd.date_range('02-28-2018', '04-29-2018')

df = pd.DataFrame([['Chandler Bing', '55', '2018-03-29', 51],
                   ['Chandler Bing', '55', '2018-03-29', 60],
                   ['Chandler Bing', '55', '2018-03-30', 59],
                   ['Harry Kane', '45', '2018-04-30', 80],
                   ['Harry Kane', '45', '2018-04-21', 90]],
                  columns=['name', 'accountid', 'timestamp', 'size'])

df['timestamp'] = pd.to_datetime(df['timestamp'])
pd.DatetimeIndex(df['timestamp'])
del(df['timestamp'])
#df.set_index('timestamp', inplace=True)
print (df)

df= df.reindex(idx, fill_value=0)
print (df)

uniquaccount=df['accountid'].unique()
print(uniquaccount)

2 answers

User

#1 · edited 2024-04-25 08:57:33

You can use reindex on a pandas Series:

import pandas as pd

idx = pd.date_range('02-28-2018', '04-29-2018')

# Each date is listed once: a repeated dict key would silently
# overwrite the earlier entry.
s = pd.Series({'2018-03-29': 55,
               '2018-03-30': 55,
               '2018-04-20': 65,
               '2018-04-29': 75})

s.index = pd.DatetimeIndex(s.index)

s = s.reindex(idx, fill_value=0)
print(s)

This fills in all the missing dates in the range, using 0 as the value.
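Note that reindex needs each date to appear only once in the index. If the data has repeated dates, as in the question, one option is to aggregate them first. The following is only a sketch: the sample values are illustrative, and summing the duplicates is an assumption (mean, max, or another aggregation may suit the data better).

```python
import pandas as pd

idx = pd.date_range("2018-02-28", "2018-04-29")

# Illustrative sample with a repeated date (2018-03-29 appears twice).
raw = pd.Series(
    [51, 60, 59],
    index=pd.to_datetime(["2018-03-29", "2018-03-29", "2018-03-30"]),
)

# Collapse duplicates so that each date appears exactly once.
# Summing is an assumption made for this sketch.
daily = raw.groupby(level=0).sum()

# With a unique index, reindex can fill the missing dates with 0.
filled = daily.reindex(idx, fill_value=0)
print(filled)
```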

User

#2 · edited 2024-04-25 08:57:33

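Reindexing does not work with a non-unique index: pandas raises an error when the index contains duplicate labels. Instead, create an intermediate DataFrame with one row per timestamp/account combination, and then merge: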

import pandas as pd

idx = pd.date_range('02-28-2018', '04-29-2018')

df = pd.DataFrame([['Chandler Bing', '55', '2018-03-29', 51],
                   ['Chandler Bing', '55', '2018-03-29', 60],
                   ['Chandler Bing', '55', '2018-03-30', 59],
                   # 2018-04-30 is outside idx, so the left merge in step 2 drops this row
                   ['Harry Kane', '45', '2018-04-30', 80],
                   ['Harry Kane', '45', '2018-04-21', 90]],
                  columns=['name', 'accountid', 'timestamp', 'size'])

df['timestamp'] = pd.to_datetime(df['timestamp'])

# Step 1: create an intermediate dataframe with the cartesian product (CROSS JOIN)
#   of all of the timestamps and IDs
idx = pd.Series(idx, name='timestamp').to_frame()
unique_accounts = df[['accountid', 'name']].drop_duplicates()
# Pandas CROSS JOIN, see https://stackoverflow.com/questions/53699012/performant-cartesian-product-cross-join-with-pandas/53699013#53699013
df_intermediate = pd.merge(unique_accounts.assign(dummy=1), idx.assign(dummy=1), on='dummy', how='inner')
df_intermediate = df_intermediate.drop(columns='dummy')

# Step 2: merge with the original dataframe, and fill missing values
df_new = df_intermediate.merge(df.drop(columns='name'), how='left', on=['accountid', 'timestamp'])
df_new['size'] = df_new['size'].fillna(value=0)
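For reference, the same two steps can be sketched with the built-in cross join, assuming pandas 1.2 or later, where merge accepts how="cross". The names dates, accounts, grid, and filled are chosen here for illustration only. Because the date range ends on 2018-04-29, the Harry Kane row dated 2018-04-30 does not appear in the result.

```python
import pandas as pd

dates = pd.date_range("2018-02-28", "2018-04-29")

df = pd.DataFrame(
    [
        ["Chandler Bing", "55", "2018-03-29", 51],
        ["Chandler Bing", "55", "2018-03-29", 60],
        ["Chandler Bing", "55", "2018-03-30", 59],
        ["Harry Kane", "45", "2018-04-30", 80],
        ["Harry Kane", "45", "2018-04-21", 90],
    ],
    columns=["name", "accountid", "timestamp", "size"],
)
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Step 1: one row per (account, date) pair.
# how="cross" is assumed to be available (pandas >= 1.2).
accounts = df[["accountid", "name"]].drop_duplicates()
grid = accounts.merge(pd.DataFrame({"timestamp": dates}), how="cross")

# Step 2: attach the observed values and fill the gaps with 0.
filled = grid.merge(
    df.drop(columns="name"), how="left", on=["accountid", "timestamp"]
)
filled["size"] = filled["size"].fillna(0)

print(len(filled))
```

Duplicate dates for the same account are kept as separate rows, so the result can have more rows than accounts times dates.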

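Also, consider using a column name other than "size". size is already an attribute of pandas DataFrames, so df.size returns the number of elements rather than the column.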
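A small illustration of the clash, using a made-up two-row frame: attribute access gives the DataFrame.size property, while bracket access gives the column.

```python
import pandas as pd

# Hypothetical frame with a column that happens to be called "size".
df = pd.DataFrame({"name": ["a", "b"], "size": [51, 60]})

n_elements = df.size       # DataFrame.size property: rows * columns
size_column = df["size"]   # bracket access returns the actual column

print(n_elements)
print(size_column.tolist())
```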
