I have the following function, which uses a for loop:
def add_CQI_iterrows(df):
    # Start from the first date (as a string)
    previous_row = df['Date'].astype(str)[0]
    CQI_index = 0
    series = []
    for index, row in df.iterrows():
        if row['Date'] == previous_row:
            # Same date as the previous row: keep the current index
            previous_row = row['Date']
            print(CQI_index)
        else:
            # Date changed: move on to the next index
            CQI_index += 1
            previous_row = row['Date']
        series.append(CQI_index)
    df['CQI'] = series
    return df
I would like to find a way to convert this for loop into an apply call. Something like this (which does not work):
def add_CQI_apply(df):
    previous_row = df['Date'].astype(str)[0]
    CQI_index = 1
    series = []
    df['CQI'] = df.apply(lambda row: previous_row = row['Date'] if row['Date'] == previous_row else CQI_index += 1 and previous_row = row['Date'], axis=1)
    return df
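I want to make this conversion because I would like to see how much faster the apply method is, and whether an apply call on a pandas Series can be vectorized.
Here is my data (data.json):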
[
{
"Date": "9/20/2020 8:50",
"UE": 1
},
{
"Date": "9/20/2020 8:50",
"UE": 2
},
{
"Date": "9/20/2020 8:50",
"UE": 3
},
{
"Date": "9/20/2020 8:57",
"UE": 1
},
{
"Date": "9/20/2020 8:57",
"UE": 8
},
{
"Date": "9/20/2020 8:57",
"UE": 2
},
{
"Date": "9/20/2020 9:12",
"UE": 1
},
{
"Date": "9/20/2020 9:12",
"UE": 5
},
{
"Date": "9/20/2020 9:12",
"UE": 3
},
{
"Date": "9/20/2020 9:20",
"UE": 1
},
{
"Date": "9/20/2020 9:20",
"UE": 4
},
{
"Date": "9/20/2020 9:20",
"UE": 3
}
]
Finally, here is the function that loads this data:
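import numpy as np
import pandas as pd

def upload_data(file):
    df = pd.read_json(file)
    # Dates look like "9/20/2020 8:50", i.e. month/day/year hour:minute
    df['Date'] = pd.to_datetime(df['Date'], format="%m/%d/%Y %H:%M")
    df['CQI'] = np.nan
    return df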
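Neither a loop nor apply is needed here. Compare each date with the one in the previous row and take the cumulative sum of the changes:
df['CQI'] = (df['Date'] != df['Date'].shift()).cumsum()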
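To see what the shift/cumsum line does, here is a minimal sketch on the dates from data.json. It also includes an apply-based variant for comparison, since that is what the question asked about. The names make_counter and next_index are hypothetical helpers made up for this illustration, and the apply variant assumes that Series.apply visits the elements in order, one call per element. It still runs a Python function for every row, so it is not vectorized.

```python
import pandas as pd

# Sample dates copied from data.json above (the UE column is omitted).
dates = (["9/20/2020 8:50"] * 3 + ["9/20/2020 8:57"] * 3
         + ["9/20/2020 9:12"] * 3 + ["9/20/2020 9:20"] * 3)
df = pd.DataFrame({"Date": pd.to_datetime(dates, format="%m/%d/%Y %H:%M")})

# Vectorized: True wherever the date differs from the previous row.
# The first row is compared with a missing value, so it counts as a change.
changed = df["Date"] != df["Date"].shift()
vectorized = changed.cumsum()


def make_counter():
    """Hypothetical helper: returns a function that remembers the last date seen."""
    state = {"previous": None, "index": 0}

    def next_index(date):
        # Bump the index whenever the date differs from the previous one.
        if date != state["previous"]:
            state["index"] += 1
            state["previous"] = date
        return state["index"]

    return next_index


# apply-based: one Python call per row, kept only for comparison.
applied = df["Date"].apply(make_counter())

print(vectorized.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
print(applied.tolist())     # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```

Both versions give the same numbering as the original loop on this data. The vectorized form only works because the index depends on nothing but a comparison with the neighbouring row; logic that needs arbitrary running state would not reduce to shift and cumsum this easily.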