Pandas df.loc: Can only compare identically-labeled Series objects

Posted 2024-06-06 05:15:31


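My code below (sorry, I can't share the exact data) takes a df, filters it by a date range, and relabels specific dates. I then want to pull those relabeled dates back into the original df. It works fine up to this line: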

finaldf.loc[(finaldf['Due_Date'] != finaldfmon['Due_Date']), 'Due_Date'] = finaldfmon['Due_Date']

From my own research so far, the cause is that the two indexes have different lengths:

print(finaldf.index)

print(finaldfmon.index)
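
To make the index mismatch concrete, here is a minimal sketch that reproduces the error with made-up data. The frames `full` and `subset` are hypothetical stand-ins for `finaldf` and `finaldfmon`; the only thing they share with the real data is that one is a filtered view of the other.

```python
import pandas as pd

# Hypothetical stand-ins for finaldf and its filtered subset finaldfmon.
full = pd.DataFrame({"Due_Date": ["a", "b", "c", "d"]})
subset = full[full["Due_Date"].isin(["b", "d"])]

# Filtering keeps the original row labels, so the two indexes differ.
print(full.index.tolist())
print(subset.index.tolist())

# Comparing two Series whose indexes are not identical raises ValueError.
try:
    full["Due_Date"] != subset["Due_Date"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The comparison operators between two Series require identical labels on both sides; they do not align or pad the shorter Series for you.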

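I don't understand why this is a problem or how to fix it. I want to emulate an Excel VLOOKUP, but without leaving #N/A when there is no hit (that is, when the anchor value, think primary key, has no matching foreign key).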
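
One way a VLOOKUP that keeps the original value on a miss might be sketched is with `Series.map` followed by `fillna`. The key column `Task_ID` and the sample values are assumptions made for illustration; the question does not say which column acts as the key, and the lookup key is assumed to be unique.

```python
import pandas as pd

# Hypothetical data: "Task_ID" plays the role of the VLOOKUP key column.
main = pd.DataFrame({"Task_ID": [101, 102, 103], "Due_Date": ["d1", "d2", "d3"]})
lookup = pd.DataFrame({"Task_ID": [102], "Due_Date": ["new2"]})

# Map each key to its looked-up value; keys without a match become NaN.
looked_up = main["Task_ID"].map(lookup.set_index("Task_ID")["Due_Date"])

# Fall back to the original value wherever the lookup missed,
# so no gap (the #N/A equivalent) is left behind.
main["Due_Date"] = looked_up.fillna(main["Due_Date"])

print(main)
```

If the two frames already share row labels, as `finaldf` and `finaldfmon` do, matching on the index instead of a key column is simpler.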

The full code is shown here:

    import pandas as pd
    import xlrd # added when using visual studio 
    import datetime
    from datetime import datetime
    finaldf = pd.read_excel("scrubcomplete.xlsx", encoding = "ISO-8859-1", dtype=object)
    finaldf.columns = finaldf.columns.str.strip().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')
    #
    today = pd.to_datetime(datetime.now().date())
    day_of_week = today.dayofweek
    last_monday = today - pd.to_timedelta(day_of_week, unit='d') 
    finaldf = finaldf[finaldf.Affliate_Code.str.contains('Part/Unix', na=False)]

    if day_of_week != 0:
        finaldf['Completed_Date'] = pd.to_datetime(finaldf['Completed_Date'], format="%m/%d/%Y").dt.date
        finaldf['Due_Date'] = pd.to_datetime(finaldf['Due_Date'], format="%m/%d/%y").dt.date # making it lower case y made it work 
        current_week_flags = (finaldf.Completed_Date >= last_monday.date()) & (finaldf.Completed_Date <= today.date()) # this worked as of 4.16
        earlydue = (finaldf.Due_Date < last_monday.date())
        flags = current_week_flags & earlydue
        finaldfmon = finaldf[current_week_flags]
        finaldfmon.loc[(finaldfmon['Due_Date']<last_monday.date()), 'Due_Date'] = last_monday # set every due date before Monday to Monday, within the rows filtered by completed date
        finaldf.loc[(finaldf['Due_Date'] != finaldfmon['Due_Date']), 'Due_Date'] = finaldfmon['Due_Date'] # this line raises the ValueError
        writer = pd.ExcelWriter('currentweek.xlsx', engine='xlsxwriter')
        finaldf.to_excel(writer, index=False, sheet_name='Sheet1')
        writer.save()

The error is:

  raise ValueError("Can only compare identically-labeled "
ValueError: Can only compare identically-labeled Series objects

It is caused by this line:

finaldf.loc[(finaldf['Due_Date'] != finaldfmon['Due_Date']), 'Due_Date'] = finaldfmon['Due_Date']

1 answer
User
Answer 1 · posted 2024-06-06 05:15:31

This is not an answer; please see my comments in the code. Also, at this point I think this question is better suited to Code Review.

finaldf['Completed_Date'] = pd.to_datetime(finaldf['Completed_Date'], format="%m/%d/%Y").dt.date

# making it lower case y made it work 
finaldf['Due_Date'] = pd.to_datetime(finaldf['Due_Date'], format="%m/%d/%y").dt.date 

# this worked as of 4.16
current_week_flags = (finaldf.Completed_Date >= last_monday.date()) & (finaldf.Completed_Date <= today.date()) 
earlydue = (finaldf.Due_Date < last_monday.date())

flags = current_week_flags & earlydue
finaldfmon = finaldf[current_week_flags]

# set every due date before Monday to Monday, within the rows filtered by completed date
# this works because last_monday is a single day
finaldfmon.loc[(finaldfmon['Due_Date']<last_monday.date()), 'Due_Date'] = last_monday 

# this line is problematic in two places:
# finaldf.loc[(finaldf['Due_Date'] != finaldfmon['Due_Date']), 'Due_Date'] = finaldfmon['Due_Date']

# 1) finaldf['Due_Date'] != finaldfmon['Due_Date']
# the two Series have different lengths, so they cannot be compared;
# even with equal lengths, the comparison raises unless the indices are identical
# (comparing against a single number/date is fine, as in the case above)

# 2) finaldf.loc[..., 'Due_Date'] = finaldfmon['Due_Date']
# the right-hand side is again a Series with a different index: the assignment
# aligns on index labels rather than position, so selected rows that are
# missing from finaldfmon would end up as NaN

writer = pd.ExcelWriter('currentweek.xlsx', engine='xlsxwriter')
finaldf.to_excel(writer, index=False, sheet_name='Sheet1')    
writer.save()
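
If the intermediate `finaldfmon` is still wanted, one possible way to push its changes back is to select the target rows by the subset's own index, so both sides of the assignment carry the same labels. This is only a sketch under assumptions: the dates and the fixed `monday` are invented, and it keeps the columns as datetime64 rather than the `.dt.date` objects used in the question.

```python
import pandas as pd

monday = pd.Timestamp("2019-04-15")  # hypothetical "last Monday"

finaldf = pd.DataFrame({
    "Completed_Date": pd.to_datetime(["2019-04-16", "2019-04-02", "2019-04-17"]),
    "Due_Date": pd.to_datetime(["2019-04-10", "2019-04-05", "2019-04-19"]),
})

# .copy() makes the subset an independent frame, so editing it is unambiguous.
finaldfmon = finaldf[finaldf["Completed_Date"] >= monday].copy()
finaldfmon.loc[finaldfmon["Due_Date"] < monday, "Due_Date"] = monday

# Write back by label: only the rows that exist in finaldfmon are touched,
# and every other row of finaldf keeps its original Due_Date.
finaldf.loc[finaldfmon.index, "Due_Date"] = finaldfmon["Due_Date"]

print(finaldf)
```

No comparison between the two frames is needed here, because rows whose value did not change are simply overwritten with the same value.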

The code below (mainly the last line) achieves the goal:

import pandas as pd
import xlrd # added when using visual studio 
import datetime
from datetime import datetime
#read in excel file
finaldf = pd.read_excel("scrubcomplete.xlsx", encoding = "ISO-8859-1", dtype=object)
finaldf.columns = finaldf.columns.str.strip().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')
#
today = pd.to_datetime(datetime.now().date())
day_of_week = today.dayofweek
last_monday = today - pd.to_timedelta(day_of_week, unit='d') 
#


if day_of_week !=0:
    finaldf['Completed_Date'] = pd.to_datetime(finaldf['Completed_Date'], format="%m/%d/%Y").dt.date
    finaldf['Due_Date'] = pd.to_datetime(finaldf['Due_Date'], format="%m/%d/%y").dt.date # making it lower case y made it work
    current_week_flags = (finaldf.Completed_Date >= last_monday.date()) & (finaldf.Completed_Date <= today.date())
    finaldf.loc[(finaldf['Completed_Date'] >= last_monday.date()) & (finaldf['Completed_Date'] <= today.date()) & (finaldf['Due_Date'] < last_monday.date()), 'Due_Date'] = last_monday
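
The single-mask approach can be tried without the Excel file by pinning "today" to a fixed date. The sketch below is an illustration only: the helper name `clamp_due_dates`, the sample rows, and the chosen date are all hypothetical, and the columns are kept as datetime64 instead of `.dt.date` objects.

```python
import pandas as pd


def clamp_due_dates(df, today):
    """Move Due_Date up to this week's Monday for rows completed this week but due earlier."""
    today = pd.Timestamp(today).normalize()
    last_monday = today - pd.to_timedelta(today.dayofweek, unit="D")

    # Both masks are built from the same frame, so their indexes are identical.
    completed_this_week = df["Completed_Date"].between(last_monday, today)
    due_before_monday = df["Due_Date"] < last_monday

    out = df.copy()
    out.loc[completed_this_week & due_before_monday, "Due_Date"] = last_monday
    return out


sample = pd.DataFrame({
    "Completed_Date": pd.to_datetime(["2019-04-16", "2019-04-02", "2019-04-17"]),
    "Due_Date": pd.to_datetime(["2019-04-10", "2019-04-05", "2019-04-19"]),
})

result = clamp_due_dates(sample, "2019-04-18")  # a Thursday; its Monday is 2019-04-15
print(result)
```

Because every condition is evaluated on one frame, no second DataFrame has to be aligned or compared, which is why the error from the question cannot occur.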
