Python: Using a for loop and conditionals to determine the period of the day from the hour

Posted 2024-04-28 22:46:07


I want to label the period of the day for each row of my DataFrame, based on the value in its hour column.

To do that, I tried the following:

day_period = []

for index,row in df.iterrows():
        
    hour_series = row["hour"]
    
    # Morning = 04:00-10:00
    #if hour_series >= 4 and hour_series < 10:
    if 4 >= hour_series < 10:
        day_period_str = "Morning"
        day_period.append(day_period_str)
    
    # Day = 10:00-16:00
    #if hour_series >= 10 and hour_series < 16:
    if 10 >= hour_series < 16:
        day_period_str = "Day"
        day_period.append(day_period_str)
        
    # Evening = 16:00-22:00
    #if hour_series >= 16 and hour_series < 22:
    if 16 >= hour_series < 22:
        day_period_str = "Evening"
        day_period.append(day_period_str)
        
    # Night = 22:00-04:00
    #if hour_series >= 22 and hour_series < 4:
    if 22 >= hour_series < 4:
        day_period_str = "Night"
        day_period.append(day_period_str)

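However, when I check whether the length of my day_period list matches the length of my DataFrame (df), the two differ, and they should be the same. I can't see the error. How can I fix the code?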

len(day_period)
>21882

len(df)
>25696

Here is a preview of the data:

    timestamp               latitude    longitude   hour    weekday
0   2021-06-09 08:12:18.000 57.728867   11.949463   8   Wednesday
1   2021-06-09 08:12:18.000 57.728954   11.949368   8   Wednesday
2   2021-06-09 08:12:18.587 57.728867   11.949463   8   Wednesday
3   2021-06-09 08:12:18.716 57.728954   11.949368   8   Wednesday
4   2021-06-09 08:12:33.000 57.728905   11.949309   8   Wednesday

My end goal is to add this list to the DataFrame as a new column.


3 Answers

When working with pandas, you should use pd.cut rather than looping over rows.

Input:

import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2021-07-31', '2021-08-01', freq='2h')})
ranges = {4: 'Night', 10: 'Morning', 16: 'Day', 22: 'Evening', 24: 'Night'}

df['time_of_day'] = pd.cut(df['date'].dt.hour,
                           bins=[-1]+list(ranges),
                           labels=list(ranges.values()),
                           right=False,
                           ordered=False, # because label "Night" is duplicated
                          )

Output:

                  date time_of_day
0  2021-07-31 00:00:00       Night
1  2021-07-31 02:00:00       Night
2  2021-07-31 04:00:00     Morning
3  2021-07-31 06:00:00     Morning
4  2021-07-31 08:00:00     Morning
5  2021-07-31 10:00:00         Day
6  2021-07-31 12:00:00         Day
7  2021-07-31 14:00:00         Day
8  2021-07-31 16:00:00     Evening
9  2021-07-31 18:00:00     Evening
10 2021-07-31 20:00:00     Evening
11 2021-07-31 22:00:00       Night
12 2021-08-01 00:00:00       Night
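
The same idea can be applied directly to an integer hour column like the one in the question. The following is only a minimal sketch: the small DataFrame is made-up sample data, and the column name day_period is an assumption chosen to match the question's list name.

```python
import pandas as pd

# Made-up sample standing in for the question's DataFrame;
# only the integer "hour" column matters here.
df = pd.DataFrame({"hour": [0, 3, 4, 9, 10, 15, 16, 21, 22, 23]})

# Left-closed bins: [0, 4), [4, 10), [10, 16), [16, 22), [22, 24)
bins = [0, 4, 10, 16, 22, 24]
labels = ["Night", "Morning", "Day", "Evening", "Night"]

df["day_period"] = pd.cut(
    df["hour"],
    bins=bins,
    labels=labels,
    right=False,    # include the left edge, exclude the right edge
    ordered=False,  # needed because the label "Night" appears twice
)

print(df)
# Number of rows that received no label (hours outside 0-23 would show up here)
print(df["day_period"].isna().sum())
```

Because the result is assigned as a column, it always has one entry per row, so the length mismatch from the question cannot occur.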

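After a bit of testing, the problem appears to be the 22-4 block: no hour can be both at least 22 and less than 4, so splitting it into two ranges (00:00-04:00 and 22:00-24:00) fixes it.

I also changed >= to <= in each condition.

With this code, it works as expected: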

day_period = []

for index, row in df.iterrows():
    hour_series = row["hour"]

    # Night 1 = 00:00-04:00
    if 0 <= hour_series < 4:
        day_period_str = "Night"
        day_period.append(day_period_str)

    # Morning = 04:00-10:00
    elif 4 <= hour_series < 10:
        day_period_str = "Morning"
        day_period.append(day_period_str)

    # Day = 10:00-16:00
    elif 10 <= hour_series < 16:
        day_period_str = "Day"
        day_period.append(day_period_str)

    # Evening = 16:00-22:00
    elif 16 <= hour_series < 22:
        day_period_str = "Evening"
        day_period.append(day_period_str)

    # Night 2 = 22:00-24:00
    elif 22 <= hour_series < 24:
        day_period_str = "Night"
        day_period.append(day_period_str)

print(len(df))
print(len(day_period)) # they should match now
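
One way to make sure every row gets exactly one label is to move the conditions into a small function that always returns a value. This is a sketch rather than part of the answer above: hour_to_period is a hypothetical helper name, and the list of hours is made-up sample data.

```python
def hour_to_period(hour):
    """Return the period-of-day label for an integer hour in 0-23."""
    if not 0 <= hour < 24:
        raise ValueError(f"hour out of range: {hour}")
    if 4 <= hour < 10:
        return "Morning"
    if 10 <= hour < 16:
        return "Day"
    if 16 <= hour < 22:
        return "Evening"
    # Everything left over is 22:00-24:00 or 00:00-04:00
    return "Night"


hours = [0, 3, 4, 9, 10, 15, 16, 21, 22, 23]  # made-up sample hours
day_period = [hour_to_period(h) for h in hours]

print(len(hours) == len(day_period))  # one label per input hour
print(day_period)
```

With a DataFrame like the one in the question, the same function could be passed to df["hour"].map(hour_to_period) and the result assigned to a new column.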

Your less-than/greater-than comparisons are the wrong way round.

To test for the 4-to-10 range you want, use:

if 4 <= hour_series < 10:
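
To see why the original conditions misbehave, it helps to expand the chained comparison: Python evaluates a >= b < c as (a >= b) and (b < c). The sketch below uses hour 8 from the question's data preview and then lists which hours are matched by at least one of the question's four conditions.

```python
hour = 8  # the hour value shown in the question's data preview

# A chained comparison is shorthand for two comparisons joined by "and".
original = 4 >= hour < 10                # what the question wrote
expanded = (4 >= hour) and (hour < 10)   # what Python actually evaluates
corrected = 4 <= hour < 10               # what was intended

print(original, expanded, corrected)  # False False True

# Hours 0-23 that satisfy at least one of the question's four conditions
matched = [
    h for h in range(24)
    if (4 >= h < 10) or (10 >= h < 16) or (16 >= h < 22) or (22 >= h < 4)
]
print(matched)  # hours 0 through 16; hours 17-23 match nothing
```

Rows with an hour from 17 to 23 therefore append nothing, while the separate if statements let other rows append more than one label (hour 8 matches both the "Day" and "Evening" conditions), so the list length has no reason to equal len(df).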
