How do I divide values by the number of days in the month and create a day-of-month column?

Posted 2024-04-29 13:12:25


I have a pandas DataFrame that looks like this:

   year  month  name  value1  value2
0  2021    7    cars   5000    4000 
1  2021    7   boats   2000     250
2  2021    9    cars   3000    7000

I would like it to look like this:

    year  month day  name  value1  value2
0   2021    7    1   cars  161.29  129.03
1   2021    7    2   cars  161.29  129.03
2   2021    7    3   cars  161.29  129.03
3   2021    7    4   cars  161.29  129.03
              ...
31  2021    7    1   boats  64.51   8.064
32  2021    7    2   boats  64.51   8.064
33  2021    7    3   boats  64.51   8.064
              ...
62  2021    9    1    cars   100    233.33
63  2021    9    2    cars   100    233.33
64  2021    9    3    cars   100    233.33

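My idea is to divide the value columns by the number of days in the month and create a day column, so that I can eventually build a date column that combines year, month and day.

Can anyone help me?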


3 answers

You can use resample to upsample the data from months to days:

import pandas as pd

df = pd.DataFrame([[2021,7,5000]], columns=['year', 'month', 'value'])

# create datetime column as period
df['datetime'] = pd.to_datetime(df['month'].astype(str) + '/' + df['year'].astype(str)).dt.to_period("M")

# calculate values per day by dividing the value by number of days per month
df['ndays'] = df['datetime'].apply(lambda x: x.days_in_month)
df['value'] = df['value'] / df['ndays']

# set datetime as index and resample:
df = df[['value', 'datetime']].set_index('datetime')
df = df.resample('d').ffill().reset_index()

#split datetime to separate columns
df['day'] = df['datetime'].dt.day
df['month'] = df['datetime'].dt.month
df['year'] = df['datetime'].dt.year
df.drop(columns=['datetime'], inplace=True)
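The snippet above works on a single row and drops the name column. One possible way to apply the same idea to the three-row frame from the question is sketched below. It swaps resample for pd.date_range plus DataFrame.explode, and the names month_start, value_cols and daily are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2021, 2021, 2021],
    'month': [7, 7, 9],
    'name': ['cars', 'boats', 'cars'],
    'value1': [5000, 2000, 3000],
    'value2': [4000, 250, 7000],
})

# First day of each month as a Timestamp (day=1 is added only for parsing)
month_start = pd.to_datetime(df[['year', 'month']].assign(day=1))
ndays = month_start.dt.days_in_month

# Spread each monthly total evenly over the days of its month
value_cols = ['value1', 'value2']
daily = df.copy()
daily[value_cols] = daily[value_cols].div(ndays, axis=0)

# One list of daily timestamps per row, then one output row per timestamp
daily['date'] = pd.Series(
    [list(pd.date_range(start, periods=n)) for start, n in zip(month_start, ndays)],
    index=daily.index,
)
daily = daily.explode('date', ignore_index=True)
daily['date'] = pd.to_datetime(daily['date'])
daily['day'] = daily['date'].dt.day
```

Because every original row is expanded on its own, rows that share a year and month (cars and boats in July) stay separate.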

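One option is to use monthrange from calendar to get the number of days in a given month, divide the values by that number, then use Index.repeat to scale up the DataFrame and groupby cumcount to add the days: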

from calendar import monthrange

import pandas as pd

df = pd.DataFrame(
    {'year': {0: 2021, 1: 2021, 2: 2021}, 'month': {0: 7, 1: 7, 2: 9},
     'name': {0: 'cars', 1: 'boats', 2: 'cars'},
     'value1': {0: 5000, 1: 2000, 2: 3000},
     'value2': {0: 4000, 1: 250, 2: 7000}})
days_in_month = (
    df[['year', 'month']].apply(lambda x: monthrange(*x)[1], axis=1)
)

# Calculate new values; plain column assignment lets the integer columns become float
value_cols = ['value1', 'value2']
df[value_cols] = df[value_cols].div(days_in_month, axis=0)
df = df.loc[df.index.repeat(days_in_month)]  # Scale Up DataFrame
df.insert(2, 'day', df.groupby(level=0).cumcount() + 1)  # Add Days Column
df = df.reset_index(drop=True)  # Clean up Index

df

    year  month  day  name      value1      value2
0   2021      7    1  cars  161.290323  129.032258
1   2021      7    2  cars  161.290323  129.032258
2   2021      7    3  cars  161.290323  129.032258
3   2021      7    4  cars  161.290323  129.032258
4   2021      7    5  cars  161.290323  129.032258
..   ...    ...  ...   ...         ...         ...
87  2021      9   26  cars  100.000000  233.333333
88  2021      9   27  cars  100.000000  233.333333
89  2021      9   28  cars  100.000000  233.333333
90  2021      9   29  cars  100.000000  233.333333
91  2021      9   30  cars  100.000000  233.333333
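The question also asks for a single date column at the end. As a minimal sketch, pd.to_datetime can assemble one from columns named year, month and day. The small frame below is a hand-written stand-in for the expanded result (values rounded), and the column name date is an arbitrary choice.

```python
import pandas as pd

# A few rows shaped like the expanded result above (values rounded for brevity)
daily = pd.DataFrame({
    'year': [2021, 2021, 2021],
    'month': [7, 7, 9],
    'day': [1, 2, 30],
    'name': ['cars', 'cars', 'cars'],
    'value1': [161.29, 161.29, 100.0],
})

# pd.to_datetime builds a datetime column from the year/month/day columns
daily['date'] = pd.to_datetime(daily[['year', 'month', 'day']])
```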

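To do this, you need to create a temporary DataFrame that contains the days of each month, merge it with your data, and then divide the values.

Assuming you have a single year of data, we can build a date range directly from it and create the temporary DataFrame: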

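import pandas as pd

# year is stored as an integer, so convert it to a string before building the start date
dt_range = pd.DataFrame(pd.date_range(str(df.loc[0, 'year']) + '-01-01', periods=365))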
dt_range.columns = ['dte']
dt_range['year'] = dt_range['dte'].dt.year
dt_range['month'] = dt_range['dte'].dt.month
dt_range['day'] = dt_range['dte'].dt.day
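The periods=365 above assumes a non-leap year. A variant that gives explicit start and end dates to pd.date_range avoids that assumption. In this sketch the year 2024 is a hypothetical value, picked only because it is a leap year.

```python
import pandas as pd

year = 2024  # hypothetical leap year, used only to show the difference

# Passing both endpoints lets pandas decide how many days the year has
dt_range = pd.DataFrame({'dte': pd.date_range(f'{year}-01-01', f'{year}-12-31')})
dt_range['year'] = dt_range['dte'].dt.year
dt_range['month'] = dt_range['dte'].dt.month
dt_range['day'] = dt_range['dte'].dt.day
```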

Now we can create the new DataFrame:

new_df = pd.merge(df, dt_range,how='left',on=['year','month'])

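Now all that is left is to divide the value columns by the number of days in each month, and you have what you need:

value_cols = ['value1', 'value2']
new_df[value_cols] = new_df[value_cols].div(new_df['dte'].dt.days_in_month, axis=0)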
