Splitting rows in a DataFrame (hourly) based on another column

Posted 2024-06-16 11:19:03

I have a DataFrame of electric vehicle charging sessions with the following schema and contents:

PlugInTime, PlugOutTime, DeliveredEnergy, ChargeDuration                    

 - 2019-01-02 05:53:00, 2019-01-02 14:38:00, 36.0, 05:57:00
 - 2019-01-02 07:53:00, 2019-01-02 16:38:00, 8.0, 03:12:00
 - ...

I want to split each row based on the ChargeDuration column, for example "exploding" the first row into:

 - 2019-01-02 05:53:00, 2019-01-02 14:38:00, 6.05, 05:57:00
 - 2019-01-02 06:53:00, 2019-01-02 14:38:00, 6.05, 05:57:00
 - 2019-01-02 07:53:00, 2019-01-02 14:38:00, 6.05, 05:57:00
 - 2019-01-02 08:53:00, 2019-01-02 14:38:00, 6.05, 05:57:00
 - 2019-01-02 09:53:00, 2019-01-02 14:38:00, 6.05, 05:57:00
 - 2019-01-02 10:53:00, 2019-01-02 14:38:00, 5.75, 05:57:00

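As you can see, I want ChargeDuration to determine how many chunks PlugInTime is split into, and I want DeliveredEnergy to be split accordingly by linear interpolation, so that the chunks sum to the original value.

After that, I want to resample the DataFrame hourly so I can determine the hourly energy demand across all recorded charging sessions.

While researching before posting, I came across many threads on resampling in pandas, but they lack the step of splitting based on another column (here, ChargeDuration).

Any help is greatly appreciated.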


3 answers

It's not pretty, but it does work. It would be great to see whether there is a more elegant solution to this problem.

import pandas as pd

# create df
df = pd.DataFrame({'PlugInTime': ['2019-01-02 05:53:00', '2019-01-02 07:53:00'], 'PlugOutTime': ['2019-01-02 14:38:00', '2019-01-02 16:38:00'], 'DeliveredEnergy': [36.0, 8.0], 'ChargeDuration': ['05:57:00', '03:12:00'] })

# set dtypes
df['PlugInTime'] = pd.to_datetime(df.PlugInTime)
df['PlugOutTime'] = pd.to_datetime(df.PlugOutTime)
df['ChargeDuration'] = pd.to_timedelta(df.ChargeDuration)

# determine charge time in hours (total_seconds also counts whole days, unlike .dt.seconds)
df['hours'] = df.ChargeDuration.dt.total_seconds() / 3600

# split the hours into a list: 1 per full hour, plus the fractional remainder if there is one
df['split'] = df.ChargeDuration.apply(lambda x: [1] * int(x.total_seconds() // 3600) + [f for f in [x.total_seconds() % 3600 / 3600] if f])

# explode the list
df = df.explode('split')

# calculate the energy per hour
df['DeliveredEnergy'] = df.DeliveredEnergy / df.hours * df.split

# update PlugInTime
df['PlugInTime'] = df.PlugInTime + pd.to_timedelta(df.PlugInTime.groupby(level=0).cumcount(), unit='h')

print(df)

           PlugInTime         PlugOutTime DeliveredEnergy ChargeDuration  hours split
0 2019-01-02 05:53:00 2019-01-02 14:38:00         6.05042       05:57:00   5.95     1
0 2019-01-02 06:53:00 2019-01-02 14:38:00         6.05042       05:57:00   5.95     1
0 2019-01-02 07:53:00 2019-01-02 14:38:00         6.05042       05:57:00   5.95     1
0 2019-01-02 08:53:00 2019-01-02 14:38:00         6.05042       05:57:00   5.95     1
0 2019-01-02 09:53:00 2019-01-02 14:38:00         6.05042       05:57:00   5.95     1
0 2019-01-02 10:53:00 2019-01-02 14:38:00          5.7479       05:57:00   5.95  0.95
1 2019-01-02 07:53:00 2019-01-02 16:38:00             2.5       03:12:00   3.20     1
1 2019-01-02 08:53:00 2019-01-02 16:38:00             2.5       03:12:00   3.20     1
1 2019-01-02 09:53:00 2019-01-02 16:38:00             2.5       03:12:00   3.20     1
1 2019-01-02 10:53:00 2019-01-02 16:38:00             0.5       03:12:00   3.20   0.2
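
The question also asks for the hourly energy demand after the split. A minimal sketch of that follow-up step, assuming the same two sample sessions: `split_hourly`, `sessions`, `chunks` and `hourly` are names made up for this illustration, and each chunk is simply counted in the clock hour in which it starts.

```python
import pandas as pd

# Same two sample sessions as above, already converted to proper dtypes.
sessions = pd.DataFrame({
    "PlugInTime": pd.to_datetime(["2019-01-02 05:53:00", "2019-01-02 07:53:00"]),
    "DeliveredEnergy": [36.0, 8.0],
    "ChargeDuration": pd.to_timedelta(["05:57:00", "03:12:00"]),
})


def split_hourly(row):
    """Return (chunk start, energy) pairs: one per full hour plus the remainder."""
    total_s = row.ChargeDuration.total_seconds()
    full, rest = divmod(total_s, 3600)
    # Chunk lengths in seconds; the last chunk is the leftover part of an hour.
    lengths = [3600.0] * int(full) + ([rest] if rest > 0 else [])
    return [
        (row.PlugInTime + pd.Timedelta(hours=i), row.DeliveredEnergy * length / total_s)
        for i, length in enumerate(lengths)
    ]


pieces = [p for row in sessions.itertuples(index=False) for p in split_hourly(row)]
chunks = pd.DataFrame(pieces, columns=["PlugInTime", "DeliveredEnergy"])

# Sum the chunk energies per clock hour, keyed by the hour each chunk starts in.
hourly = (
    chunks.set_index("PlugInTime")["DeliveredEnergy"]
    .sort_index()
    .resample("h")
    .sum()
)
print(hourly)
```

Note that a chunk starting at 05:53 is booked entirely to the 05:00 hour here, even though most of it falls after 06:00, so this is only as accurate as that simplification allows.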

I tried to build a similar solution using dummy data.

import pandas as pd
import numpy as np

tst = pd.DataFrame([{'id':1, 'start': 1, 'duration': 7, 'work_rate': 100}, {'id': 2, 'start': 1, 'duration': 4, 'work_rate': 35}])
tst

    id  start   duration    work_rate
0   1   1       7           100
1   2   1       4           35

def create_list_for_splits(_id, start, duration, work_rate):
    start_end = list(range(start, start + duration))
    work_rate_split = [round(work_rate/duration, 2)]*duration
    
    res_ls = []
    
    for s, w in zip(start_end, work_rate_split):
        res_ls.append({'id': _id, 'start': s, 'work_rate': w})
    
    return res_ls

# Test the function

create_list_for_splits(1, 1, 7, 100)

[{'id': 1, 'start': 1, 'work_rate': 14.29},
 {'id': 1, 'start': 2, 'work_rate': 14.29},
 {'id': 1, 'start': 3, 'work_rate': 14.29},
 {'id': 1, 'start': 4, 'work_rate': 14.29},
 {'id': 1, 'start': 5, 'work_rate': 14.29},
 {'id': 1, 'start': 6, 'work_rate': 14.29},
 {'id': 1, 'start': 7, 'work_rate': 14.29}]

tst['work_split'] = tst.apply(lambda x: create_list_for_splits(x['id'], x['start'], x['duration'], x['work_rate']), axis=1)

pd.DataFrame([item for sublist in tst['work_split'].values for item in sublist])

    id  start   work_rate
0   1   1       14.29
1   1   2       14.29
2   1   3       14.29
3   1   4       14.29
4   1   5       14.29
5   1   6       14.29
6   1   7       14.29
7   2   1       8.75
8   2   2       8.75
9   2   3       8.75
10  2   4       8.75
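
One caveat with rounding every piece: the seven 14.29 pieces add up to 100.03 rather than 100, while the question wants the pieces to sum to the original value. A possible tweak, sketched here with a hypothetical helper `split_evenly` that is not part of the answer above, is to let the last piece absorb the rounding difference.

```python
def split_evenly(total, parts, ndigits=2):
    """Split total into `parts` rounded pieces whose sum still equals total."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    share = round(total / parts, ndigits)
    pieces = [share] * (parts - 1)
    # The last piece takes whatever is left, so rounding errors do not accumulate.
    pieces.append(round(total - share * (parts - 1), ndigits))
    return pieces


pieces = split_evenly(100, 7)
print(pieces, sum(pieces))
```

The last piece may differ slightly from the others; if every piece has to be identical, skipping the rounding altogether is the simpler option.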

Try this. I used simple approaches for some things; feel free to change them:

import datetime
import time
import pandas as pd

def func(PlugInTime, PlugOutTime, DeliveredEnergy, ChargeDuration):

    # inTime calculation
    inTime = str(PlugInTime).split(' ')
    d = inTime[0].split('-')
    t = inTime[1].split(':')
    # .timestamp() is portable; strftime('%s') is a platform-specific extension
    Timein = int(datetime.datetime(int(d[0]), int(d[1]), int(d[2]), int(t[0]), int(t[1]), int(t[2])).timestamp())

    # outTime calculation
    outTime = str(PlugOutTime).split(' ')
    d = outTime[0].split('-')
    t = outTime[1].split(':')
    Timeout = int(datetime.datetime(int(d[0]), int(d[1]), int(d[2]), int(t[0]), int(t[1]), int(t[2])).timestamp())

    # energy
    energy = float(DeliveredEnergy)

    # charge
    h, m, s = str(ChargeDuration).split(':')
    charge = int(h) * 3600 + int(m) * 60 + int(s)
    val = charge

    # hourly
    add = 3600

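    # energy delivered during one full hour; always defined, so sessions of
    # exactly one hour no longer hit an unassigned variable
    chargeval = (add / val) * energy if val else 0.0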


    list_ = []
    while charge - add >= 0:
        Timein += add
        t_ = time.localtime(Timein)
        lt = [time.strftime("%Y-%m-%d %H:%M:%S", t_), PlugOutTime, chargeval, ChargeDuration]
        list_.append(lt)
        charge -= add

    if charge:
        t_ = time.localtime(Timein+charge)
        lt = [time.strftime("%Y-%m-%d %H:%M:%S", t_), PlugOutTime, (charge/val) * energy, ChargeDuration]
        list_.append(lt)

    return list_


if __name__ == '__main__':

    l = [['2019-01-02 05:53:00', '2019-01-02 14:38:00', '36.0', '05:57:00']]

    # input dataframe
    df = pd.DataFrame(l, columns = ['PlugInTime', 'PlugOutTime', 'DeliveredEnergy', 'ChargeDuration'])

    list_ = []
    for ind, row in df.iterrows():
        l = func(row['PlugInTime'], row['PlugOutTime'], row['DeliveredEnergy'], row['ChargeDuration'])
        list_.extend(l)

        # printing the output for a row
        for i in l:
            print(i)

    # output dataframe
    df_out = pd.DataFrame(list_, columns = ['PlugInTime', 'PlugOutTime', 'DeliveredEnergy', 'ChargeDuration'])

Output (each timestamp marks the end of its interval):

['2019-01-02 06:53:00', '2019-01-02 14:38:00', 6.050420168067228, '05:57:00']
['2019-01-02 07:53:00', '2019-01-02 14:38:00', 6.050420168067228, '05:57:00']
['2019-01-02 08:53:00', '2019-01-02 14:38:00', 6.050420168067228, '05:57:00']
['2019-01-02 09:53:00', '2019-01-02 14:38:00', 6.050420168067228, '05:57:00']
['2019-01-02 10:53:00', '2019-01-02 14:38:00', 6.050420168067228, '05:57:00']
['2019-01-02 11:50:00', '2019-01-02 14:38:00', 5.747899159663865, '05:57:00']
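
All of the approaches above cut a session into chunks that start at the plug-in minute (05:53, 06:53, ...). If the end goal is demand per clock hour, another option is to spread the energy directly over the clock hours the charging interval overlaps. This is only a sketch under two assumptions that the thread does not confirm: charging starts at PlugInTime and runs at constant power for ChargeDuration. The names `hourly_demand`, `sessions` and `demand` are made up for this example.

```python
import pandas as pd


def hourly_demand(sessions):
    """Spread each session's energy over clock hours, assuming constant power."""
    parts = []
    one_hour = pd.Timedelta(hours=1)
    for row in sessions.itertuples(index=False):
        total_s = row.ChargeDuration.total_seconds()
        if total_s <= 0:
            continue  # nothing to distribute
        start = row.PlugInTime
        end = start + row.ChargeDuration
        # Every clock hour touched by the interval [start, end).
        for hour in pd.date_range(start.floor("h"), end, freq="h"):
            overlap = (min(end, hour + one_hour) - max(start, hour)).total_seconds()
            if overlap > 0:
                parts.append((hour, row.DeliveredEnergy * overlap / total_s))
    out = pd.DataFrame(parts, columns=["hour", "energy"])
    return out.groupby("hour")["energy"].sum()


sessions = pd.DataFrame({
    "PlugInTime": pd.to_datetime(["2019-01-02 05:53:00", "2019-01-02 07:53:00"]),
    "DeliveredEnergy": [36.0, 8.0],
    "ChargeDuration": pd.to_timedelta(["05:57:00", "03:12:00"]),
})
demand = hourly_demand(sessions)
print(demand)
```

Hours in which no session is charging do not appear in the result; reindexing onto a full hourly range would fill them with zeros if a gap-free series is needed.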
