如何将此列表转换为数据帧

2024-05-13 19:09:32 发布

您现在位置:Python中文网/ 问答频道 /正文

我有这个代表联邦快递追踪的清单

history = ['Tuesday, March 16, 2021', '3:03 PM Hollywood, FL\nDelivered\nLeft at front door. Signature Service not requested.', '5:52 AM MIAMI, FL\nOn FedEx vehicle for delivery', '5:40 AM MIAMI, FL\nAt local FedEx facility', 'Monday, March 15, 2021', '11:42 PM OCALA, FL\nDeparted FedEx location', '10:01 PM OCALA, FL\nArrived at FedEx location', '8:28 PM OCALA, FL\nIn transit', '12:42 AM OCALA, FL\nIn transit']

如何将此列表转换为此3列数据框 enter image description here


Tags: 代表locationamhistoryatmarchflfedex
3条回答
history = [
    "Tuesday, March 16, 2021",
    "3:03 PM Hollywood, FL\nDelivered\nLeft at front door. Signature Service not requested.",
    "5:52 AM MIAMI, FL\nOn FedEx vehicle for delivery",
    "5:40 AM MIAMI, FL\nAt local FedEx facility",
    "Monday, March 15, 2021",
    "11:42 PM OCALA, FL\nDeparted FedEx location",
    "10:01 PM OCALA, FL\nArrived at FedEx location",
    "8:28 PM OCALA, FL\nIn transit",
    "12:42 AM OCALA, FL\nIn transit",
]


import re

r = re.compile("^(?:Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)")

data, cur_group = [], ""
for line in history:
    if r.match(line):
        cur_group = line
    else:
        data.append([cur_group, *line.split("\n", maxsplit=1)])

df = pd.DataFrame(data)
print(df)

印刷品:

                         0                      1                                                  2
0  Tuesday, March 16, 2021  3:03 PM Hollywood, FL  Delivered\nLeft at front door. Signature Servi...
1  Tuesday, March 16, 2021      5:52 AM MIAMI, FL                      On FedEx vehicle for delivery
2  Tuesday, March 16, 2021      5:40 AM MIAMI, FL                            At local FedEx facility
3   Monday, March 15, 2021     11:42 PM OCALA, FL                            Departed FedEx location
4   Monday, March 15, 2021     10:01 PM OCALA, FL                          Arrived at FedEx location
5   Monday, March 15, 2021      8:28 PM OCALA, FL                                         In transit
6   Monday, March 15, 2021     12:42 AM OCALA, FL                                         In transit

可以使用^{}检查元素是否为有效的日期时间

这应该比只检查元素是否包含日字符串(MondayTuesday,等等)更安全,因为事件在某处也包含日字符串(例如Delivery failed\nWill reattempt on Monday

import dateutil.parser

history = ['Tuesday, March 16, 2021', '3:03 PM Hollywood, FL\nDelivered\nLeft at front door. Signature Service not requested.', '5:52 AM MIAMI, FL\nOn FedEx vehicle for delivery', '5:40 AM MIAMI, FL\nAt local FedEx facility', 'Monday, March 15, 2021', '11:42 PM OCALA, FL\nDeparted FedEx location', '10:01 PM OCALA, FL\nArrived at FedEx location', '8:28 PM OCALA, FL\nIn transit', '12:42 AM OCALA, FL\nIn transit']
data = []

for string in history:
    try:
        day = dateutil.parser.parse(string)
    except:
        data.append([day, *string.split('\n', maxsplit=1)])

df = pd.DataFrame(data)

#    0           1                      2
# 0  2021-03-16  3:03 PM Hollywood, FL  Delivered\nLeft at front door. Signature Servi...
# 1  2021-03-16  5:52 AM MIAMI, FL      On FedEx vehicle for delivery
# 2  2021-03-16  5:40 AM MIAMI, FL      At local FedEx facility
# 3  2021-03-15  11:42 PM OCALA, FL     Departed FedEx location
# 4  2021-03-15  10:01 PM OCALA, FL     Arrived at FedEx location
# 5  2021-03-15  8:28 PM OCALA, FL      In transit
# 6  2021-03-15  12:42 AM OCALA, FL     In transit

好的,这是一个有点黑客,但可能会得到工作,如果格式是一致的,长期正则表达式可能是一个更好的方法

col1 = []
col2 = []
col3 = []
for h in history:
    if 'FL' in h:
        col1.append(date)
        new_list = h.split(',')
        item2 = new_list[0][4:]
        item3 = new_list[1][4:]
        col2.append(item2.replace('\n', '. '))
        col3.append(item3.replace('\n', '. '))
    else:
        date = h

pd.DataFrame({'col1': col1,
              'col2': col2,
              'col3': col3})

相关问题 更多 >