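How do I fix this date-handling problem in Python?
I just started learning Python and am working through some challenges I found online. I want to extract dates from a string so that I can put them into a column. The input I was given is: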
"131594", "", "BIDGROUP", 1, 0, 0, 2, "", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 0,
"131594", "AWARD", "UNTOUCHABLE", 1, 1, 0, 1, "", 0:00, 0:00, 10JUN2014, 13JUN2014 23:59, 01JAN2009, 01JAN2009, false, 100,
"131594", "AWARD", "ADVANCED_TRIP", 1, 2, 0, 0, "740025Jun2014,705406Jun2014,737722Jun2014,696130Jun2014", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
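First I look for the "ADVANCED_TRIP" element. Then, for each identifier I find in its string, I need to create a new row named "TRIP_ID" that keeps the date that came with that identifier. What I get after my attempt is: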
"131594", "", "BIDGROUP", 1, 0, 0, 2, "", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 0,
"131594", "AWARD", "UNTOUCHABLE", 1, 1, 0, 1, "", 0:00, 0:00, 10JUN2014, 13JUN2014 23:59, 01JAN2009, 01JAN2009, false, 100,
"131594", "AWARD", "TRIP_ID", 1, 2, 0, 0, "7400", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 3, 0, 0, "7054", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 4, 0, 0, "7377", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 5, 0, 0, "6961", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
The correct output, however, should look like this:
"131594", "", "BIDGROUP", 1, 0, 0, 5, "", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 0,
"131594", "AWARD", "UNTOUCHABLE", 1, 1, 0, 1, "", 0:00, 0:00, 10JUN2014, 13JUN2014 23:59, 01JAN2009, 01JAN2009, false, 100,
"131594", "AWARD", "TRIP_ID", 1, 2, 0, 0, "7400", 0:00, 0:00, 25Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 3, 0, 0, "7054", 0:00, 0:00, 06Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 4, 0, 0, "7377", 0:00, 0:00, 22Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 5, 0, 0, "6961", 0:00, 0:00, 30Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
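The only thing I don't understand is how to extract the date next to each "TRIP_ID" identifier and put it into the corresponding column, i.e. the eleventh column. For example, my output has "7400", 0:00, 0:00, 01JAN2009, 01JAN2009, but it should be "7400", 0:00, 0:00, 25Jun2014, 01JAN2009.
Here is the code I wrote: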
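Looking at the sample, each entry in the "ADVANCED_TRIP" field appears to be a 4-digit trip ID followed directly by a date in DDMonYYYY form, so 740025Jun2014 is 7400 plus 25Jun2014. Assuming that fixed layout holds for every row, plain slicing is enough to separate the two parts. The helper name split_trip_field is hypothetical, used only for illustration:

```python
def split_trip_field(field):
    """Split a field like '"740025Jun2014,705406Jun2014"' into (trip_id, date) pairs.

    Assumes (from the sample data) that every entry is a 4-digit trip ID
    immediately followed by a date such as 25Jun2014.
    """
    pairs = []
    # Drop the surrounding quotes, then split on the commas between entries.
    for entry in field.strip('"').split(","):
        entry = entry.strip()
        # First 4 characters: trip ID; the rest: the date.
        pairs.append((entry[:4], entry[4:]))
    return pairs


print(split_trip_field('"740025Jun2014,705406Jun2014,737722Jun2014,696130Jun2014"'))
# [('7400', '25Jun2014'), ('7054', '06Jun2014'), ('7377', '22Jun2014'), ('6961', '30Jun2014')]
```

The date half of each pair is exactly the value that belongs in the eleventh column (index 10 after split(", ")).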
import sys

lines = []
for line in sys.stdin:
    lines.append(line.strip())

output_lines = []
for line in lines:
    elements = line.split(", ")
    if elements[2] == '"ADVANCED_TRIP"':
        elements[2] = '"TRIP_ID"'
        trip_ids = elements[7].split(",")
        for i, trip_id in enumerate(trip_ids):
            trip_id = trip_id.strip('"')
            output_line = elements[:7] + [f'"{trip_id[:4]}"'] + elements[8:]
            output_line[4] = str(int(output_line[4]) + i)
            output_lines.append(output_line)
    else:
        output_lines.append(elements)

for output_line in output_lines:
    print(", ".join(output_line))
Does anyone know how I can proceed?
1 answer
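You're on the right track. To get the date that belongs to each "TRIP_ID", you can use a regular expression that matches the date format (two digits, three letters, four digits, e.g. 25Jun2014) in the string. You can try this: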
import sys
import re

lines = []
for line in sys.stdin:
    lines.append(line.strip())

output_lines = []
for line in lines:
    elements = line.split(", ")
    if elements[2] == '"ADVANCED_TRIP"':
        elements[2] = '"TRIP_ID"'
        trip_ids = elements[7].split(",")
        # Search only the trip field (elements[7]), not the whole line, so the
        # row's own 01JAN2009 columns can never be picked up by mistake.
        dates = re.findall(r'\d{2}[A-Za-z]{3}\d{4}', elements[7])  # e.g. 25Jun2014
        for i, (trip_id, date) in enumerate(zip(trip_ids, dates)):
            trip_id = trip_id.strip('"')
            output_line = elements[:7] + [f'"{trip_id[:4]}"'] + elements[8:]
            output_line[4] = str(int(output_line[4]) + i)
            output_line[10] = date  # replace the placeholder date with the trip's own date
            output_lines.append(output_line)
    else:
        output_lines.append(elements)

for output_line in output_lines:
    print(", ".join(output_line))
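For comparison, here is a sketch that takes the date straight from each trip entry by slicing instead of searching with a regex, and checks it with datetime.strptime. It also updates the "BIDGROUP" row, whose seventh column goes from 2 to 5 in the expected output; treating that column as a count of the rows under the group is an assumption inferred from the sample, not something the question states. The function name expand_rows is hypothetical:

```python
from datetime import datetime


def expand_rows(lines):
    """Expand each ADVANCED_TRIP row into one TRIP_ID row per trip entry.

    Assumes each trip entry is a 4-digit ID followed by a DDMonYYYY date.
    """
    rows = [line.strip().split(", ") for line in lines if line.strip()]
    output = []
    for row in rows:
        if row[2] != '"ADVANCED_TRIP"':
            output.append(row)
            continue
        entries = row[7].strip('"').split(",")
        for i, entry in enumerate(entries):
            trip_id, date = entry[:4], entry[4:]
            # Raises ValueError if the date part is not like 25Jun2014
            # (%b matches English month abbreviations in the default C locale).
            datetime.strptime(date, "%d%b%Y")
            new_row = list(row)
            new_row[2] = '"TRIP_ID"'
            new_row[4] = str(int(row[4]) + i)
            new_row[7] = f'"{trip_id}"'
            new_row[10] = date  # eleventh column gets the trip's own date
            output.append(new_row)
    # Assumption: index 6 of a BIDGROUP row counts the rows that follow it
    # up to the next BIDGROUP row.
    group = None
    for row in output:
        if row[2] == '"BIDGROUP"':
            group = row
            group[6] = "0"
        elif group is not None:
            group[6] = str(int(group[6]) + 1)
    return [", ".join(row) for row in output]
```

To use it with the same input as above, call it on standard input: for line in expand_rows(sys.stdin): print(line). Slicing is enough here because the layout is fixed-width; the regex approach is more forgiving if the ID length ever varies.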
Hope this helps! Good luck learning Python!