Problem counting activities and updating a column in Python

Asked 2025-04-14 17:52

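I have input data with 16 columns. I want to count the rows in the output, find the most recent row that contains the element "BIDGROUP", and put that count in its seventh column. For example: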

Input data:

"131594", "", "BIDGROUP", 1, 0, 0, 2, "", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 0,
"131594", "AWARD", "UNTOUCHABLE", 1, 1, 0, 1, "", 0:00, 0:00, 10JUN2014, 13JUN2014 23:59, 01JAN2009, 01JAN2009, false, 100,
"131594", "AWARD", "ADVANCED_TRIP", 1, 2, 0, 0, "740025Jun2014,705406Jun2014,737722Jun2014,696130Jun2014", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,

Expected output:

"131594", "", "BIDGROUP", 1, 0, 0, 5, "", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 0,
"131594", "AWARD", "UNTOUCHABLE", 1, 1, 0, 1, "", 0:00, 0:00, 10JUN2014, 13JUN2014 23:59, 01JAN2009, 01JAN2009, false, 100,
"131594", "AWARD", "TRIP_ID", 1, 2, 0, 0, "7400", 0:00, 0:00, 25Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 3, 0, 0, "7054", 0:00, 0:00, 06Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 4, 0, 0, "7377", 0:00, 0:00, 22Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 5, 0, 0, "6961", 0:00, 0:00, 30Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,

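In the input data, the first row has the element "BIDGROUP" and the count in its seventh column is 2, because there are only 3 rows (the BIDGROUP row plus 2 activity rows). In the output the count becomes 5, because there are now 6 rows in total. My output is otherwise almost right; as I said, I only need to count the rows. I have tried keeping a count across all activities, but the result was not what I expected.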

This is my original code, without the counting:

import sys
import re


lines = []

for line in sys.stdin:
    lines.append(line.strip())

output_lines = []

for line in lines:
    elements = line.split(", ")
    if elements[2] == '"ADVANCED_TRIP"':
        elements[2] = '"TRIP_ID"'
        trip_ids = elements[7].split(",")
        dates = re.findall(r'\d{2}[A-Za-z]{3}\d{4}', line)
        for i, (trip_id, date) in enumerate(zip(trip_ids, dates)):
            trip_id = trip_id.strip('"')
            output_line = elements[:7] + [f'"{trip_id[:4]}"'] + elements[8:]
            output_line[4] = str(int(output_line[4]) + i)
            output_line[10] = date
            output_lines.append(output_line)
    else:
        output_lines.append(elements)

for output_line in output_lines:
    print(", ".join(output_line))

This is what I tried, keeping a count across all activities:

import sys
import re

lines = []

for line in sys.stdin:
    lines.append(line.strip())

output_lines = []
total_activity_count = 0
trip_id_activity_count = 0

for line in lines:
    elements = line.split(", ")
    if elements[2] == '"BIDGROUP"':
        total_activity_count += 1
        elements[6] = str(total_activity_count)
        output_lines.append(elements)
    elif elements[2] == '"ADVANCED_TRIP"':
        elements[2] = '"TRIP_ID"'
        trip_ids = elements[7].split(",")
        dates = re.findall(r'\d{2}[A-Za-z]{3}\d{4}', line)
        for i, (trip_id, date) in enumerate(zip(trip_ids, dates)):
            trip_id = trip_id.strip('"')
            output_line = elements[:7] + [f'"{trip_id[:4]}"'] + elements[8:]
            output_line[4] = str(int(output_line[4]) + i)
            output_line[10] = date
            output_lines.append(output_line)
            trip_id_activity_count += 1
    else:
        output_lines.append(elements)
        total_activity_count += 1

for output_line in output_lines:
    print(", ".join(output_line))

Does anyone have other ideas on how this could be achieved?

1 Answer


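The problem you are running into is that you cannot update the current BIDGROUP row until you reach the next BIDGROUP, because until then you do not know the total number of activities. Your code attempts "time travel": it uses total_activity_count before it has finished computing it. However, since you keep all the rows in a list, you can save a reference to the current BIDGROUP row and update it later, once the count is known, like this: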

import sys
import re

lines = []

for line in sys.stdin:
    lines.append(line.strip())

output_lines = []
total_activity_count = 0
lastbid = None  # reference to the most recent BIDGROUP row

for line in lines:
    elements = line.split(", ")
    if elements[2] == '"BIDGROUP"':
        # Back-patch the previous BIDGROUP row now that its count is known.
        if lastbid is not None:
            lastbid[6] = str(total_activity_count)
        total_activity_count = 0
        lastbid = elements
        output_lines.append(elements)
    elif elements[2] == '"ADVANCED_TRIP"':
        elements[2] = '"TRIP_ID"'
        trip_ids = elements[7].split(",")
        dates = re.findall(r'\d{2}[A-Za-z]{3}\d{4}', line)
        for i, (trip_id, date) in enumerate(zip(trip_ids, dates)):
            trip_id = trip_id.strip('"')
            output_line = elements[:7] + [f'"{trip_id[:4]}"'] + elements[8:]
            output_line[4] = str(int(output_line[4]) + i)
            output_line[10] = date
            output_lines.append(output_line)
            # Each expanded TRIP_ID row counts as one activity.
            total_activity_count += 1
    else:
        output_lines.append(elements)
        total_activity_count += 1

# Patch the final BIDGROUP row; no later BIDGROUP exists to trigger the update.
if lastbid is not None:
    lastbid[6] = str(total_activity_count)
for output_line in output_lines:
    print(", ".join(output_line))
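The same idea can be packaged as a function so it can be checked against the sample data from the question. This is only a sketch: the name expand_and_count is made up for illustration, and it assumes every expanded TRIP_ID row counts as one activity, which is what the expected output in the question implies. Instead of patching the count when the next BIDGROUP arrives, it keeps the reference to the current BIDGROUP row and bumps that row's seventh column as each following row is emitted.

```python
import re

DATE_RE = re.compile(r"\d{2}[A-Za-z]{3}\d{4}")


def expand_and_count(lines):
    """Expand ADVANCED_TRIP rows and keep each BIDGROUP row's count current.

    Hypothetical helper; assumes fields are separated by ", " as in the question.
    """
    output = []
    last_bid = None  # reference to the most recent BIDGROUP row
    for line in lines:
        elements = line.strip().split(", ")
        if elements[2] == '"BIDGROUP"':
            last_bid = elements
            last_bid[6] = "0"  # restart the count for this group
            output.append(elements)
            continue
        if elements[2] == '"ADVANCED_TRIP"':
            # Column 8 holds entries such as 740025Jun2014: a 4-digit trip ID
            # followed by a date.
            trip_ids = elements[7].strip('"').split(",")
            dates = DATE_RE.findall(elements[7])
            rows = []
            for i, (trip_id, date) in enumerate(zip(trip_ids, dates)):
                row = list(elements)
                row[2] = '"TRIP_ID"'
                row[4] = str(int(elements[4]) + i)
                row[7] = f'"{trip_id[:4]}"'
                row[10] = date
                rows.append(row)
        else:
            rows = [elements]
        output.extend(rows)
        if last_bid is not None:
            # The BIDGROUP row is the same list object that is already stored
            # in output, so updating it here updates the final result.
            last_bid[6] = str(int(last_bid[6]) + len(rows))
    return [", ".join(row) for row in output]


SAMPLE = [
    '"131594", "", "BIDGROUP", 1, 0, 0, 2, "", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 0,',
    '"131594", "AWARD", "UNTOUCHABLE", 1, 1, 0, 1, "", 0:00, 0:00, 10JUN2014, 13JUN2014 23:59, 01JAN2009, 01JAN2009, false, 100,',
    '"131594", "AWARD", "ADVANCED_TRIP", 1, 2, 0, 0, "740025Jun2014,705406Jun2014,737722Jun2014,696130Jun2014", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,',
]

for out_line in expand_and_count(SAMPLE):
    print(out_line)
```

Because the count lives on the row itself, no final patch-up step is needed after the loop, at the cost of one small string-to-int conversion per input row.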

One might also ask why you are not using the csv module to process these files, since it understands this kind of format.
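As a rough sketch of what that could look like for the reading side only: csv.reader with skipinitialspace=True drops the space after each comma, so a quoted field that itself contains commas (the trip list in column 8) stays in one piece. The helper name parse_rows is made up for illustration. Note that the parsed fields lose their surrounding quotes, and the trailing comma on each line produces an extra empty final field, so writing the rows back in exactly the original quoting style would need extra handling that is not shown here.

```python
import csv


def parse_rows(lines):
    """Parse comma-separated lines, keeping quoted commas inside one field.

    Hypothetical helper for illustration only.
    """
    # skipinitialspace=True ignores the space after each delimiter, so the
    # quote that follows is recognised as the start of a quoted field.
    return list(csv.reader(lines, skipinitialspace=True))


rows = parse_rows(
    ['"131594", "AWARD", "ADVANCED_TRIP", 1, 2, 0, 0, "740025Jun2014,705406Jun2014", 0:00,']
)
print(rows[0][2])             # the activity type, without quotes
print(rows[0][7].split(","))  # the individual trip entries
```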
