从csv中提取奇怪排列的数据并使用python转换为另一个csv文件

2024-05-16 04:35:17 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个奇数csv文件,它有数据头值和它的相应数据,方式如下:

,,,Completed Milling Job,,,,,, # row 1

,,,,Extended Report,,,,,

,,Job Spec numerical control,,,,,,,

Job Number,3456,,,,,, Operator Id,clipper,

Coder Machine Name,Caterpillar,,,,,,Job Start time,3/12/2013 6:22,

Machine type,Stepper motor,,,,,,Job end time,3/12/2013 9:16,

我需要从这个结构中提取数据,按照下面的结构创建另一个csv文件:

Status,Job Number,Coder Machine Name,Machine type, Operator Id,Job Start time,Job end time,,, # header
Completed Milling Job,3456,Caterpillar,Stepper motor,clipper,3/12/2013 6:22,3/12/2013 9:16,,, # data row

如果您注意到,会添加一个名为“status”的新标题列,但该值位于csv文件的第一行。输出文件中的其余列名是从原始文件中提取的。你知道吗

任何想法都将不胜感激-谢谢


Tags: 文件csv数据nameidnumbertimejob
2条回答

假设所有文件都与此完全相同(至少在CAP方面),这应该可以工作,尽管我只能根据您提供的确切数据来保证:

#!/usr/bin/python
import glob
from sys import argv

g=open(argv[2],'w')
g.write("Status,Job Number,Coder Machine Name,Machine type, Operator Id,Job Start time,Job end time\n")
for fname in glob.glob(argv[1]):
    with open(fname) as f:
        status=f.readline().strip().strip(',')
        f.readline()#extended report not needed
        f.readline()#job spec numerical control not needed
        s=f.readline()
        job_no=s.split('Job Number,')[1].split(',')[0]
        op_id=s.split('Operator Id,')[1].strip().strip(',')
        s=f.readline()
        machine_name=s.split('Coder Machine Name,')[1].split(',')[0]
        start_t=s.split('Job Start time,')[1].strip().strip(',')
        s=f.readline()
        machine_type=s.split('Machine type,')[1].split(',')[0]
        end_t=s.split('Job end time,')[1].strip().strip(',')
    g.write(",".join([status,job_no,machine_name,machine_type,op_id,start_t,end_t])+"\n")
g.close()

它接受一个glob参数(比如Job*.data)和一个输出文件名,并且应该构造您需要的内容。保存为'所以,派伊'或者别的什么,然后作为python so.py <data_files_wildcarded> output.csv运行它

这里有一个解决方案,它可以处理任何CSV文件,这些文件遵循与您所展示的相同的模式。这是一个非常恶劣的格式。你知道吗

我对这个问题产生了兴趣,并在午休时着手解决。代码如下:

COMMA = ','
NEWLINE = '\n'

def _kvpairs_from_line(line):
    line = line.strip()
    values = [item.strip() for item in line.split(COMMA)]

    i = 0
    while i < len(values):
        if not values[i]:
            i += 1  # advance past empty value
        else:
            # yield pair of values
            yield (values[i], values[i+1])
            i += 2  # advance past pair

def kvpairs_by_column_then_row(lines):
    """
    Given a series of lines, where each line is comma-separated values
    organized as key/value pairs like so:
        key_1,value_1,key_n+1,value_n+1,...
        key_2,value_2,key_n+2,value_n+2,...
        ...
        key_n,value_n,key_n+n,value_n+n,...

    Yield up key/value pairs taken from the first column, then from the second column
    and so on.
    """
    pairs = [_kvpairs_from_line(line) for line in lines]
    done = [False for _ in pairs]
    while not all(done):
        for i in range(len(pairs)):
            if not done[i]:
                try:
                    key_value_tuple = next(pairs[i])
                    yield key_value_tuple
                except StopIteration:
                    done[i] = True

STATUS = "Status"
columns = [STATUS]

d = {}

with open("data.csv", "rt") as f:
    # get an iterator that lets us pull lines conveniently from file
    itr = iter(f)

    # pull first line and collect status
    line = next(itr)
    lst = line.split(COMMA)
    d[STATUS] = lst[3]

    # pull next lines and make sure the file is what we expected
    line = next(itr)
    assert "Extended Report" in line
    line = next(itr)
    assert "Job Spec numerical control" in line

    # pull all remaining lines and save in a list
    lines = [line.strip() for line in f]

for key, value in kvpairs_by_column_then_row(lines):
    columns.append(key)
    d[key] = value

with open("output.csv", "wt") as f:
    # write column headers line
    line = COMMA.join(columns)
    f.write(line + NEWLINE)
    # write data row
    line = COMMA.join(d[key] for key in columns)
    f.write(line + NEWLINE)

相关问题 更多 >