How to loop through a CSV in Python and write rows that meet new criteria to a new file

Question

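I've been struggling with this for a while and figured it was best to ask the experts. I know my code isn't well written, and I've managed to confuse myself.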

I have a CSV file; actually, many of them. That part isn't the problem.

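The first few lines of each CSV file aren't actually data, but they contain one important piece of information: the date the data is valid for. For some report types this date is on one line, and for other types it is on a different line.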

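My data usually starts 10 or 11 lines from the top, but I can't always be sure. I do know that the first column always contains the same information (the header of the data table).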

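I want to extract the report date from the leading lines, then do operation A for files of type A and operation B for files of type B, and write each row to a new file. I'm having trouble incrementing the row and have no idea where I'm going wrong.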

Sample data:

"Attribute ""OPSURVEYLEVEL2_O"" [Category = ""Retail v1""]"
Date exported: 2/16/13
Exported by user: William
Project: 
Classification: Online Retail v1
Report type: Attributes
Date range: from 12/14/12 to 12/14/12
"Filter OpSurvey Level 2(mine):  [ LEVEL:SENTENCE TYPE:KEYWORD {OPSURVEYLEVEL2_O:""gift certificate redemption"", OPSURVEYLEVEL2_O:""combine accounts"", OPSURVEYLEVEL2_O:""cancel account"", OPSURVEYLEVEL2_O:""saved project moved to purchased project"", OPSURVEYLEVEL2_O:""unlock account"", OPSURVEYLEVEL2_O:""affiliate promotions"", OPSURVEYLEVEL2_O:""print to store coupons"", OPSURVEYLEVEL2_O:""disclaimer not clear"", OPSURVEYLEVEL2_O:""prepaid issue"", OPSURVEYLEVEL2_O:""customer wants to use coupons for print to store"", OPSURVEYLEVEL2_O:""customer received someone else's order"", OPSURVEYLEVEL2_O:""hi-res images unavailable"", OPSURVEYLEVEL2_O:""how to re-order"", OPSURVEYLEVEL2_O:""missing items"", OPSURVEYLEVEL2_O:""missing envelopes: print to store"", OPSURVEYLEVEL2_O:""missing envelopes: mail order"", OPSURVEYLEVEL2_O:""group rooms"", OPSURVEYLEVEL2_O:""print to store"", OPSURVEYLEVEL2_O:""print to store coupons"", OPSURVEYLEVEL2_O:""publisher: card not available for print to store"", OPSURVEYLEVEL2_O:publisher}]"
Total: 905
OPSURVEYLEVEL2_O,Distinct Document,% of Document,Sentiment Score
PRINT TO STORE,297,32.82,-0.1
...
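
For the sample above, the report date can be pulled out of the "Date range" line with a regular expression instead of a fixed-width slice, since a slice of the last 8 characters only works when the date is exactly 8 characters wide (it would not be for 1/5/13). This is a minimal sketch, assuming Python 3 and assuming the end date of the range is the one wanted; `extract_report_date` and `DATE_RANGE_RE` are hypothetical names, not part of the original script.

```python
import re

# Matches e.g. "Date range: from 12/14/12 to 12/14/12" and captures both dates.
DATE_RANGE_RE = re.compile(
    r"Date range:\s*from\s+(\d{1,2}/\d{1,2}/\d{2,4})\s+to\s+(\d{1,2}/\d{1,2}/\d{2,4})"
)


def extract_report_date(line, default="DATE NOT FOUND"):
    """Return the end date of a 'Date range' line, or default if it is absent."""
    match = DATE_RANGE_RE.search(line)
    return match.group(2) if match else default
```

Called on each line of the preamble, this returns the default for every line except the one that carries the date range.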

Sample code:

#!/usr/bin/python

import csv, os, glob, sys, errno

path = '/path/to/Downloads'
for infile in glob.glob(os.path.join(path,'report_ATTRIBUTE_OP*.csv')):
    if 'OPSURVEYLEVEL2' in infile:
        prime_column = 'ops2'
    elif 'OPSURVEYLEVEL3' in infile:
        prime_column = 'ops3'
    else:
        sys.exit(errno.ENOENT)
    with open(infile, "r") as csvfile:
        reader = csv.reader(csvfile)
        report_date = 'DATE NOT FOUND'
        # import pdb; pdb.set_trace()
        for row in reader:
            foo = 0
            while foo < 1: 
                if row[0][0:].find('OPSURVEYLEVEL') == 0:
                    foo = 1
                if "Date range" in row:
                    report_date = row[0][-8:]
                break
            if foo >= 1:
                if row[0][0:].find('OPSURVEYLEVEL') == 0:
                    break
                if 'ops2' in prime_column:
                    dup_col = row[0]
                    row.insert(0,dup_col)
                    row.append(report_date)
                elif 'ops3' in prime_column:
                    row.append(report_date)
                with open('report_merge.csv', 'a') as outfile:
                    outfile.write(row)
            reader.next()
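
A few things in the code above stand out: `foo` is reset to 0 on every row, so the "header seen" state never survives to the next row; `"Date range" in row` tests whether a cell is exactly equal to that string, not whether a cell contains it; `reader.next()` inside the `for` loop skips every other row; and `outfile.write(row)` is handed a list rather than a string. One possible restructuring is sketched below. It assumes Python 3, assumes the column-header row itself should not be copied to the output, and skips files of an unrecognized type instead of exiting. The function names `merge_report` and `merge_all` are hypothetical.

```python
import csv
import glob
import os


def merge_report(in_path, out_path, prime_column):
    """Append the data rows of one report to out_path, tagged with its date."""
    report_date = "DATE NOT FOUND"
    in_data = False  # becomes True once the column-header row has been seen

    with open(in_path, newline="") as src, \
            open(out_path, "a", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not row:
                continue  # blank line
            if not in_data:
                first = row[0]
                if first.startswith("Date range"):
                    # The last whitespace-separated token is the end date.
                    report_date = first.split()[-1]
                elif first.startswith("OPSURVEYLEVEL"):
                    in_data = True  # header row found; it is not copied
                continue
            if prime_column == "ops2":
                row.insert(0, row[0])  # duplicate the first column
            row.append(report_date)
            writer.writerow(row)


def merge_all(path, out_path="report_merge.csv"):
    """Merge every matching report under path into out_path."""
    for in_path in glob.glob(os.path.join(path, "report_ATTRIBUTE_OP*.csv")):
        name = os.path.basename(in_path)
        if "OPSURVEYLEVEL2" in name:
            merge_report(in_path, out_path, "ops2")
        elif "OPSURVEYLEVEL3" in name:
            merge_report(in_path, out_path, "ops3")
        # Files of any other type are skipped (an assumption of this sketch).
```

A call such as `merge_all('/path/to/Downloads')` would then append to `report_merge.csv` in the current directory. Keeping a single boolean flag outside the per-row logic is what lets the loop remember that the header has already been passed.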

