Python: importing a CSV file encoded as UTF-8 or cp1252

import csv
import unicodecsv
#<Lots of other declarations and initialization>

def _csv_dict(self, file, index_field, ScrubMe, **kwargs):
    #some irrelevant initialization stuff here.
    if 'formatting' in kwargs:
        formatting = kwargs['formatting']
    else:
        formatting = None  #cp1252 is OS default
    with open(file, encoding=formatting, errors='ignore') as f:  #newline = '',
        if formatting == None:
            reader = csv.DictReader(f, dialect = 'excel')
        else:
            #assume for now UTF-8 is the only other supported format
            reader = unicodecsv.DictReader(f, dialect = csv.excel)
        for line in reader:
            <do some stuff - it's mostly building dictionaries, but I generally edit the data to only keep the stuff I care about and do a little data transformation to standard formats >

1 answer

User

#1 · Posted 2024-04-20 06:07:19

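I have replaced my original answer, because there were several things going on at once and it took me a while to untangle them.

1) @lenz is right. In Python 3 there is no need to use unicodecsv.DictReader. Part of what confused me was the difference between the two implementations.

a) The old unicodecsv.DictReader from Python 2: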

kw_args={'errors' : None}
with open(filename, 'rb', **kw_args) as file:
    reader = unicodecsv.DictReader(file, dialect = csv.excel, encoding='utf_8_sig' )

b) For Python 3, csv.DictReader:

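kw_args={'newline' : '', 'errors' : None, 'encoding' : 'utf_8_sig'}
with open(filename, 'r', **kw_args) as file:
    reader = csv.DictReader(file, dialect = csv.excel)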

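To summarize the differences:

The file is now opened in text mode rather than bytes mode.
Because the open call differs, the codec can/should be specified in the file open rather than in the DictReader.
The newline parameter is also only valid for files opened as text.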
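
To illustrate the Python 3 approach summarized above, here is a minimal sketch of a reader that takes the codec as a parameter and relies only on the standard csv module. The function name read_csv_dicts and the fallback to cp1252 are assumptions for illustration; they are not part of the original code.

```python
import csv


def read_csv_dicts(path, encoding=None):
    """Read a CSV file into a list of dicts using only the stdlib csv module.

    encoding=None falls back to cp1252 here (an assumption that mirrors the
    "OS default" mentioned in the question); pass 'utf_8_sig' for UTF-8
    files saved by Excel.
    """
    codec = encoding or 'cp1252'
    # Text mode with newline='' lets the csv module handle line endings itself.
    with open(path, 'r', encoding=codec, newline='') as f:
        return list(csv.DictReader(f, dialect=csv.excel))
```

With this shape, the caller picks the codec once at open time and the same csv.DictReader serves both the cp1252 and the UTF-8 case.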

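2) Because my UTF-8 file was generated by Excel, it has a BOM (byte order mark) at the top of the file. The only codec that worked for it was 'utf_8_sig', which strips the BOM on read.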
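
A small sketch of what goes wrong without 'utf_8_sig': decoding the same bytes with plain 'utf_8' leaves the BOM character attached to the first column name. The sample bytes and the helper name header_names are made up for illustration.

```python
import csv
import io

# Bytes roughly as an Excel UTF-8 CSV export would produce them: a UTF-8 BOM
# (EF BB BF) followed by the data. The sample content is hypothetical.
raw = b'\xef\xbb\xbfid,name\r\n1,Zo\xc3\xab\r\n'


def header_names(data, encoding):
    """Decode `data` with `encoding` and return the CSV header names."""
    text = io.StringIO(data.decode(encoding), newline='')
    return csv.DictReader(text, dialect=csv.excel).fieldnames


with_bom = header_names(raw, 'utf_8')         # BOM stays glued to the first name
without_bom = header_names(raw, 'utf_8_sig')  # BOM is stripped while decoding
print(with_bom, without_bom)
```

The practical symptom of the first case is a KeyError when looking up the first column by its expected name.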

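3) Because SQL Server reads my output file downstream, the output codec has to be 'utf_16_le'; otherwise SQL Server does not recognize it.

4) Also, because the target is SQL Server, I had to insert the BOM manually at the top of the file.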

csvfile.write('\uFEFF') 
writer.writeheader()

If you open the resulting output file in Excel, it is no longer split into columns, but SQL Server (actually SSIS) now knows how to read the file.
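
For context, the two lines above might sit inside a writer along these lines. This is only a sketch: the function name write_for_ssis, its parameters, and the choice of csv.DictWriter with the excel dialect are assumptions, since the original answer does not show how the file and writer were created.

```python
import csv


def write_for_ssis(path, fieldnames, rows):
    """Write rows as UTF-16-LE CSV with an explicit BOM at the start."""
    # The 'utf_16_le' codec does not emit a BOM by itself, so it is written by hand.
    with open(path, 'w', encoding='utf_16_le', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, dialect=csv.excel)
        csvfile.write('\ufeff')
        writer.writeheader()
        writer.writerows(rows)
```

The file then begins with the bytes FF FE, which is the little-endian UTF-16 byte order mark.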

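5) To make things a little harder for me, someone had put '\n' inside some of the records. With Excel as both source and destination that is not a problem, but for SSIS it is. My solution: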

for r in record_list:
    temp={}
    for k,v in r.items():

        if isinstance(v,str):
            temp[k] = v.replace('\n',' ')
        else:
            temp[k] = v
    writer.writerow(temp)
