Converting a CSV to UTF-8 in Python

Posted 2024-03-29 08:09:44


I am trying to create a copy of a CSV file without its header row. When I attempt this, I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1895: invalid start byte.

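I have read the section of the Python csv documentation on Unicode and UTF-8 encoding and implemented it. However, the resulting output file contains no data. I am not sure what I am doing wrong here.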

import csv

path =  '/Users/johndoe/file.csv'

with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile:

    def unicode_csv(infile, outfile):
        inputs = csv.reader(utf_8_encoder(infile))
        output = csv.writer(outfile)

        for index, row in enumerate(inputs):
            yield [unicode(cell, 'utf-8') for cell in row]
            if index == 0:
                 continue
        output.writerow(row)

    def utf_8_encoder(infile):
        for line in infile:
            yield line.encode('utf-8')

unicode_csv(infile, outfile)

2 Answers

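To start with, the call

unicode_csv(infile, outfile)

is not indented, so it sits outside the with block; by the time it runs, both infile and outfile are already closed.

Files should be opened where they are used, not where the functions are defined, so: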

with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile:
    unicode_csv(infile,outfile)
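
As a rough sketch of that structure on Python 3 (an assumption, since the `unicode()` call in the question only exists in Python 2): the function is defined at module level, and the files are opened only around the call. The names `copy_without_header` and `convert`, and the `encoding='utf-8'` arguments, are illustrative and do not come from the question. Note that the function in the question also contains a `yield`, which turns it into a generator whose body does not run until something iterates over it; this sketch drops the `yield` and writes the rows directly.

```python
import csv


def copy_without_header(infile, outfile):
    """Copy every CSV row except the first from infile to outfile.

    Returns the number of rows written.
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    next(reader, None)  # skip the header row; no error if the file is empty
    rows_written = 0
    for row in reader:
        writer.writerow(row)
        rows_written += 1
    return rows_written


def convert(path):
    # Hypothetical wrapper: the files are opened here, at the point of use,
    # and stay open for the whole call to copy_without_header.
    with open(path, 'r', encoding='utf-8', newline='') as infile, \
         open(path + 'final.csv', 'w', encoding='utf-8', newline='') as outfile:
        return copy_without_header(infile, outfile)
```

Calling `convert('/Users/johndoe/file.csv')` would then do the whole copy while both files are open. This still raises UnicodeDecodeError if the input is not valid UTF-8, which is a separate problem from the closed-file one.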

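The solution was to add two parameters to

with open(path, 'r') as infile:

namely encoding='UTF-8' and errors='ignore'. This let me create a copy of the original CSV with no header and no UnicodeDecodeError. The complete code is below.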

import csv

path =  '/Users/johndoe/file.csv'

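# newline='' is what the csv module docs recommend for files passed to reader/writer
with open(path, 'r', encoding='utf-8', errors='ignore', newline='') as infile, \
     open(path + 'final.csv', 'w', encoding='utf-8', newline='') as outfile:
    inputs = csv.reader(infile)
    output = csv.writer(outfile)

    for index, row in enumerate(inputs):
        # Skip the first row so the output file has no header
        if index == 0:
            continue
        output.writerow(row)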
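
One caveat: `errors='ignore'` silently drops every byte that is not valid UTF-8, so whatever characters those bytes stood for are lost. The byte 0xa0 from the traceback is a non-breaking space in Windows-1252 and Latin-1, which hints (but does not prove) that the file was saved in one of those encodings. A minimal sketch under that assumption reads with the source encoding and writes UTF-8 instead of discarding bytes. The function name `csv_to_utf8` and the default `source_encoding='cp1252'` are illustrative choices, not part of the answer above.

```python
import csv


def csv_to_utf8(src_path, dst_path, source_encoding='cp1252', skip_header=True):
    """Re-encode a CSV file as UTF-8, optionally dropping the first row.

    source_encoding is a guess that the caller has to confirm for their own
    file; cp1252 is only assumed here because of the 0xa0 byte in the error.
    """
    with open(src_path, 'r', encoding=source_encoding, newline='') as infile, \
         open(dst_path, 'w', encoding='utf-8', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        if skip_header:
            next(reader, None)  # discard the header row if there is one
        for row in reader:
            writer.writerow(row)
```

If the guess is wrong, the output will contain the wrong characters rather than raising an error, so it is worth opening the result and checking a few rows that contain non-ASCII text.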
