Why does Python write one wrongly encoded line every two lines?

Question

I am trying to dump the content of one column of a SQL Server 2000 table into text files, which I then want to process with Python to produce new text files.

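My problem is that I cannot get Python to use the correct encoding: the input files display fine in my text editor, but in the output files one line out of every two is garbled.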

My Python code can be reduced to:

input = open('input', 'r')
string = input.read()
# Do stuff
output = open('output', 'w+')
output.write(string)

When I print this string in the Windows command line, I see the expected characters, but with an extra space between each of them.

But when I open the output file, one line out of every two is garbled (although the "extra" spaces are gone).

Some context: to export the column to files, I use this script: spWriteStringTofile, which I believe uses the default server encoding.

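After some research, it turns out that this encoding is SQL_Latin1_General_CP1_CI_AS. I tried adding # -*- coding: latin_1 -*- at the top of the script, I tried converting the encoding inside SQL Server to Latin1_General_CI_AS, and I tried string.decode('latin_1').encode('utf8'), but nothing changed (except that the last attempt output only garbled text).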

What can I try?

Edit 2: I tried the newFile.write(line.decode('utf-16-be').encode('utf-16-le')) solution, but it raises an error on the first line of my file. From the Python debugger:

(Pdb) print line
ÿþ

(Pdb) print repr(line)
'\xff\xfe\n'
(Pdb) line.decode('utf-16-be').encode('utf-16-le')
*** UnicodeDecodeError: 'utf16' codec can't decode byte 0x0a in position 2: truncated data

In Sublime Text 2, this first line shows up as nothing but a line break...
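The first two bytes in that repr, \xff\xfe, match the UTF-16 little-endian byte order mark, and a three-byte string cannot hold a whole number of two-byte UTF-16 units, which fits the "truncated data" message. A minimal sketch of checking for such a mark, assuming Python 3 and assuming a leading byte order mark is a good enough hint about the file's encoding; sniff_bom is a hypothetical helper, not part of the original script:

```python
import codecs


def sniff_bom(raw):
    """Return a codec name guessed from a leading byte order mark, or None."""
    # The 4-byte UTF-32 marks go first: the UTF-32 LE mark begins with
    # the same two bytes as the UTF-16 LE mark.
    candidates = [
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in candidates:
        if raw.startswith(bom):
            return name
    return None


first_line = b"\xff\xfe\n"  # the bytes shown by repr(line) above
print(sniff_bom(first_line))  # prints utf-16
```

If the guess holds, the file would have to be decoded as a whole (or opened with an explicit UTF-16 codec) rather than one byte-oriented line at a time.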

When I bypass this error (try: ... except: pass, quick and dirty), a line break is added between the correct and incorrect lines, but the garbled text is still there.

Edit: I went through the document line by line:

newFile = open('newfile', 'a+')
with open('input') as fp:
    for line in fp:
        import pdb
        pdb.set_trace()
        newFile.write(line)

In the debugger, at one of the problematic lines:

(Pdb) print line
                           a s  S o l d D e b i t o r , # <-- Not actual copy paste
(Pdb) print repr(line)
'\x00\t\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00a\x00s\x00 \x00S\x00o\x00l\x00d\x00D\x00e\x00b\x00i\x00t\x00o\x00r\x00,\x00\r\x00\n'

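For some reason, however, I could not copy and paste the value of print line: I can copy the individual alphabetic characters, but not the "whitespace" between them when I select it...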
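The repr above looks like what a byte-oriented line reader would produce on UTF-16 little-endian text: in that encoding a line feed is the two bytes 0A 00, so cutting after 0A leaves the trailing 00 at the start of the next line. A small sketch of that effect, written for Python 3 with made-up sample text (an assumption about the input file, not a confirmed diagnosis):

```python
# Hypothetical sample standing in for two lines of the exported file.
text = u"else 0 end\r\nas SoldDebitor,\r\n"
raw = text.encode("utf-16-le")

# Cut after every 0x0A byte, the way a byte-oriented line reader does.
chunks = raw.split(b"\n")
lines = [chunk + b"\n" for chunk in chunks[:-1]]
remainder = chunks[-1]

print(repr(lines[1]))   # starts with \x00 and ends with \x00\r\x00\n
print(repr(remainder))  # the 0x00 half of the last line feed is left over

# Decoding the whole buffer at once keeps the byte pairs intact.
print(raw.decode("utf-16-le") == text)  # prints True
```

Each "line" obtained this way has an odd number of bytes or starts mid-character, so decoding the lines one by one as UTF-16 cannot work; the null bytes would also explain the uncopyable "whitespace" between the letters.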

Input:

r <= @Data2 then (case when @Deviza='' or @Deviza=@sMoneda 
    then isnull(Debit,0) else isnull(DevDebit,0) end)
    else 0 end) 
     - Sum(case when DataInr >= @BeginDate and DataInr <= @Data2 
       then  (case when @Deviza='' or @Deviza=@sMoneda 
       then  isnull(Credit,0) else isnull(DevCredit,0) end)
       else 0 end) 
       else 0 end
    as SoldDebitor,

Output:

r <= @Data2 then (case when @Deviza='' or @Deviza=@sMoneda 
            then  isnull(Debit,0) else isnull(DevDebit,0) end)
਍ऀ                       攀氀猀攀 　 攀渀搀⤀ ഀഀ
      - Sum(case when DataInr >= @BeginDate and DataInr <= @Data2 
            then  (case when @Deviza='' or @Deviza=@sMoneda
            then  isnull(Credit,0) else isnull(DevCredit,0) end)
਍ऀ                       攀氀猀攀 　 攀渀搀⤀ ഀഀ
        else 0 end
਍ऀ                 愀猀 匀漀氀搀䐀攀戀椀琀漀爀Ⰰഀഀ
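One way output like this could arise, assuming the input is UTF-16 little-endian and the output file is written in Windows text mode, where each 0A byte is written as 0D 0A: the extra byte shifts the two-byte alignment by one, and the next line break shifts it back, so lines alternate between readable and garbled. The sketch below (Python 3, sample text made up for illustration) simulates that translation with bytes.replace rather than a real Windows file write:

```python
# Hypothetical sample standing in for three lines of the exported file.
text = u"then x\r\n\telse 0 end\r\nthen y"
raw = text.encode("utf-16-le")

# Simulate a Windows text-mode write of the raw bytes: 0x0A becomes 0x0D 0x0A.
shifted = raw.replace(b"\n", b"\r\n")

# Read the result back as UTF-16 LE, as a text editor would.
decoded = shifted.decode("utf-16-le")
first, rest = decoded.split(u"\r", 1)
garbled, last = rest.split(u"\n", 1)

print(first)  # prints then x
print([hex(ord(ch)) for ch in garbled[:4]])  # ['0xa0d', '0x900', '0x6500', '0x6c00']
print(last)   # prints then y
```

U+0A0D, U+0900, U+6500 and U+6C00 are the characters that open the garbled lines in the output above. Under the same assumption, opening both files in binary mode ('rb' and 'wb'), or decoding the input with an explicit UTF-16 codec before processing it, would avoid the byte shift.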

