Python读取CSV问题:多余的空格
当我尝试解析一个由MS SQL 2005 Express版导出的csv文件时,Python给我的字符串完全出乎我的意料。比如,如果csv文件中的一行是: "aaa,bbb,ccc,dddd",那么当Python解析它时,结果变成了:" a a a a , b b b , c c c, d d d d",就是这种感觉……这是怎么回事??
我试着在代码中去掉空格,但没有用。
import os
import random
f1 = open('a.txt', 'r')
f2 = open('dec_sql.txt', 'w')
text = 'abc'
while(text != ''):
text = f1.readline()
if(text==''):
break
splited = text.split(',')
for i in range (0, 32):
splited[i] = splited[i].replace(' ', '')
sql = 'insert into dbo.INBOUND_RATED_DEC2010 values ('
sql += '\'' + splited[0] + '\', '
sql += '\'' + splited[1] + '\', '
sql += '\'' + splited[2] + '\', '
sql += '\'' + splited[3] + '\', '
sql += '\'' + splited[4] + '\', '
sql += '\'' + splited[5] + '\', '
sql += '\'' + splited[6] + '\', '
sql += '\'' + splited[7] + '\', '
sql += '\'' + splited[8] + '\', '
sql += '\'' + splited[9] + '\', '
sql += '\'' + splited[10] + '\', '
sql += '\'' + splited[11] + '\', '
sql += '\'' + splited[12] + '\', '
sql += '\'' + splited[13] + '\', '
sql += '\'' + splited[14] + '\', '
sql += '\'' + splited[15] + '\', '
sql += '\'' + splited[16] + '\', '
sql += '\'' + splited[17] + '\', '
sql += '\'' + splited[18] + '\', '
sql += '\'' + splited[19] + '\', '
sql += '\'' + splited[20] + '\', '
sql += '\'' + splited[21] + '\', '
sql += '\'' + splited[22] + '\', '
sql += '\'' + splited[23] + '\', '
sql += '\'' + splited[24] + '\', '
sql += '\'' + splited[25] + '\', '
sql += '\'' + splited[26] + '\', '
sql += '\'' + splited[27] + '\', '
sql += '\'' + splited[28] + '\', '
sql += '\'' + splited[29] + '\', '
sql += '\'' + splited[30] + '\', '
sql += '\'' + splited[31] + '\', '
sql += '\'' + splited[32] + '\' '
sql += ')'
print sql
f2.write(sql+'\n')
f2.close()
f1.close()
4 个回答
0
试着用记事本打开这个文件,然后使用“全部替换”功能,把 ' '
替换成 ''
。
-2
使用Python自带的CSV读取器可能会有帮助。看起来这是个与unicode有关的问题,这个问题让我很头疼。
import tkFileDialog
import csv
ENCODING_REGEX_REPLACEMENT_LIST = [(re.compile('\xe2\x80\x99'), "'"),
(re.compile('\xe2\x80\x94'), "--"),
(re.compile('\xe2\x80\x9c'), '"'),
(re.compile('\xe2\x80\x9d'), '"'),
(re.compile('\xe2\x80\xa6'), '...')]
def correct_encoding(csv_row):
for key in csv_row.keys():
# if there is a value for the current key
if csv_row[key]:
try:
csv_row[key] = unicode(csv_row[key], errors='strict')
except ValueError:
# we have a bad encoding, try iterating through all the known
# bad encodings in the ENCODING_REGEX_REPLACEMENT and replace
# everything and then try again
for (regex, replacement) in ENCODING_REGEX_REPLACEMENT_LIST:
csv_row[key] = regex.sub(replacement,csv_row[key])
print(csv_row)
csv_row[key] = unicode(csv_row[key])
# if there is NOT a value for the current key
else:
csv_row[key] = unicode('')
return csv_row
filename = tkFileDialog.askopenfilename()
csv_reader = csv.DictReader(open(filename, "rb"), dialect='excel') # assuming similar dialect
for csv_row in csv_reader:
csv_row = correct_encoding(csv_row)
# your application logic here