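Unable to export pandas DataFrame to Excel / encoding problem
I am running into an encoding problem when exporting one of my DataFrames, and the export fails.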
sjM.dtypes
Customer Name object
Total Sales float64
Sales Rank float64
Visit_Frequency float64
Last_Sale datetime64[ns]
dtype: object
Exporting to CSV works fine:
path = 'c:\\test'
sjM.to_csv(path + '.csv') # Works
But exporting to Excel fails:
sjM.to_excel(path + '.xls')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "testing.py", line 338, in <module>
sjM.to_excel(path + '.xls')
File "c:\Anaconda\Lib\site-packages\pandas\core\frame.py", line 1197, in to_excel
excel_writer.save()
File "c:\Anaconda\Lib\site-packages\pandas\io\excel.py", line 595, in save
return self.book.save(self.path)
File "c:\Anaconda\Lib\site-packages\xlwt\Workbook.py", line 662, in save
doc.save(filename, self.get_biff_data())
File "c:\Anaconda\Lib\site-packages\xlwt\Workbook.py", line 637, in get_biff_data
shared_str_table = self.__sst_rec()
File "c:\Anaconda\Lib\site-packages\xlwt\Workbook.py", line 599, in __sst_rec
return self.__sst.get_biff_record()
File "c:\Anaconda\Lib\site-packages\xlwt\BIFFRecords.py", line 76, in get_biff_record
self._add_to_sst(s)
File "c:\Anaconda\Lib\site-packages\xlwt\BIFFRecords.py", line 91, in _add_to_sst
u_str = upack2(s, self.encoding)
File "c:\Anaconda\Lib\site-packages\xlwt\UnicodeUtils.py", line 50, in upack2
us = unicode(s, encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 22: ordinal not in range(128)
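I know the problem is in the 'Customer Name' column, because the export to Excel works once that column is dropped.
Following the advice in another question (Python pandas to_excel 'utf8' codec can't decode byte), I tried using a function to decode and re-encode the offending column: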
def changeencode(data):
    cols = data.columns
    for col in cols:
        # only object (string) columns need re-encoding
        if data[col].dtype == 'O':
            data[col] = data[col].str.decode('latin-1').str.encode('utf-8')
    return data
sjM = changeencode(sjM)
sjM['Customer Name'].str.decode('utf-8')
L2-00864 SETIA 2
K1-00279 BERKAT JAYA
L2-00664 TK. ANTO
BR00035 BRASIL JAYA,TK
RA00011 CV. RAHAYU SENTOSA
So the conversion to unicode appears to have succeeded, yet the export still fails:
sjM.to_excel(path + '.xls')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\Anaconda\Lib\site-packages\pandas\core\frame.py", line 1197, in to_excel
excel_writer.save()
File "c:\Anaconda\Lib\site-packages\pandas\io\excel.py", line 595, in save
return self.book.save(self.path)
File "c:\Anaconda\Lib\site-packages\xlwt\Workbook.py", line 662, in save
doc.save(filename, self.get_biff_data())
File "c:\Anaconda\Lib\site-packages\xlwt\Workbook.py", line 637, in get_biff_data
shared_str_table = self.__sst_rec()
File "c:\Anaconda\Lib\site-packages\xlwt\Workbook.py", line 599, in __sst_rec
return self.__sst.get_biff_record()
File "c:\Anaconda\Lib\site-packages\xlwt\BIFFRecords.py", line 76, in get_biff_record
self._add_to_sst(s)
File "c:\Anaconda\Lib\site-packages\xlwt\BIFFRecords.py", line 91, in _add_to_sst
u_str = upack2(s, self.encoding)
File "c:\Anaconda\Lib\site-packages\xlwt\UnicodeUtils.py", line 50, in upack2
us = unicode(s, encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 22: ordinal not in range(128)
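- Why does the export still fail even though the conversion to unicode appears to succeed?
- How can I work around this so that the DataFrame can be exported to Excel?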
@Jeff
Thanks for pointing me in the right direction.
Steps used:
Install xlsxwriter (it is not bundled with pandas), then:
sjM.to_excel(path + '.xlsx', sheet_name='Sheet1', engine='xlsxwriter')
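The step above can be wrapped so that a missing xlsxwriter install is reported up front. This is only a sketch, assuming Python 3 and a pandas version that accepts `engine='xlsxwriter'`; the helper name `export_to_xlsx` is made up for illustration and is not part of pandas.

```python
import importlib.util


def export_to_xlsx(frame, path):
    """Write `frame` to `path` using the xlsxwriter engine.

    `export_to_xlsx` is a hypothetical helper, not a pandas function.
    It fails early with a readable message when xlsxwriter is missing,
    since that package is installed separately from pandas.
    """
    if importlib.util.find_spec("xlsxwriter") is None:
        raise RuntimeError(
            "xlsxwriter is not installed; try: pip install xlsxwriter"
        )
    # Same call as in the step above, with the engine named explicitly.
    frame.to_excel(path, sheet_name="Sheet1", engine="xlsxwriter")
    return path


# Usage with the DataFrame from the question (pandas must be installed):
#     export_to_xlsx(sjM, r"c:\test.xlsx")
```

Note the `.xlsx` extension: xlsxwriter writes the newer Excel format, not the `.xls` files that xlwt produces.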
1 Answer
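You need pandas >= 0.13 together with the xlsxwriter engine
for writing Excel files, since it supports native Unicode writing. The default engine, xlwt,
will support passing an encoding option in 0.14.
For more on engines, see the pandas documentation on Excel writer engines.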
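As for why the re-encoding step did not help, one likely reading of the two tracebacks is that the column still held byte strings after `changeencode`, and xlwt then tried to decode them as ASCII. The sketch below reproduces that pattern in Python 3 with a made-up byte string (22 ASCII bytes followed by 0x81); it is an illustration under those assumptions, not a trace of the original data.

```python
def ascii_decode_error(data):
    """Return the UnicodeDecodeError raised by decoding `data` as ASCII, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


# Hypothetical raw value: 22 ASCII bytes, then the stray byte 0x81.
raw = b"A" * 22 + b"\x81"

# First traceback: ASCII decoding stops at byte 0x81, position 22.
first = ascii_decode_error(raw)
print(first)

# What changeencode does: decode as latin-1, then encode back to UTF-8.
text = raw.decode("latin-1")        # text string ending in U+0081
reencoded = text.encode("utf-8")    # bytes again, now ending in 0xC2 0x81

# Second traceback: still bytes, so ASCII decoding now stops at 0xC2.
second = ascii_decode_error(reencoded)
print(second)
```

In other words, `.str.encode('utf-8')` turns the decoded text back into bytes, so the failing byte merely changes from 0x81 to 0xc2. Stopping after the decode step, or using an engine that writes Unicode natively, avoids the second ASCII decode.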