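I successfully read a CSV file into a pandas DataFrame with utf-8 encoding, but printing the result of an operation on that DataFrame (a crosstab in this case, the intermediate DataFrame "p" in the sample code below) fails with UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0'.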
import pandas as pd
filehandle='/home/ekta/Desktop/test_data/df_30.csv'
df0 = pd.read_csv(filehandle,skiprows=0, sep=',', encoding='utf-8',nrows=100)
df=df0[['country', 'appname']]
p=pd.crosstab(df['appname'], df['country'], rownames=['appname'], colnames=['country'])
print p  # error occurs while printing this DataFrame
""" Reading the data succeeded for both df0 and df, but printing the crosstab fails with the error *"UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 13: ordinal not in range(128)"*. See "p" above. """
#The dataframe, df looks like this
     country  appname
0    SGP      Android Skout New
2    SGP      Android Skout New
3    SGP      Android Skout New
7    SGP      Guess The Emoji - Android
14   SGP      ScoreMobile Android
15   IDN      Android Skout New
16   IND      Truecaller - Caller ID & Block
19   IDN      Indonesia News
....More ... <Chopped>
251  IDN      'Anonymous healthcare_and_fitness App cflw`2B2[h1s`lNzF@sPC1FtaCji:6kTF@']
272  SGP      '(old) Weather\xc2\xb0'
# Note the last two entries in this sample; there are more like these.
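The last sample row is a useful clue: `\xc2\xb0` is the UTF-8 byte pair for the degree sign (U+00B0), the same character named in the error message. The sketch below is written in Python 3 syntax, unlike the Python 2 code in the question, and uses only the standard library. It shows that this character has no ASCII encoding but does encode to those two bytes in UTF-8. It assumes the failure comes from the output stream falling back to the `ascii` codec, which the error text suggests but the post does not confirm. The `app_name` variable is illustrative.

```python
# The app name from the last sample row, ending in the degree sign U+00B0.
app_name = "(old) Weather\u00b0"

# UTF-8 represents U+00B0 as the two bytes 0xC2 0xB0.
utf8_bytes = app_name.encode("utf-8")

# ASCII only covers code points 0-127, so strict encoding fails.
failing_position = None
try:
    app_name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    failing_position = exc.start  # index of the first unencodable character

print(utf8_bytes)
print(ascii_ok, failing_position)
```

In this string the degree sign sits at index 13, which happens to match the "position 13" reported in the error, so this row is a plausible source of the failure.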
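Since reading df succeeded, what I would like to understand is:
(1) Should I encode -> decode the Series object (df['appname']) before passing it to crosstab?
Something like df['country'].encode('utf-8').decode('utf-8')
[this is not valid syntax]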
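On the encode -> decode idea: a strict UTF-8 round trip returns an equal string, so on its own it would not be expected to change how the text is later printed. The sketch below is in Python 3 syntax, with plain strings standing in for the `appname` column; the `names` list is illustrative and built from the sample rows. In pandas, the element-wise counterparts are the `Series.str.encode` and `Series.str.decode` accessors.

```python
# Illustrative stand-ins for values from the appname column.
names = ["Android Skout New", "Indonesia News", "(old) Weather\u00b0"]

# Encode each name to UTF-8 bytes, then decode the bytes back to text.
round_tripped = [name.encode("utf-8").decode("utf-8") for name in names]

# The round trip is lossless, so every value comes back unchanged.
unchanged = round_tripped == names
print(unchanged)
```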
Separately, the snippet below alters df (the country codes come back as single letters), which leads me to ask: why?
df0['country']=df0['country'].astype(unicode)
df0['country']
0     S
2     S
3     S
7     S
14    S
15    I
16    I
19    I
21    I
22    I
25    I
37    I
41    S
43    I
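My main question is: after I supplied the encoding as "utf-8" and the CSV read succeeded, why would I need to encode again for the crosstab? What am I doing wrong?
Note that creating the "p" DataFrame with pd.crosstab is not the problem; only "print p" fails. I use p as an intermediate object, but I would like to know how to print this DataFrame. Should I use technique (1) for that?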
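One possible way to approach the printing question is to convert the text into something the console codec can represent just before printing, rather than re-encoding the data itself. The sketch below uses Python 3 syntax and the standard library only. `to_console_safe` is a hypothetical helper, not a pandas feature, and the `ascii` target is an assumption based on the codec named in the error.

```python
def to_console_safe(text, encoding="ascii"):
    """Return text with characters the codec cannot encode shown as escapes."""
    # backslashreplace swaps each unencodable character for a backslash escape.
    raw = text.encode(encoding, errors="backslashreplace")
    return raw.decode(encoding)


# Illustrative stand-in for the text form of the crosstab, e.g. p.to_string().
table_text = "appname\n(old) Weather\u00b0"
safe_text = to_console_safe(table_text)
print(safe_text)
```

With `encoding="utf-8"` the helper returns the text unchanged, which would suit a terminal that accepts UTF-8.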