Pandas and unicode

Posted 2024-04-26 00:39:52


This is the string I got from pandas.DataFrame.to_json(), put into redis, fetched from redis somewhere else, and am now trying to read back with pandas.read_json():

DFJ {"args":{"0":"[]","1":"[]","2":"[]","3":"[]","4":"[]","5":"[]","6":"[]","7":"[]"},"date":{"0":1385944439000000000,"1":1385944439000000000,"2":1385944440000000000,"3":1385944440000000000,"4":1385944440000000000,"5":1385944440000000000,"6":1385944440000000000,"7":1385944440000000000},"host":{"0":"yy38.segm1.org","1":"kyy1.segm1.org","2":"yy10.segm1.org","3":"yy24.segm1.org","4":"yy24.segm1.org","5":"yy34.segm1.org","6":"yy15.segm1.org","7":"yy15.segm1.org"},"kwargs":{"0":"{}","1":"{}","2":"{}","3":"{}","4":"{}","5":"{}","6":"{}","7":"{}"},"operation":{"0":"x_gbinf","1":"x_initobj","2":"x_gobjParams","3":"gtfull","4":"x_gbinf","5":"gxyzinf","6":"deletemfg","7":"gxyzinf"},"thingy":{"0":"a13yy38","1":"a19kyy1","2":"a14yy10","3":"a14yy24","4":"a14yy24","5":"a12yy34","6":"a15yy15","7":"a15yy15"},"status":{"0":-101,"1":1,"2":-101,"3":-101,"4":-101,"5":-101,"6":1,"7":-101},"time":{"0":0.000801,"1":0.003244,"2":0.002247,"3":0.002787,"4":0.001067,"5":0.002652,"6":0.004371,"7":0.000602}}

It does not seem to contain any unicode. Yet after I .read_json() it and try to store the result, I get:

Traceback (most recent call last):
  File "./sqlprofile.py", line 160, in <module>
    maybe_save_dataframes(rconn, configd, results)
  File "./sqlprofile.py", line 140, in maybe_save_dataframes
    h5store.append(out_queue, df)
  File "/home/username/anaconda/lib/python2.7/site-packages/pandas/io/pytables.py", line 658, in append
    self._write_to_group(key, value, table=True, append=True, **kwargs)
  File "/home/username/anaconda/lib/python2.7/site-packages/pandas/io/pytables.py", line 923, in _write_to_group
    s.write(obj = value, append=append, complib=complib, **kwargs)
  File "/home/username/anaconda/lib/python2.7/site-packages/pandas/io/pytables.py", line 2985, in write
    **kwargs)
  File "/home/username/anaconda/lib/python2.7/site-packages/pandas/io/pytables.py", line 2717, in create_axes
    raise e
TypeError: [unicode] is not implemented as a table column
> /home/username/anaconda/lib/python2.7/site-packages/pandas/io/pytables.py(2717)create_axes()
-> raise e
(Pdb) locals()

This is what I get from locals(). It looks like the append_axis (column names?) values are unicode. Why?

{'append_axis': [u'args', u'date', u'host', u'kwargs', u'operation', u'thingy', u'status', u'time'], 'existing_table': None, 'blocks': [FloatBlock: [time], 1 x 8, dtype float64, ObjectBlock: [args, host, kwargs, operation, thingy], 5 x 8, dtype object, IntBlock: [status], 1 x 8, dtype int64, DatetimeBlock: [date], 1 x 8, dtype datetime64[ns]], 'axis': 1, 'self': frame_table  (typ->appendable,nrows->None,ncols->1,indexers->[index]), 'axes': [0], 'kwargs': {}, 'klass': <class 'pandas.io.pytables.DataCol'>, 'block_obj':   args                date            host kwargs              operation      thingy  status      time
0   [] 2013-12-02 00:33:59  yy38.segm1.org     {}       x_gbinf  a13yy38    -101  0.000801
1   [] 2013-12-02 00:33:59  kyy1.segm1.org     {}         x_initobj  a19kyy1       1  0.003244
2   [] 2013-12-02 00:34:00  yy10.segm1.org     {}    x_gobjParams  a14yy10    -101  0.002247
3   [] 2013-12-02 00:34:00  yy24.segm1.org     {}        gtfull  a14yy24    -101  0.002787
4   [] 2013-12-02 00:34:00  yy24.segm1.org     {}       x_gbinf  a14yy24    -101  0.001067
5   [] 2013-12-02 00:34:00  yy34.segm1.org     {}           gxyzinf  a12yy34    -101  0.002652
6   [] 2013-12-02 00:34:00  yy15.segm1.org     {}  deletemfg  a15yy15       1  0.004371
7   [] 2013-12-02 00:34:00  yy15.segm1.org     {}           gxyzinf  a15yy15    -101  0.000602, 'axis_labels': [u'args', u'date', u'host', u'kwargs', u'operation', u'thingy', u'status', u'time'], 'nan_rep': 'nan', 'data_columns': [], 'obj':   args                date            host kwargs              operation      thingy  status      time
0   [] 2013-12-02 00:33:59  yy38.segm1.org     {}       x_gbinf  a13yy38    -101  0.000801
1   [] 2013-12-02 00:33:59  kyy1.segm1.org     {}         x_initobj  a19kyy1       1  0.003244
2   [] 2013-12-02 00:34:00  yy10.segm1.org     {}    x_gobjParams  a14yy10    -101  0.002247
3   [] 2013-12-02 00:34:00  yy24.segm1.org     {}        gtfull  a14yy24    -101  0.002787
4   [] 2013-12-02 00:34:00  yy24.segm1.org     {}       x_gbinf  a14yy24    -101  0.001067
5   [] 2013-12-02 00:34:00  yy34.segm1.org     {}           gxyzinf  a12yy34    -101  0.002652
6   [] 2013-12-02 00:34:00  yy15.segm1.org     {}  deletemfg  a15yy15       1  0.004371
7   [] 2013-12-02 00:34:00  yy15.segm1.org     {}           gxyzinf  a15yy15    -101  0.000602, 'validate': True, 'a': (1, [u'args', u'date', u'host', u'kwargs', u'operation', u'thingy', u'status', u'time']), 'index_axes_map': {0: name->index,cname->index,axis->0,pos->0,kind->integer}, 'b': ObjectBlock: [args, host, kwargs, operation, thingy], 5 x 8, dtype object, 'e': TypeError('[unicode] is not implemented as a table column',), 'name': None, 'existing_col': None, 'j': 2, 'i': 1, 'min_itemsize': None, 'col': name->values_block_1,cname->values_block_1,dtype->None,shape->None}

How do I fix this? Is this a bug in pandas / PyTables?

Environment:

Python 2.7

pandas==0.12.0

tables==3.0.0


2 Answers

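The astype(str) solution in the other answer may raise errors on special (non-ASCII) unicode characters. A similar solution that converts unicode to byte strings without choking on those characters: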

# `types` is df.apply(lambda x: pd.lib.infer_dtype(x.values)), as in the other answer
for col in types[types=='unicode'].index:
    df[col] = df[col].apply(lambda x: x.encode('utf-8').strip())

This is partly due to how Python handles unicode. The Python Unicode HOWTO has more information on this.
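As a minimal sketch of why the choice of encoding matters, the snippet below uses only the standard library and a made-up value (`caf\u00e9`, which is not from the question's data). In Python 2, calling `str()` on a unicode object encodes it implicitly with the ASCII codec; that this is what `astype(str)` ends up doing for each value is an assumption here. The explicit `.encode("ascii")` reproduces that step, while `.encode("utf-8")` succeeds.

```python
# Hypothetical cell value containing one non-ASCII character (not from the question).
text = u"caf\u00e9"

# Reproduce the implicit ASCII encoding that Python 2's str(unicode_obj) performs.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Explicit UTF-8 encoding works for any unicode text.
utf8_bytes = text.encode("utf-8").strip()

print(ascii_ok)
print(repr(utf8_bytes))
```

Note that the lambda in the answer assumes every cell is a string; a column that also holds missing values would need a guard before calling `.encode`.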

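It seems your round trip produced some unicode. I am not sure why, but it is easy to fix. In Python 2 you cannot store unicode in an HDFStore table (in Python 3 this works fine, however). If you want, you can store it using the fixed format instead (it will be pickled). See here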
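One plausible explanation for the round trip producing unicode is that JSON strings are Unicode text by definition, so a decoder hands back text objects even when every character is ASCII. The sketch below illustrates this with the standard library `json` module as a stand-in; pandas uses its own JSON decoder, so this is an illustration of the principle and not of the pandas internals. The payload is a hypothetical two-row excerpt of the `host` column.

```python
import json

# Hypothetical two-row excerpt of the "host" column from the question's payload.
payload = '{"host": {"0": "yy38.segm1.org", "1": "kyy1.segm1.org"}}'

decoded = json.loads(payload)

# Collect the Python type names of the column name and of the cell values.
key_types = sorted({type(k).__name__ for k in decoded})
value_types = sorted({type(v).__name__ for v in decoded["host"].values()})

print(key_types, value_types)
```

On Python 3 both lists contain `str`, which is the text type that corresponds to Python 2's `unicode`. This matches the `u'args'`, `u'date'`, ... column names visible in the question's `locals()` dump.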

In [33]: df = pd.read_json(s)

In [25]: df
Out[25]: 
  args                date            host kwargs     operation  status   thingy      time
0   [] 2013-12-02 00:33:59  yy38.segm1.org     {}       x_gbinf    -101  a13yy38  0.000801
1   [] 2013-12-02 00:33:59  kyy1.segm1.org     {}     x_initobj       1  a19kyy1  0.003244
2   [] 2013-12-02 00:34:00  yy10.segm1.org     {}  x_gobjParams    -101  a14yy10  0.002247
3   [] 2013-12-02 00:34:00  yy24.segm1.org     {}        gtfull    -101  a14yy24  0.002787
4   [] 2013-12-02 00:34:00  yy24.segm1.org     {}       x_gbinf    -101  a14yy24  0.001067
5   [] 2013-12-02 00:34:00  yy34.segm1.org     {}       gxyzinf    -101  a12yy34  0.002652
6   [] 2013-12-02 00:34:00  yy15.segm1.org     {}     deletemfg       1  a15yy15  0.004371
7   [] 2013-12-02 00:34:00  yy15.segm1.org     {}       gxyzinf    -101  a15yy15  0.000602

[8 rows x 8 columns]

In [26]: df.dtypes
Out[26]: 
args                 object
date         datetime64[ns]
host                 object
kwargs               object
operation            object
status                int64
thingy               object
time                float64
dtype: object

This infers the actual type of each object-dtype Series. They show up as unicode only if at least one string is unicode (otherwise they would be inferred as string).

In [27]: df.apply(lambda x: pd.lib.infer_dtype(x.values))
Out[27]: 
args            unicode
date         datetime64
host            unicode
kwargs          unicode
operation       unicode
status          integer
thingy          unicode
time           floating
dtype: object

Here is how to "fix" it

In [28]: types = df.apply(lambda x: pd.lib.infer_dtype(x.values))

In [29]: types[types=='unicode']
Out[29]: 
args         unicode
host         unicode
kwargs       unicode
operation    unicode
thingy       unicode
dtype: object

In [30]: for col in types[types=='unicode'].index:
   ....:     df[col] = df[col].astype(str)
   ....:     

It looks the same

In [31]: df
Out[31]: 
  args                date            host kwargs     operation  status   thingy      time
0   [] 2013-12-02 00:33:59  yy38.segm1.org     {}       x_gbinf    -101  a13yy38  0.000801
1   [] 2013-12-02 00:33:59  kyy1.segm1.org     {}     x_initobj       1  a19kyy1  0.003244
2   [] 2013-12-02 00:34:00  yy10.segm1.org     {}  x_gobjParams    -101  a14yy10  0.002247
3   [] 2013-12-02 00:34:00  yy24.segm1.org     {}        gtfull    -101  a14yy24  0.002787
4   [] 2013-12-02 00:34:00  yy24.segm1.org     {}       x_gbinf    -101  a14yy24  0.001067
5   [] 2013-12-02 00:34:00  yy34.segm1.org     {}       gxyzinf    -101  a12yy34  0.002652
6   [] 2013-12-02 00:34:00  yy15.segm1.org     {}     deletemfg       1  a15yy15  0.004371
7   [] 2013-12-02 00:34:00  yy15.segm1.org     {}       gxyzinf    -101  a15yy15  0.000602

[8 rows x 8 columns]

But now the types are inferred correctly.

In [32]: df.apply(lambda x: pd.lib.infer_dtype(x.values))
Out[32]: 
args             string
date         datetime64
host             string
kwargs           string
operation        string
status          integer
thingy           string
time           floating
dtype: object
