Applying np.histogram to reshape a pandas DataFrame

Question

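I want to get the normalized histogram of each column in a pandas DataFrame. I would like to use np.histogram, but it returns a tuple and I only want its first element. pandas does not seem to like that, though. For example, the following code works: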

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.uniform(size=20).reshape(5, 4))

bins = (0, 0.5, 1)
df.apply(np.histogram, bins=bins, normed=True)

and returns:

0    ([0.8, 1.2], [0.0, 0.5, 1.0])
1    ([0.8, 1.2], [0.0, 0.5, 1.0])
2    ([0.8, 1.2], [0.0, 0.5, 1.0])
3    ([0.8, 1.2], [0.0, 0.5, 1.0])
dtype: object

But I only want the first element of each tuple, so I tried this:

df.apply(lambda x : np.histogram(x, bins=bins, normed=True)[0])

But that raises an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-51-3191795e120c> in <module>()
----> 1 df.apply(lambda x : np.histogram(x, bins=bins, normed=True)[0])

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   3310                     if reduce is None:
   3311                         reduce = True
-> 3312                     return self._apply_standard(f, axis, reduce=reduce)
   3313             else:
   3314                 return self._apply_broadcast(f, axis)

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
   3415                 index = None
   3416 
-> 3417             result = self._constructor(data=results, index=index)
   3418             result.columns = res_index
   3419 

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    199                                  dtype=dtype, copy=copy)
    200         elif isinstance(data, dict):
--> 201             mgr = self._init_dict(data, index, columns, dtype=dtype)
    202         elif isinstance(data, ma.MaskedArray):
    203             import numpy.ma.mrecords as mrecords

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
    321 
    322         return _arrays_to_mgr(arrays, data_names, index, columns,
--> 323                               dtype=dtype)
    324 
    325     def _init_ndarray(self, values, index, columns, dtype=None,

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   4471     axes = [_ensure_index(columns), _ensure_index(index)]
   4472 
-> 4473     return create_block_manager_from_arrays(arrays, arr_names, axes)
   4474 
   4475 

/usr/local/lib/python2.7/site-packages/pandas/core/internals.pyc in create_block_manager_from_arrays(arrays, names, axes)
   3757         return mgr
   3758     except (ValueError) as e:
-> 3759         construction_error(len(arrays), arrays[0].shape[1:], axes, e)
   3760 
   3761 

/usr/local/lib/python2.7/site-packages/pandas/core/internals.pyc in construction_error(tot_items, block_shape, axes, e)
   3729         raise e
   3730     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3731         passed,implied))
   3732 
   3733 def create_block_manager_from_blocks(blocks, axes):

ValueError: Shape of passed values is (4,), indices imply (4, 5)

> /usr/local/lib/python2.7/site-packages/pandas/core/internals.py(3731)construction_error()
   3730     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3731         passed,implied))
   3732

Any suggestions?
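One possible workaround, offered as a minimal sketch rather than a definitive fix: the lambda returns a length-2 array for each length-5 column, and the pandas version in the traceback appears unable to line that up with the original index. Wrapping the histogram values in a pd.Series gives apply a labelled object per column, so it can build a frame with one row per bin. The helper name column_density and the fixed random seed are assumptions added for illustration. density=True is used here in place of normed=True, since newer NumPy releases no longer accept normed.

```python
import numpy as np
import pandas as pd

# Fixed seed so the example is reproducible (an assumption, not in the question).
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.uniform(size=20).reshape(5, 4))

bins = (0, 0.5, 1)


def column_density(col):
    """Hypothetical helper: return only the density values for one column."""
    # np.histogram returns (values, bin_edges); keep the first element only.
    hist, _edges = np.histogram(col, bins=bins, density=True)
    # Returning a Series lets apply assemble one output row per bin.
    return pd.Series(hist)


hist_df = df.apply(column_density)
print(hist_df)
```

The result has one row per bin and one column per original column. If bin labels are wanted, the Series could be given an index built from the bin edges.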
