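Simple groupby example from "Python for Data Analysis" fails
I've just started learning Python (mainly to replace Matlab, using "ipython --pylab") and am working through the examples in the book "Python for Data Analysis". On page 253 there is a simple example that uses 'groupby' (passing a list of arrays). I did exactly what the book shows, but I get this error: "TypeError: 'Series' objects are mutable, thus they cannot be hashed".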
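import numpy as np  # already available under "ipython --pylab"; imported here so the snippet runs standalone
import pandas as pd
from pandas import DataFrame
df = DataFrame({'key1' : ['a','a','b','b','a'],
                'key2' : ['one','two','one','two','one'],
                'data1' : np.random.randn(5),
                'data2' : np.random.randn(5)})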
grouped = df['data1'].groupby(df['key1'])
means = df['data1'].groupby(df['key1'],df['key2']).mean()
-----TypeError details-------
TypeError Traceback (most recent call last)
<ipython-input-7-0412f2897849> in <module>()
----> 1 means = df['data1'].groupby(df['key1'],df['key2']).mean()
/home/joeblow/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/generic.pyc in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze)
2725
2726 from pandas.core.groupby import groupby
-> 2727 axis = self._get_axis_number(axis)
2728 return groupby(self, by, axis=axis, level=level, as_index=as_index,
2729 sort=sort, group_keys=group_keys, squeeze=squeeze)
/home/joeblow/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_axis_number(self, axis)
283
284 def _get_axis_number(self, axis):
--> 285 axis = self._AXIS_ALIASES.get(axis, axis)
286 if com.is_integer(axis):
287 if axis in self._AXIS_NAMES:
/home/joeblow/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/generic.pyc in __hash__(self)
639 def __hash__(self):
640 raise TypeError('{0!r} objects are mutable, thus they cannot be'
--> 641 ' hashed'.format(self.__class__.__name__))
642
643 def __iter__(self):
TypeError: 'Series' objects are mutable, thus they cannot be hashed
What simple thing am I missing here?
1 Answer
You didn't do it exactly the way the text does. :^)
>>> means = df['data1'].groupby([df['key1'],df['key2']]).mean()
>>> means
key1 key2
a one 1.127536
two 1.220386
b one 0.402765
two -0.058255
dtype: float64
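If you want to group by two arrays, you need to pass a single list of arrays. Instead you passed two separate arguments, (df['key1'], df['key2']), which were interpreted as by and axis. The traceback shows the consequence: groupby looks up the axis value in a dict of axis aliases, which requires hashing it, and a Series cannot be hashed.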
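As a rough sketch of the difference, here is the same DataFrame with fixed numbers in place of np.random.randn (an assumption made only so the result is reproducible). It shows the list form from the book, an equivalent spelling using column names, and why a Series ends up failing when it is used as a dict key. The exact error message may differ between pandas versions.

```python
import pandas as pd

# Same shape as the book's example; the data values are made up so the
# means can be checked by hand.
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'data2': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# One argument: a list holding both key Series, so both are used as `by`.
means = df['data1'].groupby([df['key1'], df['key2']]).mean()

# Equivalent spelling: group the DataFrame by column names, then pick data1.
means_by_name = df.groupby(['key1', 'key2'])['data1'].mean()

# A Series cannot be hashed, so using one as a dict key raises TypeError.
# This mirrors what the axis-alias lookup in the traceback runs into.
try:
    {}.get(df['key2'])
    hash_failed = False
except TypeError:
    hash_failed = True

print(means)
print(hash_failed)
```

With these values the (a, one) group averages rows 0 and 4, giving 3.0, and the other three groups each contain a single row.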