Odd behavior when indexing DataFrame d

Posted 2024-04-18 09:00:05


I just finished the pandas tutorial and am a bit confused by the following behavior.

In [28]: d
Out[28]: 
            Status  CustomerCount
StatusDate                       
2009-01-05       9           2519
2009-01-12      10           3351
2009-01-19      10           2188
2009-01-26      10           2301
2009-02-02       7           2204
2009-02-09       9           1538
2009-02-16       9           1983
2009-02-23       9           1960
2009-03-02      11           2887
2009-03-09       9           2927

Getting the records for a specific month with a string works fine:

In [31]: d['2009-02']
Out[31]: 
            Status  CustomerCount
StatusDate                       
2009-02-02       7           2204
2009-02-09       9           1538
2009-02-16       9           1983
2009-02-23       9           1960

Slicing a date range also works:

In [33]: d['2009-02-09':'2009-02-10']
Out[33]: 
            Status  CustomerCount
StatusDate                       
2009-02-09       9           1538

Getting the record for a specific date the same way does not:

In [32]: d['2009-02-09']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-32-b78c7ec0d497> in <module>()
----> 1 d['2009-02-09']

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in __getitem__(self, key)
   1676             return self._getitem_multilevel(key)
   1677         else:
-> 1678             return self._getitem_column(key)
   1679 
   1680     def _getitem_column(self, key):

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _getitem_column(self, key)
   1683         # get column
   1684         if self.columns.is_unique:
-> 1685             return self._get_item_cache(key)
   1686 
   1687         # duplicate columns & possible reduce dimensionaility

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in _get_item_cache(self, item)
   1050         res = cache.get(item)
   1051         if res is None:
-> 1052             values = self._data.get(item)
   1053             res = self._box_item_values(item, values)
   1054             cache[item] = res

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in get(self, item, fastpath)
   2563 
   2564             if not isnull(item):
-> 2565                 loc = self.items.get_loc(item)
   2566             else:
   2567                 indexer = np.arange(len(self.items))[isnull(self.items)]

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/index.pyc in get_loc(self, key)
   1179         loc : int if unique index, possibly slice or mask if not
   1180         """
-> 1181         return self._engine.get_loc(_values_from_object(key))
   1182 
   1183     def get_value(self, series, key):

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3572)()

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3452)()

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11343)()

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11296)()

KeyError: '2009-02-09'

Neither does the following:

In [36]: d[d.first_valid_index()]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-36-071dd1d3c77c> in <module>()
----> 1 d[d.first_valid_index()]

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in __getitem__(self, key)
   1676             return self._getitem_multilevel(key)
   1677         else:
-> 1678             return self._getitem_column(key)
   1679 
   1680     def _getitem_column(self, key):

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _getitem_column(self, key)
   1683         # get column
   1684         if self.columns.is_unique:
-> 1685             return self._get_item_cache(key)
   1686 
   1687         # duplicate columns & possible reduce dimensionaility

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in _get_item_cache(self, item)
   1050         res = cache.get(item)
   1051         if res is None:
-> 1052             values = self._data.get(item)
   1053             res = self._box_item_values(item, values)
   1054             cache[item] = res

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in get(self, item, fastpath)
   2563 
   2564             if not isnull(item):
-> 2565                 loc = self.items.get_loc(item)
   2566             else:
   2567                 indexer = np.arange(len(self.items))[isnull(self.items)]

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/index.pyc in get_loc(self, key)
   1179         loc : int if unique index, possibly slice or mask if not
   1180         """
-> 1181         return self._engine.get_loc(_values_from_object(key))
   1182 
   1183     def get_value(self, series, key):

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3572)()

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3452)()

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11343)()

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11296)()

KeyError: Timestamp('2009-01-05 00:00:00')

Yet this works:

In [37]: d.loc[d.first_valid_index()]
Out[37]: 
Status              9
CustomerCount    2519
Name: 2009-01-05 00:00:00, dtype: int64

Is this behavior a bug, or am I misunderstanding something?


1 Answer
User
#1 · Posted 2024-04-18 09:00:05

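d is a DataFrame, so when you write df[key] the key is looked up among the columns (see "indexing basics" in the docs).
The only exception is when key is a slice: for convenience, slicing a DataFrame slices its rows.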

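In your example, d['2009-02-09':'2009-02-10'] is a slice, so it correctly slices the rows. In d['2009-02-09'] you pass a single key, so pandas looks at the columns, and you get a KeyError because '2009-02-09' is not a column name.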
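
To make the column-versus-row distinction concrete, here is a minimal sketch. It assumes a reasonably recent pandas and rebuilds only a few rows of the frame from the question; the helper variable names are mine, not from the question.

```python
import pandas as pd

# A small stand-in for the frame in the question (only four of its rows).
dates = pd.to_datetime(["2009-01-05", "2009-02-02", "2009-02-09", "2009-02-16"])
d = pd.DataFrame(
    {"Status": [9, 7, 9, 9], "CustomerCount": [2519, 2204, 1538, 1983]},
    index=pd.DatetimeIndex(dates, name="StatusDate"),
)

# A single key is looked up among the columns.
status = d["Status"]

# A slice is applied to the rows instead.
sliced = d["2009-02-09":"2009-02-10"]

# A full date string is a single key, so it is treated as a column name.
try:
    d["2009-02-09"]
    single_key_failed = False
except KeyError:
    single_key_failed = True

# .loc indexes rows by label, so it selects the row for that date.
row = d.loc["2009-02-09"]
```

Using .loc whenever you mean "rows by label" avoids having to remember which rule d[...] applies to a given key.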

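d['2009-02'] is a special case that can be a bit confusing at first. It is a single string, but it actually represents a slice (this feature is called partial string indexing; see the docs).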
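
As a rough illustration of partial string indexing, the sketch below goes through .loc rather than d[...]. As far as I know, newer pandas releases deprecated and later removed row selection with a single date string via d[...], so .loc is the form that should work across versions. The data is a trimmed copy of the question's frame.

```python
import pandas as pd

d = pd.DataFrame(
    {"Status": [9, 7, 9, 11], "CustomerCount": [2519, 2204, 1538, 2887]},
    index=pd.to_datetime(["2009-01-05", "2009-02-02", "2009-02-09", "2009-03-02"]),
)

# "2009-02" is coarser than the daily index, so it is expanded to a
# slice that covers the whole of February 2009.
february = d.loc["2009-02"]

# Spelling out the equivalent explicit slice selects the same rows.
explicit = d.loc["2009-02-01":"2009-02-28"]
```

A full date such as '2009-02-09' has the same resolution as the daily index, so it is treated as an exact label rather than a slice, which is why it behaves differently from '2009-02'.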
