Error when applying a function with resample and groupby

Posted 2024-06-16 11:13:58


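I'm new to Python. I used to write code in R and use the period_apply function, so I tried the following in Python.

First, I don't understand what the error is trying to tell me. Second, I don't understand why groupby only raises the error when I include the first row of data, whereas resample raises an error whether or not the first row is included. Third, how do I fix this? Please don't tell me to skip the first row, because the dataset I'm actually working with is much larger.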

Data

                           Best_Bid  Best_Ask
Timestamp                                  
2019-05-02 11:59:59.602   29636.0   29638.0
2019-05-02 12:59:00.033       NaN       NaN
2019-05-02 12:59:00.033       NaN       NaN
2019-05-02 12:59:00.033       NaN       NaN
2019-05-02 12:59:00.033       NaN       NaN
2019-05-02 12:59:00.033       NaN       NaN
2019-05-02 12:59:00.033       NaN       NaN

{'Best_Bid': {Timestamp('2019-05-02 11:59:59.602000'): 29636.0,
  Timestamp('2019-05-02 12:59:00.033000'): nan},
 'Best_Bid_Q': {Timestamp('2019-05-02 11:59:59.602000'): 4.0,
  Timestamp('2019-05-02 12:59:00.033000'): nan},
 'Best_Ask': {Timestamp('2019-05-02 11:59:59.602000'): 29638.0,
  Timestamp('2019-05-02 12:59:00.033000'): nan}}
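For anyone who wants to reproduce the problem, the printed table above can be rebuilt roughly as follows. This is only a sketch: the name df matches the question, but the construction is an assumption based on the printed output, and the Best_Bid_Q column from the dict is left out because it does not appear in the table.

```python
import numpy as np
import pandas as pd

# One early quote, then six rows sharing the same later timestamp.
idx = pd.DatetimeIndex(
    ["2019-05-02 11:59:59.602"] + ["2019-05-02 12:59:00.033"] * 6,
    name="Timestamp",
)
df = pd.DataFrame(
    {
        "Best_Bid": [29636.0] + [np.nan] * 6,
        "Best_Ask": [29638.0] + [np.nan] * 6,
    },
    index=idx,
)
print(df)
```

Note the gap of almost an hour between the first row and the rest; that gap matters once the data is cut into 180-second bins.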

I'm trying to apply the function below (I know I could have used .agg({'Best_Bid':['last']}), but this is a simplified version of my original code).

Function

def func(x):
    best_bid = (x['Best_Bid'])[-1]
    best_ask = (x['Best_Ask'])[-1]
    return pd.Series([best_bid,best_ask], index=['bbbid', 'aaask'])

groupby and Grouper

If I skip the first row, it runs fine.

df.iloc[1:,:].groupby(pd.Grouper(freq='180S',closed='right',label='right',base=-0.0001)).apply(func)


                               bbbid  aaask
Timestamp                                  
2019-05-02 12:59:59.999899904    NaN    NaN

However, if I include the first row, I get the following error

df.groupby(pd.Grouper(freq='180S',closed='right',label='right',base=-0.0001)).apply(func)


Traceback (most recent call last):
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 471, in pandas._libs.index.DatetimeEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 997, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1004, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: -1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-191-85550b07b869>", line 1, in <module>
    df.iloc[371448:371455,0:3].groupby(pd.Grouper(freq='180S',closed='right',label='right',base=-0.0001)).apply(func)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\groupby.py", line 735, in apply
    result = self._python_apply_general(f)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\groupby.py", line 751, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, self._selected_obj, self.axis)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\ops.py", line 206, in apply
    res = f(group)
  File "<ipython-input-104-c57c7e2b6885>", line 2, in func
    best_bid = (x['Best_Bid'])[-1]
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 651, in get_value
    value = Index.get_value(self, series, key)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4411, in get_value
    return libindex.get_value_at(s, key)
  File "pandas\_libs\index.pyx", line 44, in pandas._libs.index.get_value_at
  File "pandas\_libs\index.pyx", line 45, in pandas._libs.index.get_value_at
  File "pandas\_libs\util.pxd", line 98, in pandas._libs.util.get_value_at
  File "pandas\_libs\util.pxd", line 89, in pandas._libs.util.validate_indexer
IndexError: index out of bounds

resample

I get the following error whether or not I include the first row.

df.resample(rule='180S',closed='right',label='right',base=-0.0001).agg(func)

Traceback (most recent call last):
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4411, in get_value
    return libindex.get_value_at(s, key)
  File "pandas\_libs\index.pyx", line 44, in pandas._libs.index.get_value_at
  File "pandas\_libs\index.pyx", line 45, in pandas._libs.index.get_value_at
  File "pandas\_libs\util.pxd", line 98, in pandas._libs.util.get_value_at
  File "pandas\_libs\util.pxd", line 83, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 651, in get_value
    value = Index.get_value(self, series, key)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4419, in get_value
    raise e1
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 473, in pandas._libs.index.DatetimeEngine.get_loc
  File "pandas\_libs\index.pyx", line 479, in pandas._libs.index.DatetimeEngine._date_check_type
KeyError: 'Best_Bid'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "pandas\_libs\tslibs\conversion.pyx", line 520, in pandas._libs.tslibs.conversion.convert_str_to_tsobject
  File "pandas\_libs\tslibs\parsing.pyx", line 228, in pandas._libs.tslibs.parsing.parse_datetime_string
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\dateutil\parser\_parser.py", line 1374, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\dateutil\parser\_parser.py", line 649, in parse
    raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: Best_Bid
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 660, in get_value
    return self.get_value_maybe_box(series, key)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 675, in get_value_maybe_box
    key = Timestamp(key)
  File "pandas\_libs\tslibs\timestamps.pyx", line 418, in pandas._libs.tslibs.timestamps.Timestamp.__new__
  File "pandas\_libs\tslibs\conversion.pyx", line 292, in pandas._libs.tslibs.conversion.convert_to_tsobject
  File "pandas\_libs\tslibs\conversion.pyx", line 523, in pandas._libs.tslibs.conversion.convert_str_to_tsobject
ValueError: could not convert string to Timestamp
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-190-d2caa0c5152a>", line 1, in <module>
    df.iloc[371448:371455,0:3].resample(rule='180S',closed='right',label='right',base=-0.0001).agg(func)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\resample.py", line 285, in aggregate
    result = self._groupby_and_aggregate(how, grouper, *args, **kwargs)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\resample.py", line 359, in _groupby_and_aggregate
    result = grouped._aggregate_item_by_item(how, *args, **kwargs)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\generic.py", line 1172, in _aggregate_item_by_item
    result[item] = colg.aggregate(func, *args, **kwargs)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\generic.py", line 269, in aggregate
    result = self._aggregate_named(func, *args, **kwargs)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\generic.py", line 452, in _aggregate_named
    output = func(group, *args, **kwargs)
  File "<ipython-input-104-c57c7e2b6885>", line 2, in func
    best_bid = (x['Best_Bid'])[-1]
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 662, in get_value
    raise KeyError(key)
KeyError: 'Best_Bid'

1 answer
User
#1 · Posted 2024-06-16 11:13:58

As far as I can tell, the error comes from (x['Best_Bid'])[-1], which raises KeyError: -1 and then IndexError: index out of bounds.
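A likely reason the first row matters is that it sits almost an hour before the others, so the 180-second binning produces bins with no rows in between, and the last position of an empty group cannot be taken. The sketch below illustrates this under assumptions: the frame is rebuilt from the printed table, and base=-0.0001 is left out for simplicity.

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(
    ["2019-05-02 11:59:59.602"] + ["2019-05-02 12:59:00.033"] * 6,
    name="Timestamp",
)
df = pd.DataFrame(
    {"Best_Bid": [29636.0] + [np.nan] * 6,
     "Best_Ask": [29638.0] + [np.nan] * 6},
    index=idx,
)

# Number of rows that fall into each 3-minute bin, empty bins included.
sizes = df["Best_Bid"].resample("180s", closed="right", label="right").size()
print(sizes)

# Asking for the last position of an empty Series raises IndexError.
empty = pd.Series([], dtype="float64", index=pd.DatetimeIndex([]))
try:
    empty.iloc[-1]
    raised = False
except IndexError:
    raised = True
print("IndexError on an empty group:", raised)
```

With the first row dropped, every remaining row lands in the same bin, so there is no empty group to trip over.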

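The function is being called on pieces (x) of the data for which taking the last position of x['Best_Bid'] does not make sense: for example a time bin that contains no rows or, with resample(...).agg, a single column (Best_Bid or Best_Ask) rather than the whole frame.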
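To see what each call actually hands to the function, a small probe can record the type of its argument. This is a sketch, not a statement about every pandas version: probe_agg and probe_apply are hypothetical helper names, the frame is rebuilt from the printed table, and base=-0.0001 is omitted.

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(
    ["2019-05-02 11:59:59.602"] + ["2019-05-02 12:59:00.033"] * 6,
    name="Timestamp",
)
df = pd.DataFrame(
    {"Best_Bid": [29636.0] + [np.nan] * 6,
     "Best_Ask": [29638.0] + [np.nan] * 6},
    index=idx,
)

agg_seen, apply_seen = set(), set()

def probe_agg(x):
    # Record the argument type; return a scalar so agg can reduce it.
    agg_seen.add(type(x).__name__)
    return x.iloc[-1] if len(x) else np.nan

def probe_apply(x):
    # Record the argument type; return the group size as a scalar.
    apply_seen.add(type(x).__name__)
    return len(x)

df.resample("180s", closed="right", label="right").agg(probe_agg)
df.groupby(pd.Grouper(freq="180s", closed="right", label="right")).apply(probe_apply)

print("agg received:", agg_seen)
print("apply received:", apply_seen)
```

If agg passes one column at a time, as the _aggregate_item_by_item frame in the traceback suggests, then x['Best_Bid'] is a label lookup on a datetime index, which would explain KeyError: 'Best_Bid'.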

I don't have your dataset in front of me, but I would try this code and see whether it works

# A GroupBy object has no .copy() method; .last() takes the last non-null value in each bin
gdf = df.groupby(pd.Grouper(freq='180S',closed='right',label='right',base=-0.0001)).last()

print(gdf['Best_Bid'][gdf.index[-1]],gdf.index[-1])
print(gdf['Best_Ask'][gdf.index[-1]],gdf.index[-1])

This code can certainly be simplified, but it should work for all rows, and on a large dataset it will be much faster than the .apply approach.
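If you do need a custom function (as in your full code), one possible fix is to guard against empty bins and use .iloc[-1], which is always positional. The following is a minimal sketch under assumptions: the frame is rebuilt from the printed table, and base=-0.0001 is left out because base was deprecated in pandas 1.1 and removed in 2.0 in favour of offset/origin, so add the equivalent shift back if you need it.

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(
    ["2019-05-02 11:59:59.602"] + ["2019-05-02 12:59:00.033"] * 6,
    name="Timestamp",
)
df = pd.DataFrame(
    {"Best_Bid": [29636.0] + [np.nan] * 6,
     "Best_Ask": [29638.0] + [np.nan] * 6},
    index=idx,
)

def func(x):
    # Empty bins arrive as empty frames: return NaNs instead of indexing.
    if x.empty:
        return pd.Series(np.nan, index=["bbbid", "aaask"])
    # .iloc[-1] picks the last row by position, whatever the index type.
    return pd.Series(
        [x["Best_Bid"].iloc[-1], x["Best_Ask"].iloc[-1]],
        index=["bbbid", "aaask"],
    )

grouper = pd.Grouper(freq="180s", closed="right", label="right")
out = df.groupby(grouper).apply(func)
print(out.dropna())
```

I stayed with groupby(...).apply here because it hands func the whole group as a DataFrame, so both columns are available inside the function.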
