I'm new to Python. I used to write code in R and use the period_apply function, so I tried the following in Python.
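First, I don't understand what the error is trying to tell me.
Second, I don't understand why, with groupby, I only get the error if I include the first row of data. With resample, however, I get the error whether or not I include the first row.
Third, how do I fix this? Please don't tell me to skip the first row, because the dataset I'm working with is much larger.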
Data
Best_Bid Best_Ask
Timestamp
2019-05-02 11:59:59.602 29636.0 29638.0
2019-05-02 12:59:00.033 NaN NaN
2019-05-02 12:59:00.033 NaN NaN
2019-05-02 12:59:00.033 NaN NaN
2019-05-02 12:59:00.033 NaN NaN
2019-05-02 12:59:00.033 NaN NaN
2019-05-02 12:59:00.033 NaN NaN
{'Best_Bid': {Timestamp('2019-05-02 11:59:59.602000'): 29636.0,
Timestamp('2019-05-02 12:59:00.033000'): nan},
'Best_Bid_Q': {Timestamp('2019-05-02 11:59:59.602000'): 4.0,
Timestamp('2019-05-02 12:59:00.033000'): nan},
'Best_Ask': {Timestamp('2019-05-02 11:59:59.602000'): 29638.0,
Timestamp('2019-05-02 12:59:00.033000'): nan}}
I'm trying to apply the function below (I know I could have done .agg({'Best_Bid':['last']}), but this is a simplified version of my original code).
Function
import pandas as pd

def func(x):
    best_bid = (x['Best_Bid'])[-1]
    best_ask = (x['Best_Ask'])[-1]
    return pd.Series([best_bid, best_ask], index=['bbbid', 'aaask'])
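Because the frame has a DatetimeIndex, x['Best_Bid'][-1] does not simply mean "last row": in the pandas version shown in the tracebacks, -1 is first tried as a timestamp label (the KeyError: -1) and only then as a position. The sketch below, on a hypothetical two-row series, shows that .iloc is always positional, and also that even .iloc[-1] still fails when the selection is empty, so positional access on its own does not remove the error.

```python
import pandas as pd

# Hypothetical two-row series with a DatetimeIndex, like one column of a group
s = pd.Series([29636.0, 29637.0],
              index=pd.to_datetime(['2019-05-02 12:58:00', '2019-05-02 12:59:00']))

# .iloc is purely positional, so -1 means "last row" whatever the index type is
last = s.iloc[-1]
print(last)  # 29637.0

# An empty slice stands in for a time bin that contains no rows
empty = s.iloc[:0]
try:
    empty.iloc[-1]
    failed = False
except IndexError:
    failed = True
print(failed)  # True
```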
groupby with Grouper: if I skip the first row, it runs fine.
df.iloc[1:,:].groupby(pd.Grouper(freq='180S',closed='right',label='right',base=-0.0001)).apply(func)
bbbid aaask
Timestamp
2019-05-02 12:59:59.999899904 NaN NaN
However, if I include the first row, I get the following error:
df.groupby(pd.Grouper(freq='180S',closed='right',label='right',base=-0.0001)).apply(func)
Traceback (most recent call last):
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 471, in pandas._libs.index.DatetimeEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 997, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1004, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: -1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-191-85550b07b869>", line 1, in <module>
df.iloc[371448:371455,0:3].groupby(pd.Grouper(freq='180S',closed='right',label='right',base=-0.0001)).apply(func)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\groupby.py", line 735, in apply
result = self._python_apply_general(f)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\groupby.py", line 751, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, self._selected_obj, self.axis)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\ops.py", line 206, in apply
res = f(group)
File "<ipython-input-104-c57c7e2b6885>", line 2, in func
best_bid = (x['Best_Bid'])[-1]
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
result = self.index.get_value(self, key)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 651, in get_value
value = Index.get_value(self, series, key)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4411, in get_value
return libindex.get_value_at(s, key)
File "pandas\_libs\index.pyx", line 44, in pandas._libs.index.get_value_at
File "pandas\_libs\index.pyx", line 45, in pandas._libs.index.get_value_at
File "pandas\_libs\util.pxd", line 98, in pandas._libs.util.get_value_at
File "pandas\_libs\util.pxd", line 89, in pandas._libs.util.validate_indexer
IndexError: index out of bounds
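One way to see why the first row matters: it is stamped 11:59:59.602 and the remaining rows 12:59:00.033, about an hour apart, so most of the 3-minute bins between them contain no rows. An empty group is consistent with the final IndexError: index out of bounds above. The sketch below rebuilds the posted sample and counts rows per bin with resample(...).size(), which bins the same way as pd.Grouper(freq=...) with the same closed/label settings. It assumes pandas 1.1 or later and omits base (deprecated in 1.1 in favor of offset/origin, removed in 2.0), so the labels fall on whole 3-minute marks rather than 12:59:59.9999.

```python
import numpy as np
import pandas as pd

# Rebuild the sample from the question: one quote at 11:59:59.602,
# then six all-NaN rows at 12:59:00.033
idx = pd.DatetimeIndex(['2019-05-02 11:59:59.602'] + ['2019-05-02 12:59:00.033'] * 6,
                       name='Timestamp')
df = pd.DataFrame({'Best_Bid': [29636.0] + [np.nan] * 6,
                   'Best_Ask': [29638.0] + [np.nan] * 6}, index=idx)

# Rows per 3-minute bin, right-closed and right-labelled as in the question
sizes = df.resample('3min', closed='right', label='right').size()
print(sizes[sizes > 0])
print('empty bins:', (sizes == 0).sum())

# Without the first row, every row falls into a single bin, so no bin is empty
sizes_skip = df.iloc[1:].resample('3min', closed='right', label='right').size()
print(sizes_skip)
```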
resample
I get the following error whether or not I include the first row:
df.resample(rule='180S',closed='right',label='right',base=-0.0001).agg(func)
Traceback (most recent call last):
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4411, in get_value
return libindex.get_value_at(s, key)
File "pandas\_libs\index.pyx", line 44, in pandas._libs.index.get_value_at
File "pandas\_libs\index.pyx", line 45, in pandas._libs.index.get_value_at
File "pandas\_libs\util.pxd", line 98, in pandas._libs.util.get_value_at
File "pandas\_libs\util.pxd", line 83, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 651, in get_value
value = Index.get_value(self, series, key)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4419, in get_value
raise e1
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 473, in pandas._libs.index.DatetimeEngine.get_loc
File "pandas\_libs\index.pyx", line 479, in pandas._libs.index.DatetimeEngine._date_check_type
KeyError: 'Best_Bid'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pandas\_libs\tslibs\conversion.pyx", line 520, in pandas._libs.tslibs.conversion.convert_str_to_tsobject
File "pandas\_libs\tslibs\parsing.pyx", line 228, in pandas._libs.tslibs.parsing.parse_datetime_string
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\dateutil\parser\_parser.py", line 1374, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\dateutil\parser\_parser.py", line 649, in parse
raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: Best_Bid
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 660, in get_value
return self.get_value_maybe_box(series, key)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 675, in get_value_maybe_box
key = Timestamp(key)
File "pandas\_libs\tslibs\timestamps.pyx", line 418, in pandas._libs.tslibs.timestamps.Timestamp.__new__
File "pandas\_libs\tslibs\conversion.pyx", line 292, in pandas._libs.tslibs.conversion.convert_to_tsobject
File "pandas\_libs\tslibs\conversion.pyx", line 523, in pandas._libs.tslibs.conversion.convert_str_to_tsobject
ValueError: could not convert string to Timestamp
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-190-d2caa0c5152a>", line 1, in <module>
df.iloc[371448:371455,0:3].resample(rule='180S',closed='right',label='right',base=-0.0001).agg(func)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\resample.py", line 285, in aggregate
result = self._groupby_and_aggregate(how, grouper, *args, **kwargs)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\resample.py", line 359, in _groupby_and_aggregate
result = grouped._aggregate_item_by_item(how, *args, **kwargs)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\generic.py", line 1172, in _aggregate_item_by_item
result[item] = colg.aggregate(func, *args, **kwargs)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\generic.py", line 269, in aggregate
result = self._aggregate_named(func, *args, **kwargs)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\generic.py", line 452, in _aggregate_named
output = func(group, *args, **kwargs)
File "<ipython-input-104-c57c7e2b6885>", line 2, in func
best_bid = (x['Best_Bid'])[-1]
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
result = self.index.get_value(self, key)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 662, in get_value
raise KeyError(key)
KeyError: 'Best_Bid'
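From what I can tell, the error comes from
(x['Best_Bid'])[-1]
With resample(...).agg(func), the function is applied to each column (Best_Bid and Best_Ask) separately, so x is a single column and asking it for x['Best_Bid'] makes no sense; that is the KeyError: 'Best_Bid'. With groupby(...).apply(func), x is a sub-DataFrame: [-1] is tried as a timestamp label first (KeyError: -1) and then as a position, and the positional lookup fails with IndexError: index out of bounds when the group is empty.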
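A minimal sketch of a func that tolerates empty bins, assuming the groupby(pd.Grouper(...)).apply(func) route is kept (so x is a whole sub-DataFrame). It returns NaN for bins with no rows and uses .iloc for positional access. The sample frame is rebuilt from the post, and base is omitted because pandas 2.0 removed it.

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(['2019-05-02 11:59:59.602'] + ['2019-05-02 12:59:00.033'] * 6,
                       name='Timestamp')
df = pd.DataFrame({'Best_Bid': [29636.0] + [np.nan] * 6,
                   'Best_Ask': [29638.0] + [np.nan] * 6}, index=idx)

def func(x):
    # Bins with no rows may arrive as empty frames: return NaNs instead of indexing
    if x.empty:
        return pd.Series([np.nan, np.nan], index=['bbbid', 'aaask'])
    # .iloc[-1] is the literal last row of the bin, looked up by position only
    best_bid = x['Best_Bid'].iloc[-1]
    best_ask = x['Best_Ask'].iloc[-1]
    return pd.Series([best_bid, best_ask], index=['bbbid', 'aaask'])

result = df.groupby(pd.Grouper(freq='3min', closed='right', label='right')).apply(func)
print(result.dropna(how='all'))
```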
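I don't have your dataset in front of me, but I would try this code and see if it works:
This code could certainly be simplified, but it should work for all rows, and on a large dataset it will be much faster than the .apply approach.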
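The code the answer refers to is not included above. As one vectorized alternative (a sketch, not necessarily the answerer's code), the built-in .last() aggregation avoids calling a Python function once per bin. Note that .last() returns the last non-missing value in each bin, which can differ from the literal last row when a bin ends with NaN; empty bins come out as NaN.

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(['2019-05-02 11:59:59.602'] + ['2019-05-02 12:59:00.033'] * 6,
                       name='Timestamp')
df = pd.DataFrame({'Best_Bid': [29636.0] + [np.nan] * 6,
                   'Best_Ask': [29638.0] + [np.nan] * 6}, index=idx)

# One vectorized pass: last non-NaN value per column per 3-minute bin
out = (df.resample('3min', closed='right', label='right')[['Best_Bid', 'Best_Ask']]
         .last()
         .rename(columns={'Best_Bid': 'bbbid', 'Best_Ask': 'aaask'}))
print(out.head())
```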