Pandas throws a strange exception when trying to apply a function to a duplicated column

Posted 2024-05-13 16:56:58


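Why do I get the following error message? I am trying to apply a function to a duplicated column. Please don't tell me the fix is to do something like df["a"] = 2 * df["a"]; this is a simplified example of something more complex that I'm actually doing.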

>>> df = pd.DataFrame({"a" : [0,1,2], "b" : [1,2,3]})
>>> df[["a", "a"]].apply(lambda x: x[0] + x[1], axis = 1)
Traceback (most recent call last):
  File "C:\Users\Alexander\Anaconda3\lib\site-packages\pandas\indexes\base.py", line 1980, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas\index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas\index.c:3332)
  File "pandas\index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas\index.c:3035)
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3955)
  File "pandas\index.pyx", line 169, in pandas.index.IndexEngine._get_loc_duplicates (pandas\index.c:4236)
TypeError: unorderable types: str() > int()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Alexander\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4061, in apply
    return self._apply_standard(f, axis, reduce=reduce)
  File "C:\Users\Alexander\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4157, in _apply_standard
    results[i] = func(v)
  File "<stdin>", line 1, in <lambda>
  File "C:\Users\Alexander\Anaconda3\lib\site-packages\pandas\core\series.py", line 583, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\Alexander\Anaconda3\lib\site-packages\pandas\indexes\base.py", line 2000, in get_value
    raise IndexError(key)
IndexError: (0, 'occurred at index 0')

1 answer
Anonymous user
#1 · Posted 2024-05-13 16:56:58

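IIUC, you need to change x[0] and x[1] to x.a, because there are no labels 0 and 1: with axis = 1 each row is passed to the function as a Series indexed by the column names, which here are 'a' and 'a'.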
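To see what the function actually receives, here is a small sketch that inspects one row of df[["a", "a"]], roughly as apply(..., axis = 1) hands it over. It assumes a recent pandas version; the name row is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2], "b": [1, 2, 3]})
sub = df[["a", "a"]]  # selecting "a" twice gives two columns, both labelled "a"

# With axis=1, each row is a Series whose index is the column labels,
# so its labels are "a", "a" rather than 0, 1.
row = sub.iloc[0]
print(row.index.tolist())  # ['a', 'a']

# Looking up a duplicated label returns every match, as a Series.
print(type(row["a"]).__name__)  # Series

# Positional access has to go through .iloc.
print(row.iloc[0] + row.iloc[1])  # 0
```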

a = df[["a", "a"]].apply(lambda x: x.a + x.a, axis = 1)
print (a)
   a  a
0  0  0
1  2  2
2  4  4
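Because x.a returns both "a" entries, the lambda above produces two columns. If the goal is one value per row, which is what x[0] + x[1] in the question computes, a minimal sketch (assuming the same df) addresses the entries by position inside apply:

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2], "b": [1, 2, 3]})

# .iloc picks the first and second entry of each row by position,
# regardless of the fact that both are labelled "a".
result = df[["a", "a"]].apply(lambda x: x.iloc[0] + x.iloc[1], axis=1)
print(result.tolist())  # [0, 2, 4]
```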

However, if the duplicated columns have different values, select them by position with iloc:

import pandas as pd

df = pd.DataFrame({"a" : [0,1,2], "b" : [1,2,3]})
df.columns = ['a','a']
print (df)
   a  a
0  0  1
1  1  2
2  2  3

df['sum'] = df.iloc[:,0] + df.iloc[:,1]
print (df)
   a  a  sum
0  0  1    1
1  1  2    3
2  2  3    5
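Another option, not part of the original answer: DataFrame.apply accepts raw=True, which passes each row as a plain NumPy array instead of a Series. Positional indexing such as x[0] and x[1] then works as written, and the duplicated labels no longer matter. A sketch using the same example frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2], "b": [1, 2, 3]})
df.columns = ["a", "a"]  # two columns with the same label but different values

# With raw=True each row arrives as an ndarray, so x[0] and x[1]
# are ordinary positional lookups.
result = df.apply(lambda x: x[0] + x[1], axis=1, raw=True)
print(result.tolist())  # [1, 3, 5]
```

This keeps the function body close to the question's original lambda, at the cost of losing column labels inside the function.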

This is equivalent to the following, where df.a returns both 'a' columns as a DataFrame:

df['sum'] = df.a.apply(lambda x: x.iloc[0] + x.iloc[1], axis = 1)
print (df)
   a  a  sum
0  0  1    1
1  1  2    3
2  2  3    5
