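pandas ValueError on a boolean in an "if, else" statement
I am trying to compare the values of one column (['Record Number']) across consecutive rows. After that, I want to combine the strings from another column (['Desc']) into a single row and then delete the duplicates.
However, the "if" statement below does not seem happy with the boolean mask, because it raises the same error even when I use the a.bool() it asks for:
"ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."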
import pandas
with open('all.csv') as inc:
    indf = pandas.read_csv(inc, usecols=['Record Number', 'Service Date'], parse_dates=True)
indf['Service Date'] = pandas.to_datetime(indf['Service Date'])
indf.sort(['Service Date', 'Record Number'], inplace=True)
indf['NUM'] = indf['Record Number'].shift(1)
msk = indf['NUM'] == indf['Record Number']
indf['MASK'] = msk
print(indf)
print(msk)
for row in indf:
    if row['MASK'] == False:
    #if row['MASK'].bool() == False: ### this gives the same error
        print('Unique.')
    else:
        print('Dupe.')
How can I fix this?
Edit: I fixed my typo (if indf row['MASK']), but now I get...
    if row['MASK'] == False:
TypeError: string indices must be integers
and also
    if row[4] == False:
IndexError: string index out of range
Why is 'MASK' not allowed? Why is it still complaining about strings? 'MASK' is a boolean.
Record Number             int64
Service Date     datetime64[ns]
NUM                     float64
MASK                       bool
Sample data:
Record Number,Service Date,Desc
746611,05/26/2014,jiber
361783,05/27/2014,manawyddan
231485,06/02/2014,montespan
254004,06/03/2014,peshawar
369750,06/09/2014,cochleate
757701,06/10/2014,verticity
586983,06/16/2014,psychotherapist
643669,06/17/2014,discreation
252213,06/23/2014,hemiacetal
863001,06/24/2014,jiber
563798,06/30/2014,manawyddan
229226,07/01/2014,montespan
772189,07/07/2014,peshawar
412939,07/08/2014,cochleate
230209,07/14/2014,verticity
723012,07/15/2014,psychotherapist
455138,07/21/2014,discreation
605876,07/22/2014,hemiacetal
565893,07/28/2014,jiber
760420,07/29/2014,manawyddan
667002,05/27/2014,montespan
676209,06/17/2014,peshawar
828060,06/24/2014,cochleate
582821,07/01/2014,verticity
275503,07/15/2014,psychotherapist
667002,05/26/2014,discreation
676209,06/02/2014,hemiacetal
828060,06/09/2014,jiber
667002,06/10/2014,manawyddan
676209,06/17/2014,montespan
828060,06/23/2014,peshawar
667002,06/24/2014,cochleate
676209,06/30/2014,verticity
828060,07/21/2014,psychotherapist
667002,07/28/2014,discreation
676209,05/27/2014,hemiacetal
828060,06/03/2014,jiber
667002,06/10/2014,manawyddan
676209,06/16/2014,montespan
828060,06/24/2014,peshawar
667002,07/01/2014,cochleate
676209,07/07/2014,verticity
828060,07/28/2014,psychotherapist
667002,07/29/2014,discreation
828060,06/09/2014,hemiacetal
667002,06/10/2014,jiber
676209,06/17/2014,manawyddan
828060,06/23/2014,montespan
667002,06/24/2014,peshawar
676209,06/30/2014,cochleate
828060,07/21/2014,verticity
828060,06/09/2014,psychotherapist
667002,06/10/2014,discreation
676209,06/17/2014,hemiacetal
828060,06/23/2014,jiber
667002,06/24/2014,manawyddan
676209,06/30/2014,montespan
1 Answer
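Edit: Apart from the typo discussed below, this question is mainly about how to iterate over a DataFrame. If you iterate over it directly, you iterate over the column names: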
In [21]: df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=list('abc'))
In [22]: for col in df: print(col)  # what you're doing
a
b
c
You want to iterate over the rows, so use iterrows:
In [23]: list(df.iterrows())  # tuples of (index, row)
Out[23]:
[(0, a    1
  b    2
  c    3
  Name: 0, dtype: int64),
 (1, a    4
  b    5
  c    6
  Name: 1, dtype: int64)]
In [24]: for i, row in df.iterrows(): print(row['b'])
2
5
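Applied to the frame in the question, the loop could look roughly like the sketch below. It makes a few assumptions that are not in the original code: it uses a four-row subset of the sample data, it calls sort_values (DataFrame.sort was removed in later pandas versions), and it sorts by 'Record Number' first so that rows sharing a record number end up next to each other.

```python
import io
import pandas as pd

# A few rows taken from the question's sample data (illustrative subset).
csv_text = """Record Number,Service Date,Desc
667002,05/26/2014,discreation
667002,05/27/2014,montespan
746611,05/26/2014,jiber
361783,05/27/2014,manawyddan
"""

indf = pd.read_csv(io.StringIO(csv_text), parse_dates=['Service Date'])
# Assumption: sort by record number first so duplicates become consecutive.
indf = indf.sort_values(['Record Number', 'Service Date'])
# True where a row has the same Record Number as the row before it.
indf['MASK'] = indf['Record Number'] == indf['Record Number'].shift(1)

labels = []
for idx, row in indf.iterrows():
    # row['MASK'] is a single boolean here, so the `if` is unambiguous.
    labels.append('Dupe.' if row['MASK'] else 'Unique.')
print(labels)
```

Each row yielded by iterrows is a Series indexed by column name, which is why row['MASK'] works here while it fails on the column-name strings produced by iterating over the frame directly.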
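It looks like there is a typo here: indf['MASK'] (a Series) should be row['MASK'] (a single value). Your code should work fine on that value.
As the exception message says, the truth value of a boolean Series is ambiguous (see the discussion on the mailing list; it has also been the subject of several GitHub issues).
The underlying problem is an inconsistency between Python and NumPy, which leads to some surprises: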
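For the end goal stated in the question (joining the 'Desc' strings of rows that share a 'Record Number' and dropping the duplicates), an explicit loop may not be needed at all. The following is only a sketch of one possible approach using groupby, on a hand-made four-row frame; the ', ' separator and the variable names are assumptions, not something the question specifies.

```python
import pandas as pd

# Hypothetical miniature of the question's data.
df = pd.DataFrame({
    'Record Number': [667002, 746611, 667002, 676209],
    'Desc': ['montespan', 'jiber', 'discreation', 'peshawar'],
})

# Join the Desc strings of all rows sharing a Record Number into one row.
# sort=False keeps the groups in order of first appearance.
merged = (
    df.groupby('Record Number', sort=False)['Desc']
      .agg(', '.join)
      .reset_index()
)
print(merged)
```

This collapses every record number to a single row, whether or not its duplicates were consecutive in the original order.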
In [11]: bool([False])
Out[11]: True
In [12]: bool(np.array([False]))
Out[12]: False
In NumPy, the behaviour also depends on the length of the array:
In [21]: bool(np.array([False, True]))
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
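Rather than pick a side, pandas refuses to guess and raises instead: that leaves everyone a little unhappy, but it keeps the code correct.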
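To illustrate, here is a small sketch of what happens when a Series is used in an "if", together with the explicit reductions the error message suggests. The Series s and the variable names are made up for the example.

```python
import pandas as pd

s = pd.Series([False, True])

try:
    if s:  # pandas refuses to guess what this should mean
        pass
    raised = False
except ValueError:
    raised = True

any_true = bool(s.any())  # at least one element is True
all_true = bool(s.all())  # every element is True
is_empty = s.empty        # the Series has no elements
print(raised, any_true, all_true, is_empty)
```

Which reduction is appropriate depends on the question being asked of the data, which is exactly the decision pandas leaves to the caller.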