Pandas: error when using .isin(): "AttributeError: 'float' object has no attribute 'isin'"

1 vote
2 answers
34843 views
Asked 2025-05-18 21:19

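I am using Pandas and Python to import a CSV file and then process the imported data to create a new column.

Each row of the new column is derived from the values in the corresponding ColA and ColB columns. The data frame has more columns, but they do not matter for the code below.

The imported data frame has a few thousand rows.

The values in ColA and ColB are all numbers in the range 0 to 99, inclusive.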

import pandas as pd
import csv

df = pd.read_csv("import.csv", names=["Id", "Month", "Name", "ColA", "ColB"])

def f(row):
    # row['ColA'] and row['ColB'] are scalar values here, not Series
    if row['ColA'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]) and row['ColB'].isin([30, 31, 32, 33, 34, 35, 57, 58]):
        val = row['ColA']
    elif row['ColB'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]) and row['ColA'].isin([30, 31, 32, 33, 34, 35, 57, 58]):
        val = row['ColB']
    elif row['ColA'] > row['ColB']:
        val = row['ColA']
    elif row['ColA'] < row['ColB']:
        val = row['ColB']
    else:
        val = row['ColA']
    return val

df['NewColumnName'] = df.apply(f, axis=1)

df.to_csv("export.csv", encoding='utf-8')

Running the code above raises this error:

AttributeError: ("'float' object has no attribute 'isin'", 'occurred at index 0')

So clearly .isin() cannot be used this way. Any suggestions on how to solve this?

Edit: if Jezrael's approach is used with an additional column (ColC) under the same conditions, the code might look like this:

m1 = (df['ColA'].isin(L1) & df['ColB'].isin(L2)) | (df['ColA'] > df['ColB'])
m2 = (df['ColB'].isin(L1) & df['ColA'].isin(L2)) | (df['ColA'] < df['ColB'])
m3 = (df['ColC'].isin(L1) & df['ColB'].isin(L2)) | (df['ColC'] > df['ColB'])
m4 = (df['ColB'].isin(L1) & df['ColC'].isin(L2)) | (df['ColC'] < df['ColB'])
m5 = (df['ColC'].isin(L1) & df['ColA'].isin(L2)) | (df['ColC'] > df['ColA'])
m6 = (df['ColA'].isin(L1) & df['ColC'].isin(L2)) | (df['ColC'] < df['ColA'])

df['NewColumnName'] = np.select([m1, m2, m3, m4, m5, m6], [df['ColA'], df['ColB'], df['ColC'], df['ColA'], df['ColB'], df['ColC']], default=df['ColA'])


2 Answers

3

You need to use it like this:

df[df['ColA'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48])]

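This returns the rows whose ColA value appears in the list above. You are trying to check each value individually, but .isin() is a method that operates on a whole column. If you want to check whether one particular value is in the list, you can write something like the following inside your function, using numpy: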

# Requires: import numpy as np
if np.any(row['ColA'] == [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]):
    val = row['ColA']
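
For a single scalar, Python's built-in `in` operator is another option that avoids numpy entirely. The following is only a sketch under assumptions: the toy DataFrame and its values are invented to stand in for the imported CSV, and the last three branches of the original function are collapsed into `max`, which gives the same result for non-missing numbers.

```python
import pandas as pd

# Toy data standing in for the imported CSV; the values are invented.
df = pd.DataFrame({
    "ColA": [10.0, 30.0, 50.0, 7.0, 5.0],
    "ColB": [30.0, 10.0, 2.0, 57.0, 5.0],
})

# Sets allow fast membership tests; 10.0 in {10} is True because 10.0 == 10.
L1 = {10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48}
L2 = {30, 31, 32, 33, 34, 35, 57, 58}

def f(row):
    a, b = row["ColA"], row["ColB"]
    if a in L1 and b in L2:
        return a
    if b in L1 and a in L2:
        return b
    # The remaining branches (a > b, a < b, a == b) reduce to the larger value.
    return max(a, b)

df["NewColumnName"] = df.apply(f, axis=1)
print(df["NewColumnName"].tolist())
```

This keeps the row-wise `apply` from the question, so it will be slower than a vectorized approach on large frames, but it stays close to the original code.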
4

In pandas it is best to avoid loops, so a better approach is to use numpy.select, chaining conditions with & for AND and with | for OR:

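import numpy as np

L1 = [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]
L2 = [30, 31, 32, 33, 34, 35, 57, 58]

# Each mask is a boolean Series; the parentheses are needed because & and | bind tighter than comparisons
m1 = (df['ColA'].isin(L1) & df['ColB'].isin(L2)) | (df['ColA'] > df['ColB'])
m2 = (df['ColB'].isin(L1) & df['ColA'].isin(L2)) | (df['ColA'] < df['ColB'])

# For each row, np.select takes the choice belonging to the first mask that is True
df['NewColumnName'] = np.select([m1, m2], [df['ColA'], df['ColB']], default=df['ColA'])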
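
One detail that may be worth checking: np.select uses the first condition that is true, so with the comparison folded into m1, a row such as ColA = 30, ColB = 10 gets ColA, whereas the if/elif chain in the question would reach the second isin rule first and return ColB. If the original order matters, a possible sketch is to list the four conditions separately. The toy data below is invented for illustration, and the names c1 to c4 are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the imported CSV; the values are invented.
df = pd.DataFrame({
    "ColA": [10.0, 30.0, 50.0, 7.0, 5.0],
    "ColB": [30.0, 10.0, 2.0, 57.0, 5.0],
})

L1 = [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]
L2 = [30, 31, 32, 33, 34, 35, 57, 58]

# One condition per branch, in the same order as the if/elif chain.
c1 = df["ColA"].isin(L1) & df["ColB"].isin(L2)
c2 = df["ColB"].isin(L1) & df["ColA"].isin(L2)
c3 = df["ColA"] > df["ColB"]
c4 = df["ColA"] < df["ColB"]

# The first True condition wins for each row, mirroring if/elif order.
df["NewColumnName"] = np.select(
    [c1, c2, c3, c4],
    [df["ColA"], df["ColB"], df["ColA"], df["ColB"]],
    default=df["ColA"],
)
print(df["NewColumnName"].tolist())
```

On this toy data the second row (ColA = 30, ColB = 10) yields 10, matching what the row-wise function in the question would return once its membership test is fixed.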
