Creating a function that iterates over a DataFrame

Posted 2024-06-09 00:04:02

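I'm having trouble writing a function that determines which of several ranges a value in a column falls into, i.e. whether it lies between two given values.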

def bid(x):
    if df['tla'] < 85000:
        return 1
    elif (df['tla'] >= 85000) & (df['tla'] < 110000):
        return 2
    elif (df['tla'] >= 111000) & (df['tla'] < 126000):
        return 3
    elif (df['tla'] >= 126000) & (df['tla'] < 150000):
        return 4
    elif (df['tla'] >= 150000) & (df['tla'] < 175000):
        return 5
    elif (df['tla'] >= 175000) & (df['tla'] < 200000):
        return 6
    elif (df['tla'] >= 200000) & (df['tla'] < 250000):
        return 7
    elif (df['tla'] >= 250000) & (df['tla'] < 300000):
        return 8
    elif (df['tla'] >= 300000) & (df['tla'] < 375000):
        return 9
    elif (df['tla'] >= 375000) & (df['tla'] < 453100):
        return 10
    elif df['tla'] >= 453100:
        return 11

I apply it to my new column:

df['bid_bucket'] = df['bid_bucket'].apply(bid)

and I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Does anyone have any ideas?
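The error comes from the body of bid, not from apply itself: Series.apply calls bid once per element and passes that single value as x, but bid ignores x and compares the whole column df['tla'], so every `if` receives a boolean Series, and pandas refuses to guess whether a whole Series is "true". A minimal sketch of a scalar version follows. It assumes the function should be applied to the tla column, and it uses 110000 as the lower edge of bucket 3 (as the pd.cut answer does); the question's code uses 111000, which leaves values from 110000 to 110999 without a bucket.

```python
import pandas as pd

def bid(x):
    """Return the bucket number (1-11) for a single scalar value x."""
    # Each branch only checks the upper bound: earlier branches have
    # already handled every smaller value.
    # Assumption: bucket 3 starts at 110000, not 111000 as in the question.
    if x < 85000:
        return 1
    elif x < 110000:
        return 2
    elif x < 126000:
        return 3
    elif x < 150000:
        return 4
    elif x < 175000:
        return 5
    elif x < 200000:
        return 6
    elif x < 250000:
        return 7
    elif x < 300000:
        return 8
    elif x < 375000:
        return 9
    elif x < 453100:
        return 10
    else:
        return 11

df = pd.DataFrame({'tla': [7, 85000, 111000, 88888, 126000, 51515151]})
# apply() calls bid once per element of the tla column, passing a scalar
df['bid_bucket'] = df['tla'].apply(bid)
print(df['bid_bucket'].tolist())  # [1, 2, 3, 2, 4, 11]
```

This works, but it runs a Python function per row; the vectorized answers below avoid that.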


3 answers

This can already be done with pd.cut: define the bin edges, and add 1 to the integer labels so that the bucket numbers start at 1.

import pandas as pd
import numpy as np
df = pd.DataFrame({'tla': [7, 85000, 111000, 88888, 126000, 51515151]})

df['bid_bucket'] = pd.cut(df.tla, right=False,
                          bins=[-np.inf, 85000, 110000, 126000, 150000, 175000,
                                200000, 250000, 300000, 375000, 453100, np.inf], 
                          labels=False)+1

Output of df:

        tla  bid_bucket
0         7           1
1     85000           2
2    111000           3
3     88888           2
4    126000           4
5  51515151          11
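The right=False argument is what makes each bucket include its lower edge, as the question's `>=` comparisons require. A small sketch comparing both settings on values that sit exactly on bin edges (the edge values are chosen here purely for illustration):

```python
import numpy as np
import pandas as pd

bins = [-np.inf, 85000, 110000, 126000, 150000, 175000,
        200000, 250000, 300000, 375000, 453100, np.inf]
# Values lying exactly on bin edges
edge_values = pd.Series([85000, 110000, 126000, 453100])

# right=False: intervals are [lo, hi), matching the question's ">= lower" tests
left_closed = pd.cut(edge_values, bins=bins, right=False, labels=False) + 1
# right=True (the default): intervals are (lo, hi], so edge values land one bucket lower
right_closed = pd.cut(edge_values, bins=bins, right=True, labels=False) + 1

print(left_closed.tolist())   # [2, 3, 4, 11]
print(right_closed.tolist())  # [1, 2, 3, 10]
```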

Try the following with numpy.select:

import numpy as np

values = [1,2,3,4,5,6,7,8,9,10,11]
cond = [df['tla'] < 85000, (df['tla'] >= 85000) & (df['tla'] < 110000), .... ]

df['bid_bucket'] = np.select(cond, values)
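A self-contained sketch of the same np.select idea, assuming the bin edges from the pd.cut answer. Instead of writing eleven conditions by hand, it builds one `[lo, hi)` condition per pair of consecutive edges; np.select returns the value paired with the first condition that is True for each row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'tla': [7, 85000, 111000, 88888, 126000, 51515151]})

# Assumption: edges copied from the pd.cut answer; -inf/inf close the outer buckets
edges = [-np.inf, 85000, 110000, 126000, 150000, 175000,
         200000, 250000, 300000, 375000, 453100, np.inf]

# One boolean condition per half-open interval [lo, hi)
cond = [(df['tla'] >= lo) & (df['tla'] < hi) for lo, hi in zip(edges[:-1], edges[1:])]
values = list(range(1, len(edges)))  # bucket numbers 1..11

df['bid_bucket'] = np.select(cond, values)
print(df['bid_bucket'].tolist())  # [1, 2, 3, 2, 4, 11]
```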

You can simply specify the ranges with numpy's digitize function:

bins = [-np.inf, 85000, 110000, 126000, 150000, 175000, 200000, 250000, 300000, 375000, 453100, np.inf]
df['bid_bucket'] = np.digitize(df['tla'], bins)

Example:

import numpy as np

# Values originally drawn with np.random.randint(85000, 400000, 10); fixed here so the output is reproducible
a = np.array([305628, 134122, 371486, 119856, 321423, 346906, 319321, 165714, 360896, 206404])
bins = [-np.inf, 85000, 110000, 126000, 150000, 175000,
        200000, 250000, 300000, 375000, 453100, np.inf]
np.digitize(a, bins)

Output:

array([9, 4, 9, 3, 9, 9, 9, 5, 9, 7])
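With these bins, np.digitize and the pd.cut answer produce the same buckets: np.digitize returns i such that bins[i-1] <= x < bins[i], and the leading -inf makes i start at 1, while pd.cut(..., right=False, labels=False) uses the same [lo, hi) intervals but numbers them from 0. A quick cross-check on the DataFrame from the pd.cut answer:

```python
import numpy as np
import pandas as pd

bins = [-np.inf, 85000, 110000, 126000, 150000, 175000,
        200000, 250000, 300000, 375000, 453100, np.inf]
df = pd.DataFrame({'tla': [7, 85000, 111000, 88888, 126000, 51515151]})

# np.digitize: bucket numbers already start at 1 because of the leading -inf
via_digitize = np.digitize(df['tla'], bins)
# pd.cut with right=False: same intervals, but codes start at 0, hence + 1
via_cut = pd.cut(df['tla'], bins=bins, right=False, labels=False) + 1

print(via_digitize.tolist())                               # [1, 2, 3, 2, 4, 11]
print(bool((via_digitize == via_cut.to_numpy()).all()))    # True
```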
