How to quickly bin numbers in Python using multiple conditions

Posted 2024-05-16 00:18:20


I want to classify numbers into categories based on ranges that I define myself.

A lambda is simple enough, but what if there are more than two conditions? I tried using if, but it didn't change anything.

import pandas as pd

country = pd.DataFrame({'COUNTRY':['China','JAPAN','KOREA', 'USA', 'UK'],
               'POPULATION':[1200,2345,3400,5600,9600],
               'ECONOMY':[86212,11862,1000, 8555,12000]})

# Attempt with if/elif (this does not change the DataFrame)
for x in country.POPULATION:
    if x < 2000:
        x = 'small'
    elif x > 2000 and x <= 4000:
        x = 'medium'
    elif x > 5000 and x <= 6000:
        x = 'big'
    else:
        x = 'huge'

I'd like the data to return 'small', 'medium', and so on according to the range.


Tags: and, lambda, dataframe, if, definition, classification, numbers, conditions
2 Answers

I would use np.select with multiple conditions:

import numpy as np

conditions = [
    country['POPULATION'] < 2000,
    ((country['POPULATION'] > 2000) & (country['POPULATION'] <= 4000)),
    ((country['POPULATION'] > 5000) & (country['POPULATION'] <=6000))
]

choices = [
    'small',
    'medium',
    'big'
]

# create a new column, or assign to an existing one
# the last argument to np.select is the default used when no condition matches
country['new'] = np.select(conditions, choices, 'huge')

  COUNTRY  POPULATION  ECONOMY     new
0   China        1200    86212   small
1   JAPAN        2345    11862  medium
2   KOREA        3400     1000  medium
3     USA        5600     8555     big
4      UK        9600    12000    huge
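
For comparison, the same rules can also be kept as ordinary if/elif logic by returning a label from a function and passing it to Series.apply. This is only a minimal sketch: the function name classify is made up for illustration, and the thresholds are copied from the question. It usually runs slower than np.select on large frames because the function is called once per row.

```python
import pandas as pd

country = pd.DataFrame({
    "COUNTRY": ["China", "JAPAN", "KOREA", "USA", "UK"],
    "POPULATION": [1200, 2345, 3400, 5600, 9600],
    "ECONOMY": [86212, 11862, 1000, 8555, 12000],
})


def classify(x):
    """Hypothetical helper: map one population value to a label."""
    if x < 2000:
        return "small"
    elif 2000 < x <= 4000:
        return "medium"
    elif 5000 < x <= 6000:
        return "big"
    # Anything not matched above, including exactly 2000 and the
    # 4000-5000 gap, falls through to the default label.
    return "huge"


# apply() returns a new Series, which is assigned back as a column;
# reassigning a loop variable would not do this.
country["new"] = country["POPULATION"].apply(classify)
print(country)
```

The key difference from the loop in the question is that the labels are returned and assigned back to the DataFrame instead of being stored in a temporary loop variable.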

@Chris's np.select looks good, but I had already written an answer using pd.cut (see docs), so I might as well post it:

import pandas as pd
df = pd.DataFrame({'COUNTRY':['China','JAPAN','KOREA', 'USA', 'UK'],
               'POPULATION':[1200,2345,3400,5600,9600],
               'ECONOMY':[86212,11862,1000, 8555,12000]})

df["size"] = pd.cut(df["POPULATION"],
                bins=[0, 2000, 4000, 5000, 6000, df.POPULATION.max()],
                labels=["Small", "Medium", "NaN", "Large", "Huge"])

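More interestingly, you can handle the gap between 4000 and 5000 by giving that bin an arbitrary label. In this example I wrote "NaN", but that is not really correct: it is just the string "NaN", not an actual missing value.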
