How to quickly bin numbers by multiple conditions in Python

country = pd.DataFrame({'COUNTRY': ['China', 'JAPAN', 'KOREA', 'USA', 'UK'],
                        'POPULATION': [1200, 2345, 3400, 5600, 9600],
                        'ECONOMY': [86212, 11862, 1000, 8555, 12000]})

for x in country.POPULATION:
    if x < 2000:
        x = 'small'
    elif x > 2000 and x <= 4000:
        x = 'medium'
    elif x > 5000 and x <= 6000:
        x = 'big'
    else:
        'huge'

2 answers

User

#1 · Edited 2024-05-16 00:18:20

I would use np.select with multiple conditions:

import numpy as np

conditions = [
    country['POPULATION'] < 2000,
    ((country['POPULATION'] > 2000) & (country['POPULATION'] <= 4000)),
    ((country['POPULATION'] > 5000) & (country['POPULATION'] <=6000))
]

choices = [
    'small',
    'medium',
    'big'
]

# Create a new column, or assign to an existing one.
# The last argument to np.select is the default, used when no condition matches.
country['new'] = np.select(conditions, choices, 'huge')

  COUNTRY  POPULATION  ECONOMY     new
0   China        1200    86212   small
1   JAPAN        2345    11862  medium
2   KOREA        3400     1000  medium
3     USA        5600     8555     big
4      UK        9600    12000    huge
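
To make the role of the default more visible, here is a minimal sketch that runs the same conditions over a few probe values. The `probe` frame and its values are hypothetical, chosen only to hit the boundaries and the gaps that the original conditions leave open (exactly 2000, and anything above 4000 up to and including 5000); those values match no condition and therefore fall through to 'huge'.

```python
import numpy as np
import pandas as pd

# Hypothetical probe values (not from the question) that sit on the
# boundaries and inside the gaps left by the conditions.
probe = pd.DataFrame(
    {"POPULATION": [1999, 2000, 2001, 4000, 4500, 5000, 5001, 6000, 6001]}
)

conditions = [
    probe["POPULATION"] < 2000,
    (probe["POPULATION"] > 2000) & (probe["POPULATION"] <= 4000),
    (probe["POPULATION"] > 5000) & (probe["POPULATION"] <= 6000),
]
choices = ["small", "medium", "big"]

# Rows matching none of the conditions receive the default.
probe["new"] = np.select(conditions, choices, default="huge")
print(probe)
```

If the gaps are not intended, changing the first condition to `<= 2000` or adding a fourth condition for the 4000 to 5000 range would close them.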

User

#2 · Edited 2024-05-16 00:18:20

@Chris's np.select looks good, but I had already written an answer using pd.cut (see the docs), so I may as well post it:

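import pandas as pd
df = pd.DataFrame({'COUNTRY': ['China', 'JAPAN', 'KOREA', 'USA', 'UK'],
                   'POPULATION': [1200, 2345, 3400, 5600, 9600],
                   'ECONOMY': [86212, 11862, 1000, 8555, 12000]})

# Each pair of adjacent bin edges forms one interval, closed on the right
# by default: (0, 2000], (2000, 4000], and so on. One label per interval.
df["size"] = pd.cut(df["POPULATION"],
                    bins=[0, 2000, 4000, 5000, 6000, df.POPULATION.max()],
                    labels=["Small", "Medium", "NaN", "Large", "Huge"])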

More interestingly, you can handle the gap between 4000 and 5000 by giving that bin an arbitrary label (in this example I wrote "NaN", but that is wrong).
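
One possible way to avoid the placeholder label, sketched here under a few assumptions: the gap should get the same 'Huge' label that the question's `else` branch would assign, the installed pandas is recent enough (1.1 or later) to accept `ordered=False`, which allows repeated labels, and `np.inf` is used as the top edge so the bins do not depend on the data. The helper name `label_size` is made up for illustration. Note that the bins are closed on the right, so exactly 2000 lands in 'Small' here, unlike in the question's loop.

```python
import numpy as np
import pandas as pd


def label_size(population: pd.Series) -> pd.Series:
    """Bin population values; the 4000-5000 gap is labelled 'Huge'.

    Hypothetical helper, not part of the original answer.
    """
    return pd.cut(
        population,
        bins=[0, 2000, 4000, 5000, 6000, np.inf],
        labels=["Small", "Medium", "Huge", "Large", "Huge"],
        ordered=False,  # required when a label appears more than once
    )


df = pd.DataFrame({'COUNTRY': ['China', 'JAPAN', 'KOREA', 'USA', 'UK'],
                   'POPULATION': [1200, 2345, 3400, 5600, 9600],
                   'ECONOMY': [86212, 11862, 1000, 8555, 12000]})

df["size"] = label_size(df["POPULATION"])
print(df)

# A value inside the gap, which the sample data never exercises.
print(label_size(pd.Series([4500])))
```

The result is an unordered categorical column, so comparisons such as `df["size"] > "Small"` are not available; that seems an acceptable trade for being able to reuse a label.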
