Creating a new column from an existing column using a dictionary in Pandas

Posted 2024-06-16 09:55:10


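Hello Stack Overflow community,

I have a df with a column named native_country. I would like to create a new column that groups these countries into continents. For example, China would be grouped with all the other countries that belong to Asia. The code is shown below.

First I build a ContinentDict that holds the country/continent pairs: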

ContinentDict  = {'China':'Asia', 'Cambodia':'Asia', 'Hong':'Asia', 
                   'India':'Asia', 'Japan':'Asia', 'Laos':'Asia', 
                   'Philippines':'Asia',
                   'South':'Asia', 'Taiwan':'Asia', 'Thailand':'Asia', 
                   'Vietnam':'Asia', 'Canada':'Canada',
                   'United States':'United States',
                   'Cuba':'Caribbean', 'Dominican-Republic':'Caribbean', 
                   'Haiti':'Caribbean', 'Jamaica':'Caribbean', 
                   'Trinadad&Tobago':'Caribbean',
                   'England':'Europe', 'France':'Europe', 'Germany':'Europe', 
                   'Greece':'Europe', 'Holand-Netherlands':'Europe', 
                   'Hungary':'Europe',
                   'Ireland':'Europe', 'Italy':'Europe', 'Poland':'Europe', 
                   'Portugal':'Europe', 'Scotland':'Europe', 
                   'Yugoslavia':'Europe',
                   'Columbia':'Latin America', 'Ecuador':'Latin America', 
                   'El-Salvador':'Latin America', 'Guatemala':'Latin America',
                   'Honduras':'Latin America', 'Nicaragua':'Latin America', 
                   'Peru':'Latin America', 'Mexico':'Mexico', '?':'Unknown', 
                   'Outlying-US(Guam-USVI-etc)':'US Territories',
                   'Puerto-Rico':'US Territories'}

Next, I map the continents onto the df:

df = df.assign(continent=df['native_country'].map(ContinentDict))

However, the continent column is filled with NaN. Does anyone know why? Am I missing something?

Any help would be greatly appreciated!


2 Answers
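import pandas as pd

# Row order follows the dict's iteration order, so it may differ from the output shown below.
df = pd.DataFrame({'native_country': list(ContinentDict.keys())})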
df = df.assign(continent=df['native_country'].map(ContinentDict))
>>> df.head()
       native_country      continent
0              Canada         Canada
1            Honduras  Latin America
2                Hong           Asia
3  Dominican-Republic      Caribbean
4               Italy         Europe
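
The output above shows the continent column filling in when every value matches a dictionary key exactly. Series.map returns NaN for any value that has no exactly matching key, so an all-NaN column suggests the values in native_country differ from the keys, for example through stray whitespace or a different spelling. A minimal sketch for checking this, assuming a hypothetical sample df and a small subset of the dictionary (the names continent_subset, cleaned, and unmatched are for illustration only):

```python
import pandas as pd

# Hypothetical sample: some values carry stray whitespace, one has no dictionary entry.
df = pd.DataFrame({'native_country': [' China', 'Canada ', 'France', 'Atlantis']})

# A small subset of the mapping, for illustration only.
continent_subset = {'China': 'Asia', 'Canada': 'Canada', 'France': 'Europe'}

# Mapping the raw values leaves NaN wherever the key does not match exactly.
raw = df['native_country'].map(continent_subset)

# Strip surrounding whitespace first, then map again.
cleaned = df['native_country'].str.strip()
df = df.assign(continent=cleaned.map(continent_subset))

# List the values that still have no entry in the dictionary.
unmatched = sorted(set(cleaned) - set(continent_subset))

print(raw.isna().sum())
print(df)
print(unmatched)
```

If unmatched is not empty on the real data, those are the values to add to the dictionary or to clean up before mapping.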

midx = pd.MultiIndex.from_arrays([df['continent'], df['native_country']])
>>> midx
MultiIndex(levels=[[u'Asia', u'Canada', u'Caribbean', u'Europe', u'Latin America', u'Mexico', u'US Territories', u'United States', u'Unknown'], [u'?', u'Cambodia', u'Canada', u'China', u'Columbia', u'Cuba', u'Dominican-Republic', u'Ecuador', u'El-Salvador', u'England', u'France', u'Germany', u'Greece', u'Guatemala', u'Haiti', u'Holand-Netherlands', u'Honduras', u'Hong', u'Hungary', u'India', u'Ireland', u'Italy', u'Jamaica', u'Japan', u'Laos', u'Mexico', u'Nicaragua', u'Outlying-US(Guam-USVI-etc)', u'Peru', u'Philippines', u'Poland', u'Portugal', u'Puerto-Rico', u'Scotland', u'South', u'Taiwan', u'Thailand', u'Trinadad&Tobago', u'United States', u'Vietnam', u'Yugoslavia']],
           labels=[[1, 4, 0, 2, 3, 4, 5, 6, 0, 3, 3, 3, 0, 3, 7, 4, 0, 3, 0, 4, 0, 0, 2, 3, 3, 2, 4, 2, 0, 6, 4, 0, 3, 2, 3, 0, 0, 3, 4, 3, 8], [2, 16, 17, 6, 21, 28, 25, 27, 29, 18, 33, 40, 1, 10, 38, 7, 39, 20, 24, 4, 36, 34, 22, 9, 31, 5, 8, 14, 19, 32, 13, 3, 15, 37, 12, 23, 35, 11, 26, 30, 0]],
           names=[u'continent', u'native_country'])

Once the country and continent are both in the DataFrame, you only need to set the index.
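
A minimal sketch of what setting that index could look like, assuming a hypothetical miniature of the df with one extra data column (age is made up for illustration). It uses DataFrame.set_index with the two column names, which is one possible way to do it and not necessarily the call this answer had in mind:

```python
import pandas as pd

# Hypothetical miniature of the df; the 'age' column is invented for illustration.
df = pd.DataFrame({
    'native_country': ['Canada', 'Honduras', 'Hong', 'Italy'],
    'continent': ['Canada', 'Latin America', 'Asia', 'Europe'],
    'age': [30, 41, 25, 52],
})

# Build a two-level index (continent first, then country) from the existing columns
# and sort it so that lookups by continent are straightforward.
indexed = df.set_index(['continent', 'native_country']).sort_index()

# Select every row that belongs to one continent.
asia = indexed.loc['Asia']

print(list(indexed.index.names))
print(asia)
```
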
df.iloc[df['native_country'].map(ContinentDict).argsort()]
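
The line above reorders the rows so that countries from the same continent sit together, without adding a column. A sketch of the same idea using sort_values with its key parameter (available in pandas 1.1 and later), on a hypothetical sample with a small subset of the dictionary:

```python
import pandas as pd

# A small subset of the mapping and a hypothetical sample df, for illustration only.
continent_subset = {'China': 'Asia', 'France': 'Europe', 'Japan': 'Asia', 'Cuba': 'Caribbean'}
df = pd.DataFrame({'native_country': ['France', 'Japan', 'Cuba', 'China']})

# Sort rows by the continent each country maps to.
# mergesort is stable, so rows within one continent keep their original order.
by_continent = df.sort_values(
    'native_country',
    key=lambda s: s.map(continent_subset),
    kind='mergesort',
)

print(by_continent['native_country'].tolist())
```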
