Replacing values in a pandas Series

Posted 2024-06-16 13:59:43

I have a pandas Series with the following values:

Bachelors Degree         639
Diploma                  291
O - Level                264
Masters Degree           149
Certificate              126
A - Level                 69
PGD                       40
Bachelors Degree          28
A-Level                   20
O-Level                   15
Masters                   10
Bachelors                  6
diploma                    5
certificate                5
Ph.D                       4
A- Level                   2
Post Graduate Diploma      1
Msc Environment            1
BBA                        1
O- Level                   1
Masters                    1
PhD                        1

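I am reading the data from Excel.

I want to clean the data with pandas, for example by replacing every variant of "Masters Degree" with "Master's Degree". (I could do this in Excel, but I am learning pandas.)

I tried: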

mapp={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
      "Ordinary Diploma":"diploma",
      "Ordinary Level":["O - Level","O-Level","O- Level"],
      "Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
      "Certificate":"certificate",
      "Advanced Level":["A - Level","A-Level","- Level"],
      "Post Graduate Diploma":["Post Graduate Diploma","PGD"],
      "PHD":["Ph.D","PhD"]    
     }
df['EDUCATION_LEVEL']=df['EDUCATION_LEVEL'].map(mapp)

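This only returns a result for the Certificate key, which has a single value.

It seems I cannot use a list as the value of a dictionary key.

Any advice on how to replace these values would be highly appreciated. Ronald

This is how the actual data appears in the Excel column.

I have added an image of the data in the column. The challenge is how to replace the different variants of "Masters Degree".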


2 answers

First, change the mapp dict slightly so that every value is a list:

mapp={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
      "Ordinary Diploma":["diploma"],
      "Ordinary Level":["O - Level","O-Level","O- Level"],
      "Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
      "Certificate":["certificate"],
      "Advanced Level":["A - Level","A-Level","- Level"],
      "Post Graduate Diploma":["Post Graduate Diploma","PGD"],
      "PHD":["Ph.D","PhD"]    
     }

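# Invert the mapping: build one {variant: label} dict per label
mapp_new = [{variant: label for variant in variants} for label, variants in mapp.items()]
# Merge them into a single lookup keyed by the lowercase variant
mapp_new = {k.lower(): v for d in mapp_new for k, v in d.items()}
# Case-insensitive lookup; values with no match are left unchanged
df.EDUCATION_LEVEL.apply(lambda x: mapp_new.get(x.lower(), x))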


0         Bachelor's Degree
1          Ordinary Diploma
2            Ordinary Level
3           Master's Degree
4               Certificate
5            Advanced Level
6     Post Graduate Diploma
7         Bachelor's Degree
8            Advanced Level
9            Ordinary Level
10          Master's Degree
11        Bachelor's Degree
12         Ordinary Diploma
13              Certificate
14                      PHD
15                 A- Level
16    Post Graduate Diploma
17          Master's Degree
18        Bachelor's Degree
19           Ordinary Level
20          Master's Degree
21                      PHD
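
The inversion is needed because `Series.map` looks up each value of the Series as a dictionary key. In the original `mapp`, the keys are the target labels and the variants sit in the values, so almost nothing matches. The following is a minimal sketch of that behaviour on a small made-up Series (the sample values and the shortened dict are assumptions for illustration, not the asker's full data):

```python
import pandas as pd

# Small illustrative sample, not the full column from the question
s = pd.Series(["Masters", "Certificate", "diploma"])

# Same shape as the original mapp: canonical label -> variant(s)
mapp = {
    "Master's Degree": ["Masters Degree", "Masters"],
    "Certificate": "certificate",
    "Ordinary Diploma": "diploma",
}

# map() treats each Series value as a key of mapp.
# Only "Certificate" happens to be a key, so only that row gets a value.
result = s.map(mapp)
print(result)
```

Only the "Certificate" row is mapped (to "certificate"); the other rows become NaN because "Masters" and "diploma" are not keys of `mapp`.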

One idea is to convert the single-string values to one-element lists, e.g. "diploma" to ["diploma"]:

mapp1={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
      "Ordinary Diploma":["diploma"],
      "Ordinary Level":["O - Level","O-Level","O- Level"],
      "Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
      "Certificate":["certificate"],
      "Advanced Level":["A - Level","A-Level","- Level"],
      "Post Graduate Diploma":["Post Graduate Diploma","PGD"],
      "PHD":["Ph.D","PhD"]    
     }

# Swap keys and values in the dict (lowercase variant -> canonical label)
# http://stackoverflow.com/a/31674731/2901002
d = {k.lower(): oldk for oldk, oldv in mapp1.items() for k in oldv}
# Values that have no match in d become NaN
df['EDUCATION_LEVEL'] = df['EDUCATION_LEVEL'].str.lower().map(d)
print(df)
          EDUCATION_LEVEL  VAL
0       Bachelor's Degree  639
1        Ordinary Diploma  291
2          Ordinary Level  264
3         Master's Degree  149
4             Certificate  126
5          Advanced Level   69
6   Post Graduate Diploma   40
7       Bachelor's Degree   28
8          Advanced Level   20
9          Ordinary Level   15
10        Master's Degree   10
11      Bachelor's Degree    6
12       Ordinary Diploma    5
13            Certificate    5
14                    PHD    4
15                    NaN    2
16  Post Graduate Diploma    1
17        Master's Degree    1
18      Bachelor's Degree    1
19         Ordinary Level    1
20        Master's Degree    1
21                    PHD    1
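
Row 15 is NaN because the dict lists "- Level" rather than "A- Level", so the lowercased value "a- level" has no key. If unmatched values should be kept rather than turned into NaN, one option is to fill the gaps from the original Series. A minimal sketch, assuming a small made-up Series and a shortened lookup dict:

```python
import pandas as pd

# Illustrative sample; "A- Level" deliberately has no entry in d
s = pd.Series(["Masters", "A- Level", "PhD"])
d = {"masters": "Master's Degree", "phd": "PHD"}

# map() gives NaN for unmatched values; fillna(s) restores the original text
mapped = s.str.lower().map(d).fillna(s)
print(mapped.tolist())
```

This reproduces the behaviour of the `.get(x.lower(), x)` approach while still using `map`.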

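If converting the single values to lists is not possible, build the dictionary with a loop instead:

d = {}
for k, v in mapp.items():
    if isinstance(v, list):
        # List of variants: map each lowercase variant to the label
        for x in v:
            d[x.lower()] = k
    else:
        # Single string variant
        d[v.lower()] = k

# Values that have no match in d become NaN
df['EDUCATION_LEVEL'] = df['EDUCATION_LEVEL'].str.lower().map(d)
print(df)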
          EDUCATION_LEVEL  VAL
0       Bachelor's Degree  639
1        Ordinary Diploma  291
2          Ordinary Level  264
3         Master's Degree  149
4             Certificate  126
5          Advanced Level   69
6   Post Graduate Diploma   40
7       Bachelor's Degree   28
8          Advanced Level   20
9          Ordinary Level   15
10        Master's Degree   10
11      Bachelor's Degree    6
12       Ordinary Diploma    5
13            Certificate    5
14                    PHD    4
15                    NaN    2
16  Post Graduate Diploma    1
17        Master's Degree    1
18      Bachelor's Degree    1
19         Ordinary Level    1
20        Master's Degree    1
21                    PHD    1
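
Several of the variants differ only in spacing around the hyphen ("A - Level", "A-Level", "A- Level"). As an alternative to listing every spelling, the text could be normalized before the lookup. This is only a sketch under assumptions: the sample values and the short dict `d` are made up for illustration, and the regular expressions may need adjusting for the real column.

```python
import pandas as pd

# Illustrative sample with inconsistent spacing and case
s = pd.Series(["A - Level", "A-Level", "A- Level", " O- Level ", "BBA"])

# Normalize: trim, lowercase, remove spaces around hyphens, collapse whitespace
normalized = (
    s.str.strip()
     .str.lower()
     .str.replace(r"\s*-\s*", "-", regex=True)
     .str.replace(r"\s+", " ", regex=True)
)

# One key per normalized form is now enough
d = {"a-level": "Advanced Level", "o-level": "Ordinary Level"}

# Unmatched values (here "BBA") keep their original text
cleaned = normalized.map(d).fillna(s)
print(cleaned.tolist())
```

With this approach the dict needs one entry per normalized form, so a missed spacing variant such as "A- Level" no longer falls through.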
