How can I convert text in a column to another format based on a custom dictionary?

Posted 2024-04-27 22:19:37


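I want to make the education data in my dataset consistent, based on a dictionary of university/college names. How can I run code against the dictionary and get the desired output? The data includes abbreviations and colloquial names.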

Can anyone provide an example in R? I am also willing to try Python; R is just my first choice.

Here is an example of my dictionary:

*University Name Dictionary
California Institute of Technology
New York University
Massachusetts Institute of Technology
Georgia Institute of Technology
Rutgers University
University of California, Berkley
University of California, Los Angeles

Here is my data:

*Education
Cal Tech
NYU
MIT
Ga Tech
Georgia Tech
Rutgers
Berkley
UCLA

This is what I want:

*Education      *New Education
Cal Tech        California Institute of Technology
NYU             New York University
MIT             Massachusetts Institute of Technology
Ga Tech         Georgia Institute of Technology
Georgia Tech    Georgia Institute of Technology
Rutgers         Rutgers University
Berkley         University of California, Berkley
UCLA            University of California, Los Angeles

Sorry if a solution already exists; I just couldn't find it. I would appreciate any help.


1 answer
User
#1 · posted 2024-04-27 22:19:37

pandas has a replace(dictionary) method, where dictionary maps each value found in the data to its replacement, for example

 {"Cal Tech": "California Institute of Technology"} 

Because pandas.DataFrame was inspired by R, R probably has something similar.
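The question's dictionary lists only full names, while replace() needs a mapping from each alias to its full name. The following is a minimal sketch of one way to build that mapping, assuming you first group the aliases under each full name by hand. The names `aliases_by_name` and `alias_to_name` are hypothetical and only two universities are shown.

```python
# Hypothetical grouping: each full name lists the aliases seen in the data.
aliases_by_name = {
    'Georgia Institute of Technology': ['Ga Tech', 'Georgia Tech'],
    'New York University': ['NYU'],
}

# Invert the grouping into the {alias: full name} shape that replace() expects.
alias_to_name = {
    alias: name
    for name, aliases in aliases_by_name.items()
    for alias in aliases
}

print(alias_to_name)
```

Keeping the aliases grouped by full name makes it easier to add a new spelling later without retyping the long name.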


import pandas as pd

# Mapping from each alias found in the data to the full university name
data = {
    'Cal Tech': 'California Institute of Technology',
    'NYU': 'New York University',
    'MIT': 'Massachusetts Institute of Technology',
    'Ga Tech': 'Georgia Institute of Technology',
    'Georgia Tech': 'Georgia Institute of Technology',
    'Rutgers': 'Rutgers University',
    'Berkley': 'University of California, Berkley',
    'UCLA': 'University of California, Los Angeles',
}

df = pd.DataFrame({
    'Education': ['Cal Tech', 'NYU', 'MIT', 'Ga Tech', 'Georgia Tech', 'Rutgers', 'Berkley', 'UCLA']
})

# Replace each value that exactly matches a key; other values stay unchanged
df['New Education'] = df['Education'].replace(data)

print(df)

Result:

      Education                          New Education
0      Cal Tech     California Institute of Technology
1           NYU                    New York University
2           MIT  Massachusetts Institute of Technology
3       Ga Tech        Georgia Institute of Technology
4  Georgia Tech        Georgia Institute of Technology
5       Rutgers                     Rutgers University
6       Berkley      University of California, Berkley
7          UCLA  University of California, Los Angeles
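replace() silently leaves values that are not in the dictionary unchanged, which can hide names you still need to add. As a rough sketch of an alternative (not part of the original answer), Series.map() returns NaN for unmatched values, so they can be flagged before falling back to the original text. The `lookup` dictionary, the `Matched` column, and the 'Stanford' row are made up for illustration.

```python
import pandas as pd

# Hypothetical lookup table; only two entries to keep the sketch short.
lookup = {
    'NYU': 'New York University',
    'MIT': 'Massachusetts Institute of Technology',
}

df = pd.DataFrame({'Education': ['NYU', 'MIT', 'Stanford']})

# map() yields NaN for any value that has no key in the dictionary.
mapped = df['Education'].map(lookup)

# Record which rows were found in the dictionary.
df['Matched'] = mapped.notna()

# Fall back to the original text where no mapping was found.
df['New Education'] = mapped.fillna(df['Education'])

print(df)
```

Filtering on `df[~df['Matched']]` then lists the names that still need a dictionary entry.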

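If you use regex=True, each dictionary key is treated as a regular expression, so it can also replace a match inside a longer string.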

import pandas as pd

# Same alias-to-name mapping; with regex=True the keys act as patterns
data = {
    'Cal Tech': 'California Institute of Technology',
    'NYU': 'New York University',
    'MIT': 'Massachusetts Institute of Technology',
    'Ga Tech': 'Georgia Institute of Technology',
    'Georgia Tech': 'Georgia Institute of Technology',
    'Rutgers': 'Rutgers University',
    'Berkley': 'University of California, Berkley',
    'UCLA': 'University of California, Los Angeles',
}

df = pd.DataFrame({
    'Education': ['I am from MIT']
})

# Replace every substring that matches a key, not only whole-cell matches
df['New Education'] = df['Education'].replace(data, regex=True)

print(df)

Result:

       Education                                    New Education
0  I am from MIT  I am from Massachusetts Institute of Technology
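One caveat with regex=True is that a short key such as 'MIT' also matches inside unrelated words. The sketch below is one possible guard, assuming the aliases should only match as whole words: each key is escaped with re.escape() and wrapped in `\b` word boundaries. The `lookup` and `patterns` names and the 'SMITH College' row are illustrative only.

```python
import re

import pandas as pd

# Hypothetical subset of the alias dictionary.
lookup = {
    'MIT': 'Massachusetts Institute of Technology',
    'UCLA': 'University of California, Los Angeles',
}

# Escape each key so it is matched literally, and require word boundaries
# so that 'MIT' does not match inside a longer word such as 'SMITH'.
patterns = {
    r'\b' + re.escape(alias) + r'\b': name
    for alias, name in lookup.items()
}

df = pd.DataFrame({'Education': ['I am from MIT', 'SMITH College']})

df['New Education'] = df['Education'].replace(patterns, regex=True)

print(df)
```

Without the word boundaries, the second row would have the letters 'MIT' inside 'SMITH' replaced as well.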

Documentation: pandas.DataFrame.replace()
