将一个EXCEL文件中的两个不同表格与每个oth进行比较

2024-05-19 17:08:24 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个Excel文件和两张表格。 其中一个包含df1:

Country City Population Planet
Germany  Berlin 30500    Earth
Spain    Madrid  21021   Earth
...

第二个包含df2:

Country   City   Population Planet
Spain    Madrid  21021   Earth
...

现在,我想比较这两个数据帧,检查df1中是否也有df2中的行,如果是,则: 我想在df1中添加一个名为double的新列,如果该行在df1和df2中,我只想加一个“X”


Tags: 文件数据cityexcelcountry表格df1population
1条回答
网友
1楼 · 发布于 2024-05-19 17:08:24
# create string data

df1_str = '''Country,City,Population,Planet
Germany,Berlin,30500,Earth
Spain,Madrid,21021,Earth'''

df2_str = '''Country,City,Population,Planet
Spain,Madrid,21021,Earth'''
# read in to dataframe
df1 = pd.read_csv(io.StringIO(df1_str))
# read in to list for iteration
df1_list = pd.read_csv(io.StringIO(df1_str)).values.tolist()
df2_list = pd.read_csv(io.StringIO(df2_str)).values.tolist()

# join all columns and make a unique combination
df1_list = ["-".join(map(str, item)) for item in df1_list]
df2_list = ["-".join(map(str, item)) for item in df2_list]

# check the combinations exist in both data frame
common_flag = []
for item1 in df1_list:
    for item2 in df2_list:
        if item1 in item2: # you might prefer item1 == item2:
            common_flag.append("X")
        else:
            common_flag.append(None)

# add the result to datagrame df1
df1["double"] = pd.Series(common_flag)

创建组合list时,确保两个数据帧中的列顺序相同

输出:

   Country    City  Population Planet double
0  Germany  Berlin       30500  Earth   None
1    Spain  Madrid       21021  Earth      X

相关问题 更多 >