Efficiently combining two DataFrames without duplicates or reversed pairs | python

Posted 2024-06-16 09:59:48


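I have two DataFrames with thousands of rows each, and I need to combine them into a single DataFrame with no duplicate or reversed pairs. For example: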

DataFrame 1

drug1
drug2
drug3

DataFrame 2

disease1
disease2
disease3

So the output DataFrame would be:

Output DataFrame

drug1 disease1
drug1 disease2
drug1 disease3
drug2 disease1
drug2 disease2
drug2 disease3 
drug3 disease1
drug3 disease2
drug3 disease3

I do not want the output to contain combinations like the following:

disease1 drug1
drug1 drug1
disease1 disease1 

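I tried pd.merge, but it returned duplicated and reversed pairs, and it took a long time because DataFrames 1 and 2 each contain thousands of rows.

Any help would be appreciated.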


3 answers

One pure-pandas approach is to build a MultiIndex with MultiIndex.from_product and then convert it to a DataFrame:

>>> df1
       0
0  drug1
1  drug2
2  drug3
>>> df2
          0
0  disease1
1  disease2
2  disease3

df3 = (pd.MultiIndex.from_product([df1[0],df2[0]])
       .to_frame()
       .reset_index(drop=True))

>>> df3
       0         1
0  drug1  disease1
1  drug1  disease2
2  drug1  disease3
3  drug2  disease1
4  drug2  disease2
5  drug2  disease3
6  drug3  disease1
7  drug3  disease2
8  drug3  disease3
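
As a small variation on the answer above, the levels can be named up front and the frame built without the MultiIndex as its index, which makes the `reset_index` step unnecessary. This is only a sketch: the level names `drug` and `disease` are illustrative assumptions, not names from the original question.

```python
import pandas as pd

# Sample inputs mirroring the question; column label 0 matches the answer above.
df1 = pd.DataFrame({0: ["drug1", "drug2", "drug3"]})
df2 = pd.DataFrame({0: ["disease1", "disease2", "disease3"]})

# names= labels the two levels ("drug" and "disease" are illustrative names).
pairs = pd.MultiIndex.from_product([df1[0], df2[0]], names=["drug", "disease"])

# index=False returns a frame with a fresh RangeIndex,
# so no reset_index(drop=True) is needed afterwards.
df3 = pairs.to_frame(index=False)
print(df3)
```

Because the first column always comes from `df1` and the second from `df2`, reversed pairs such as (disease1, drug1) and same-source pairs such as (drug1, drug1) cannot occur.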

Try this solution:

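# Add a constant helper column to both frames so that every row matches every row.
# Note: this modifies df1 and df2 in place.
df1['key'] = 1
df2['key'] = 1

# Join on the helper column, then drop it from the result.
result = df1.merge(df2, on='key').drop('key', axis=1)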
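
In pandas 1.2 and later, the same Cartesian product can be requested directly with `how="cross"`, which avoids the helper column. The sketch below assumes such a pandas version; the column names `drug` and `disease` are illustrative.

```python
import pandas as pd

# Illustrative inputs; the column names are assumptions for this example.
df1 = pd.DataFrame({"drug": ["drug1", "drug2", "drug3"]})
df2 = pd.DataFrame({"disease": ["disease1", "disease2", "disease3"]})

# how="cross" (pandas >= 1.2) pairs every row of df1 with every row of df2.
# No join key is given, so nothing has to be added to or dropped from the frames.
result = df1.merge(df2, how="cross")
print(result)
```

If either input contains repeated values, the product will repeat them too, so calling `drop_duplicates()` on each input before merging may be what removes the unwanted duplicates.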

Setup

df1 = pd.DataFrame(dict(col1=[f"drug{i}" for i in range(1, 4)]))
df2 = pd.DataFrame(dict(col2=[f"disease{i}" for i in range(1, 4)]))

merge on a helper column

df1.assign(A=1).merge(df2.assign(A=1)).drop(columns='A')

    col1      col2
0  drug1  disease1
1  drug1  disease2
2  drug1  disease3
3  drug2  disease1
4  drug2  disease2
5  drug2  disease3
6  drug3  disease1
7  drug3  disease2
8  drug3  disease3

List comprehension

pd.DataFrame([
    (i, j) for i in df1.col1
           for j in df2.col2
], columns=['col1', 'col2'])
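
The nested comprehension above can also be written with `itertools.product` from the standard library, which yields the same pairs in the same order. This is a sketch of an equivalent form, not something the original answer proposed.

```python
from itertools import product

import pandas as pd

# Same setup as in the answer above.
df1 = pd.DataFrame(dict(col1=[f"drug{i}" for i in range(1, 4)]))
df2 = pd.DataFrame(dict(col2=[f"disease{i}" for i in range(1, 4)]))

# product() iterates df2.col2 fully for each value of df1.col1,
# matching the order of the nested comprehension.
result = pd.DataFrame(
    list(product(df1["col1"], df2["col2"])),
    columns=["col1", "col2"],
)
print(result)
```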

pandas.concat

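A generalization to the cross product of any two DataFrames

import numpy as np

# Repeat each df1 label len(df2) times, and tile the df2 labels len(df1) times.
i = df1.index.repeat(len(df2))
j = np.tile(df2.index, len(df1))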

pd.concat([
    df1.loc[i].reset_index(drop=True),
    df2.loc[j].reset_index(drop=True)
], sort=True, axis=1)
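
The repeat/tile idea above can be wrapped in a small reusable function. The sketch below is one possible packaging, with the hypothetical name `cross_join`; it uses positional indexing (`iloc`) instead of `loc` so that it also works when the inputs have non-default or non-unique indexes.

```python
import numpy as np
import pandas as pd


def cross_join(left, right):
    """Return every row of `left` paired with every row of `right`.

    `cross_join` is a hypothetical helper name used for illustration.
    """
    # Positions 0,0,0,1,1,1,... for left and 0,1,2,0,1,2,... for right.
    i = np.repeat(np.arange(len(left)), len(right))
    j = np.tile(np.arange(len(right)), len(left))
    # Select rows by position, reset both indexes, then glue them side by side.
    return pd.concat(
        [left.iloc[i].reset_index(drop=True),
         right.iloc[j].reset_index(drop=True)],
        axis=1,
    )


# Example with a multi-column left frame and a non-default index.
left = pd.DataFrame({"drug": ["drug1", "drug2"], "dose": [10, 20]}, index=["a", "b"])
right = pd.DataFrame({"disease": ["disease1", "disease2", "disease3"]})
out = cross_join(left, right)
print(out)
```

This assumes the two frames do not share column names; if they do, the result will contain duplicate column labels.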
