How do I find the rows that were not matched in a merge (relative to only one of the dataframes)?

Posted 2024-06-16 15:07:06

In Python 3, I have the following dataframes:

candidatos_2018.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 27887 entries, 0 to 414
Data columns (total 4 columns):
uf_eleicao_disputa    27887 non-null object
cargo_em_disputa      27887 non-null object
nome_urna             27887 non-null object
cpf                   27887 non-null object
dtypes: object(4)
memory usage: 1.1+ MB

deputados_eleitos_2014.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 513 entries, 0 to 512
Data columns (total 10 columns):
politico_estado                     513 non-null object
politico_nome                       513 non-null object
politico_cpf                        513 non-null object
politico_nome_urna                  513 non-null object
politico_partido_eleicao            513 non-null object
politico_partido_atual              513 non-null object
politico_bancada_ruralista          513 non-null object
politico_total_recebido             513 non-null float64
politico_cargo_bancada_ruralista    14 non-null object
politico_candidato_cargo            0 non-null float64
dtypes: float64(2), object(8)
memory usage: 40.2+ KB

I merged them on two columns:

new_df = pd.merge(deputados_eleitos_2014, candidatos_2018, left_on='politico_cpf', right_on='cpf')

new_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 472 entries, 0 to 471
Data columns (total 17 columns):
politico_estado                     472 non-null object
politico_nome                       472 non-null object
politico_cpf                        472 non-null object
politico_nome_urna                  472 non-null object
politico_partido_eleicao            472 non-null object
politico_partido_atual              472 non-null object
politico_bancada_ruralista          472 non-null object
politico_total_recebido             472 non-null float64
politico_cargo_bancada_ruralista    12 non-null object
politico_candidato_cargo            0 non-null float64
uf_eleicao_disputa_x                472 non-null object
cargo_em_disputa_x                  472 non-null object
cpf_x                               472 non-null object
uf_eleicao_disputa_y                472 non-null object
cargo_em_disputa_y                  472 non-null object
nome_urna                           472 non-null object
cpf_y                               472 non-null object
dtypes: float64(2), object(15)
memory usage: 66.4+ KB

472 rows have the same value in both columns. Now, relative to the "deputados_eleitos_2014" dataframe, I want to find the missing rows -> 41 (513 - 472)
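One way to check the expected count directly, without a merge, is a boolean mask built with `Series.isin`. The sketch below uses small made-up frames (the names and CPF values are hypothetical) that borrow only the column names from the question. Note that 513 - 472 equals the number of unmatched rows only if each CPF appears at most once in `candidatos_2018`; duplicate keys on the right would add rows to the inner merge.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
deputados = pd.DataFrame({
    "politico_nome": ["Ana", "Bruno", "Carla", "Davi"],
    "politico_cpf": ["111", "222", "333", "444"],
})
candidatos = pd.DataFrame({
    "nome_urna": ["ANA", "CARLA", "EVA"],
    "cpf": ["111", "333", "555"],
})

# True where the left key also occurs somewhere in the right frame.
matched = deputados["politico_cpf"].isin(candidatos["cpf"])

# Rows of the left frame whose key has no match on the right.
missing = deputados[~matched]
print(len(missing))
print(missing["politico_nome"].tolist())
```

Because the mask is computed per left row, the count is not affected by repeated keys in the right frame.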

I tried this approach:

nomes_naoencontrados = pd.merge(deputados_eleitos_2014, 
                        candidatos_2018, 
                        left_on='politico_cpf', 
                        right_on='cpf',
                        how='outer',
                        indicator=True)

ldf = nomes_naoencontrados.query("_merge == 'left_only'").drop('_merge',axis=1)

ldf.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 0 entries
Data columns (total 21 columns):
politico_estado                     0 non-null object
politico_nome                       0 non-null object
politico_cpf                        0 non-null object
politico_nome_urna                  0 non-null object
politico_partido_eleicao            0 non-null object
politico_partido_atual              0 non-null object
politico_bancada_ruralista          0 non-null object
politico_total_recebido             0 non-null float64
politico_cargo_bancada_ruralista    0 non-null object
politico_candidato_cargo            0 non-null float64
uf_eleicao_disputa_x                0 non-null object
cargo_em_disputa_x                  0 non-null object
cpf_x                               0 non-null object
uf_eleicao_disputa_y                0 non-null object
cargo_em_disputa_y                  0 non-null object
nome_urna_x                         0 non-null object
cpf_y                               0 non-null object
uf_eleicao_disputa                  0 non-null object
cargo_em_disputa                    0 non-null object
nome_urna_y                         0 non-null object
cpf                                 0 non-null object
dtypes: float64(2), object(19)
memory usage: 0.0+ bytes
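The `_x`/`_y` suffixes in the output above appear only when both inputs to `pd.merge` share column names, yet the `info()` listings at the top show no names in common. That hints, as an inference from the printed output only, that `deputados_eleitos_2014` may already have held columns from an earlier merge when this code ran, in which case every row would match. A quick sanity check can be sketched as follows, using hypothetical stand-in frames:

```python
import pandas as pd

# Hypothetical stand-ins: the left frame already carries a "cpf" column,
# as it would after an earlier merge with the right frame.
left = pd.DataFrame({"politico_cpf": ["111", "222"], "cpf": ["111", "222"]})
right = pd.DataFrame({"cpf": ["111", "333"], "nome_urna": ["ANA", "EVA"]})

# Column names present in both frames get _x/_y suffixes after a merge.
shared = sorted(set(left.columns) & set(right.columns))
print(shared)

merged = pd.merge(left, right, left_on="politico_cpf", right_on="cpf", how="left")
print(merged.columns.tolist())
```

If `shared` is not empty for the real frames, reloading the original `deputados_eleitos_2014` before merging is worth trying.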

Does anyone know how I can find the rows that were not matched in the merge, relative to one of the dataframes?

1 answer
User
#1 · Posted 2024-06-16 15:07:06

Your code does work. I can reproduce it:

import pandas as pd

# Two small frames that share the key columns key1 and key2
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Outer merge; indicator=True adds a _merge column naming each row's origin
result = pd.merge(left, right, on=['key1', 'key2'], how='outer', indicator=True)

# Keep only the rows found in left but not in right, then drop the helper column
ldf = result.query("_merge == 'left_only'").drop('_merge', axis=1)

This is ldf:

    A   B key1 key2    C    D
1  A1  B1   K0   K1  NaN  NaN
4  A3  B3   K2   K1  NaN  NaN
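For the question's case, where the key columns have different names, the same indicator pattern can be applied with `left_on`/`right_on`. The following is a minimal sketch on made-up data (names and CPF values are hypothetical; only the column names come from the question). It uses `how='left'` instead of `how='outer'`, since only left-side rows are of interest, and merges against the de-duplicated right keys so that each left row appears once.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
deputados = pd.DataFrame({
    "politico_nome": ["Ana", "Bruno", "Carla", "Davi"],
    "politico_cpf": ["111", "222", "333", "444"],
})
candidatos = pd.DataFrame({
    "nome_urna": ["ANA", "ANA", "CARLA", "EVA"],
    "cpf": ["111", "111", "333", "555"],  # note the repeated key
})

# Only the key column is needed from the right frame; dropping duplicates
# keeps the left merge at one output row per left row.
right_keys = candidatos[["cpf"]].drop_duplicates()

merged = deputados.merge(right_keys, left_on="politico_cpf", right_on="cpf",
                         how="left", indicator=True)

# Rows present only in the left frame, restricted to its original columns.
left_only = merged.loc[merged["_merge"] == "left_only", deputados.columns]
print(left_only)
```

Selecting `deputados.columns` at the end drops both the `_merge` helper and the all-NaN key column that came from the right frame.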
