Merging Pandas DataFrames duplicates some of the data

Posted 2024-06-16 09:14:51


Thanks for taking the time to read my post.

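I'm using Python to merge information from a number of CSV and TSV files. When I perform the second merge, rows get duplicated in the resulting DataFrame. I assume I'm missing something basic about the merge call, but I haven't figured out what.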
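For context, this kind of duplication is what a merge produces when the join key repeats in the right-hand frame: a left merge emits one output row for every matching right-hand row. A minimal sketch with hypothetical toy data (not the poster's files):

```python
import pandas as pd

# Hypothetical toy data: one ticket, and a right-hand table in which
# the join key "P1" appears three times.
left = pd.DataFrame({"Ticket Id": [1], "Printers": ["P1"]})
right = pd.DataFrame({"Printers": ["P1", "P1", "P1"], "VAR": ["a", "b", "c"]})

# A left merge emits one output row per matching right-hand row,
# so the single ticket comes back three times.
merged = pd.merge(left, right, on="Printers", how="left")
print(len(merged))                   # 3
print(merged["Ticket Id"].tolist())  # [1, 1, 1]
```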

Code:

from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd 
import sys
import matplotlib

# Enable inline plotting
%matplotlib inline

# read data into dataframes
ticketdata = r'/pathto.csv'
userdata = r'/pathto.csv'
shipmentdata = r'/pathto.tsv'

tickets_df = pd.read_csv(ticketdata, usecols=['Id',"Requester",'Created at',"Requester email",
                                              "Requester external id"])
users_df = pd.read_csv(userdata, usecols=['External ID','Printers',"Organization Title"])
shipment_df = pd.read_csv(shipmentdata, delimiter='\t', usecols=['Cust','Printer ID'])

# Clean up tickets_df & shipment_df

# Change "Requester external id" to "External ID" to support the merge
tickets_df.columns = ['Ticket Id',"Requester","External ID","Requester email",'Created at']
shipment_df.columns = ['VAR','Printers']
# Change column order for the sake of readability
tickets_df = tickets_df[['Ticket Id','Requester','Created at',"Requester email","External ID"]]

# Replace NaN in External ID with 0 (only that column) and merge data
tickets_df['External ID'] = tickets_df['External ID'].fillna(0)
merge1_df = pd.merge(tickets_df, users_df, on=['External ID'], how='left')
merge1_df = merge1_df[['Ticket Id','Created at',"Organization Title",'Requester',"Requester email","External ID",'Printers']]
merge2_df = pd.merge(merge1_df, shipment_df, on=['Printers'], how='left')

merge1_df looks as expected (some values are expected to be NaN):

    Ticket Id   Created at  Organization Title  Requester   Requester email     External ID     Printers
0   1   2014-08-21 18:19    NaN     dude    dude@dude.com   0   NaN
1   2   2014-09-09 12:04    NaN     dude1   duke1@dude.com  0   NaN
2   3   2014-09-09 12:04    NaN     dude2   duke2@dude.com  0   NaN
3   4   2014-09-09 12:04    NaN     dude3   duke3@dude.com  0   NaN

merge2_df contains thousands of duplicates:

    Ticket Id   Created at  Organization Title  Requester   Requester email     External ID     Printers
0   1   2014-08-21 18:19    NaN     dude    dude@dude.com   0   NaN
1   1   2014-08-21 18:19    NaN     dude    dude@dude.com   0   NaN
2   1   2014-08-21 18:19    NaN     dude    dude@dude.com   0   NaN
3   1   2014-08-21 18:19    NaN     dude    dude@dude.com   0   NaN
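One way to check whether the right-hand frame is responsible is to count missing and repeated join keys before merging, and to pass `validate="many_to_one"` to `pd.merge`, which raises `MergeError` instead of silently multiplying rows. The sketch below uses hypothetical stand-ins for `merge1_df` and `shipment_df`; whether the real `shipment_df` has repeated or blank `Printers` values is an assumption to verify.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical stand-ins for merge1_df and shipment_df.
tickets = pd.DataFrame({"Ticket Id": [1, 2], "Printers": ["P1", "P2"]})
shipments = pd.DataFrame({"VAR": ["x", "y", "z"], "Printers": ["P1", "P1", None]})

# Count right-hand rows that could fan out the join.
n_missing = int(shipments["Printers"].isna().sum())                  # empty keys
n_repeated = int(shipments["Printers"].dropna().duplicated().sum())  # extra copies of a key
print(n_missing, n_repeated)  # 1 1

# validate="many_to_one" makes pandas raise MergeError when the
# right-hand key is not unique, instead of silently multiplying rows.
try:
    pd.merge(tickets, shipments, on="Printers", how="left", validate="many_to_one")
    rejected = False
except MergeError as err:
    rejected = True
    print("merge rejected:", err)
```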

Any idea where I went wrong?
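A likely contributor, given that `Printers` is NaN in the rows shown: unlike SQL, pandas matches missing (NaN) join keys to each other, so each ticket with no printer joins to every `shipment_df` row that also has no `Printer ID`. That this is what happens in the poster's data is an assumption. The sketch below uses hypothetical toy frames; dropping NaN keys and de-duplicating the right-hand frame is only appropriate if each printer should map to a single VAR (`drop_duplicates` keeps the first row per key).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames: ticket 1 has no printer; the right-hand table has
# two rows with a missing key and two identical rows for "P9".
left = pd.DataFrame({"Ticket Id": [1, 2], "Printers": [np.nan, "P9"]})
right = pd.DataFrame({"Printers": [np.nan, np.nan, "P9", "P9"],
                      "VAR": ["a", "b", "c", "c"]})

# pandas treats NaN keys as equal, so ticket 1 matches both NaN rows
# and ticket 2 matches both "P9" rows.
naive = pd.merge(left, right, on="Printers", how="left")
print(len(naive))  # 4

# Drop rows without a key and keep one row per key before merging.
clean_right = right.dropna(subset=["Printers"]).drop_duplicates(subset=["Printers"])
fixed = pd.merge(left, clean_right, on="Printers", how="left")
print(len(fixed))  # 2
```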

