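Thanks for taking the time to look at my post.
I'm using Python to merge information from many CSV and TSV files. When I perform the second merge, the data is duplicated in the resulting dataframe. I assume I'm missing something basic about the merge call, but I haven't figured it out yet.
Code: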
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
import sys
import matplotlib
# Enable inline plotting
%matplotlib inline
# read data into dataframes
ticketdata = r'/pathto.csv'
userdata = r'/pathto.csv'
shipmentdata = r'/pathto.tsv'
tickets_df = pd.read_csv(ticketdata, usecols=['Id', 'Requester', 'Created at', 'Requester email',
                                              'Requester external id'])
users_df = pd.read_csv(userdata, usecols=['External ID', 'Printers', 'Organization Title'])
shipment_df = pd.read_csv(shipmentdata, sep='\t', usecols=['Cust', 'Printer ID'])
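# Clean up tickets_df & shipment_df
# Rename "Requester external id" to "External ID" to support the merge.
# Rename by name: read_csv returns usecols columns in file order, not list order.
tickets_df = tickets_df.rename(columns={'Id': 'Ticket Id', 'Requester external id': 'External ID'})
shipment_df = shipment_df.rename(columns={'Cust': 'VAR', 'Printer ID': 'Printers'})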
# Change column order for the sake of readability
tickets_df = tickets_df[['Ticket Id','Requester','Created at',"Requester email","External ID"]]
# Replace every NaN in tickets_df (including External ID) with 0, then merge
tickets_df.fillna(0, inplace=True)
merge1_df = pd.merge(tickets_df, users_df, on=['External ID'], how='left')
merge1_df = merge1_df[['Ticket Id','Created at',"Organization Title",'Requester',"Requester email","External ID",'Printers']]
merge2_df = pd.merge(merge1_df, shipment_df, on=['Printers'], how='left')
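A minimal sketch of the renaming step, assuming small made-up frames in place of the real CSV files (`tickets_df`/`users_df` here are hypothetical stand-ins). Renaming with a dict ties each new name to the old one, so it keeps working even if the column order in the file changes; `read_csv` ignores the order of the `usecols` list.

```python
import pandas as pd

# Hypothetical stand-ins for tickets_df and users_df, using the original
# column names from the CSV files.
tickets_df = pd.DataFrame({
    "Id": [1, 2],
    "Requester": ["dude", "dude1"],
    "Requester external id": ["A1", None],
})
users_df = pd.DataFrame({
    "External ID": ["A1", "B2"],
    "Printers": ["P-100", "P-200"],
})

# Rename by name rather than by position, so the result does not depend
# on the order in which read_csv returns the usecols columns.
tickets_df = tickets_df.rename(columns={
    "Id": "Ticket Id",
    "Requester external id": "External ID",
})

# Left join: every ticket is kept; tickets without a matching user get NaN.
merged = tickets_df.merge(users_df, on="External ID", how="left")
print(merged)
```
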
merge1_df looks as expected (some values are expected to be NaN):
Ticket Id Created at Organization Title Requester Requester email External ID Printers
0 1 2014-08-21 18:19 NaN dude dude@dude.com 0 NaN
1 2 2014-09-09 12:04 NaN dude1 duke1@dude.com 0 NaN
2 3 2014-09-09 12:04 NaN dude2 duke2@dude.com 0 NaN
3 4 2014-09-09 12:04 NaN dude3 duke3@dude.com 0 NaN
merge2_df contains thousands of duplicates:
Ticket Id Created at Organization Title Requester Requester email External ID Printers
0 1 2014-08-21 18:19 NaN dude dude@dude.com 0 NaN
1 1 2014-08-21 18:19 NaN dude dude@dude.com 0 NaN
2 1 2014-08-21 18:19 NaN dude dude@dude.com 0 NaN
3 1 2014-08-21 18:19 NaN dude dude@dude.com 0 NaN
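One way to check where the extra rows come from is to count the right-hand keys and ask `merge` to validate the relationship. This is a sketch on made-up toy data (not the original files), assuming the intent is that each printer appears at most once in `shipment_df`; `validate="many_to_one"` makes pandas raise `MergeError` instead of silently multiplying rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a ticket with no printer, and a shipment table in
# which several rows also have no printer.
left = pd.DataFrame({"Ticket Id": [1, 2], "Printers": [np.nan, "P-100"]})
right = pd.DataFrame({
    "VAR": ["x", "y", "z"],
    "Printers": [np.nan, np.nan, "P-100"],
})

# Count rows per key on the right-hand side; dropna=False keeps the NaN group.
print(right["Printers"].value_counts(dropna=False))

# validate="many_to_one" requires the right-hand keys to be unique.
rejected = False
try:
    pd.merge(left, right, on="Printers", how="left", validate="many_to_one")
except pd.errors.MergeError as exc:
    rejected = True
    print("merge rejected:", exc)
```
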
Any idea what I messed up?
The problem was the NaN values in the dataframes. I added code to replace NaN with 0, which resolved the duplicate entries in merge2_df.
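Why this works: pandas `merge` treats missing keys as equal. Its documentation notes that if both key columns contain nulls, those rows are matched against each other, unlike SQL. The exact replacement code isn't shown above, so this toy sketch (made-up data) assumes the 0 fill was applied to the `Printers` column of `merge1_df` before the second merge. It also shows an alternative: dropping rows with a missing key from `shipment_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy version of merge1_df: two tickets with no printer.
merge1_df = pd.DataFrame({"Ticket Id": [1, 2], "Printers": [np.nan, np.nan]})
# Hypothetical toy version of shipment_df: three shipments with no printer.
shipment_df = pd.DataFrame({
    "VAR": ["a", "b", "c"],
    "Printers": [np.nan, np.nan, np.nan],
})

# NaN keys match each other, so each NaN row on the left pairs with every
# NaN row on the right: 2 x 3 = 6 rows.
blown_up = pd.merge(merge1_df, shipment_df, on="Printers", how="left")

# Fix 1: fill the left-hand key with a sentinel (assumes no real printer ID
# is 0), so it no longer matches the NaN keys on the right.
fixed_fill = pd.merge(merge1_df.fillna({"Printers": 0}), shipment_df,
                      on="Printers", how="left")

# Fix 2: drop right-hand rows whose key is missing before merging.
fixed_drop = pd.merge(merge1_df, shipment_df.dropna(subset=["Printers"]),
                      on="Printers", how="left")

print(len(blown_up), len(fixed_fill), len(fixed_drop))
```

The second option avoids picking a sentinel value that might collide with a real key.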