我已经阅读了CSV文件(包含客户的姓名和地址),并将数据分配到DataFrame表中。你知道吗
csv文件(或DataFrame表)的描述
DataFrame包含几行和7列
数据库示例
Client_id Client_Name Address1 Address3 Post_Code City_Name Full_Address
C0000001 A 10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH
C0000001 A 10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH
C0000001 A 10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH
C0000002 B 10001998 RUE EDWARD STEICHEN L-1855 LUXEMBOURG RUE EDWARD STEICHEN,L-1855,LUXEMBOURG
C0000002 B 10001998 RUE EDWARD STEICHEN L-1855 LUXEMBOURG RUE EDWARD STEICHEN,L-1855,LUXEMBOURG
C0000002 B 10001998 RUE EDWARD STEICHEN L-1855 LUXEMBOURG RUE EDWARD STEICHEN,L-1855,LUXEMBOURG
C0000003 C 11000051 9 RUE DU BRILL L-3898 FOETZ 9 RUE DU BRILL,L-3898 ,FOETZ
C0000003 C 11000051 9 RUE DU BRILL L-3898 FOETZ 9 RUE DU BRILL,L-3898 ,FOETZ
C0000003 C 11000051 9 RUE DU BRILL L-3898 FOETZ 9 RUE DU BRILL,L-3898 ,FOETZ
C0000004 D 10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH
C0000005 E 10001998 RUE EDWARD STEICHEN L-1855 LUXEMBOURG RUE EDWARD STEICHEN,L-1855,LUXEMBOURG
到目前为止,我已经编写了以下代码来生成上述表:
代码是
import pandas as pd
import glob
Excel_file = 'Address.xlsx'
Address_Info = pd.read_excel(Excel_file)
# rename the columns name
Address_Info.columns = ['Client_ID', 'Client_Name','Address_ID','Street_Name','Post_Code','City_Name','Country']
# extract specfic columns into a new dataframe
Bin_Address= Address_Info[['Client_Name','Address_ID','Street_Name','Post_Code','City_Name','Country']].copy()
# Clean existing whitespace from the ends of the strings
Bin_Address= Bin_Address.apply(lambda x: x.str.strip(), axis=1) # ← added
# Adding a new column called (Full_Address) that concatenate address columns into one
# for example Karlaplan 13,115 20,STOCKHOLM,Stockholms län, Sweden
Bin_Address['Full_Address'] = Bin_Address[Bin_Address.columns[1:]].apply(lambda x: ','.join(x.dropna().astype(str)), axis=1)
Bin_Address['Full_Address']=Bin_Address[['Full_Address']].copy()
Bin_Address['latitude'] = 'None'
Bin_Address['longitude'] = 'None'
# Remove repetitive addresses
#Temp = list( dict.fromkeys(Bin_Address.Full_Address) )
# Remove repetitive values ( I do beleive the modification should be here)
Temp = list( dict.fromkeys(Address_Info.Client_ID) )
我想删除整行如果在客户id、客户名称和完整地址列中有重复的值,到目前为止代码没有显示任何错误,但同时,我还没有得到预期的结果(我确信修改将在所附代码的最后一行)
预期产量为
Client_id Client_Name Address1 Address3 Post_Code City_Name Full_Address
C0000001 A 10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH
C0000002 B 10001998 RUE EDWARD STEICHEN L-1855 LUXEMBOURG RUE EDWARD STEICHEN,L-1855,LUXEMBOURG
C0000003 C 11000051 9 RUE DU BRILL L-3898 FOETZ 9 RUE DU BRILL,L-3898 ,FOETZ
C0000004 D 10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH
C0000005 E 10001998 RUE EDWARD STEICHEN L-1855 LUXEMBOURG RUE EDWARD STEICHEN,L-1855,LUXEMBOURG
尝试:
您可以使用pandas中名为
dorp_duplicates()
的内置方法。还有很多现成的选择,你可以申请。你知道吗如果要保留第一个值或最后一个值,还可以选择它何时重复。你知道吗
默认情况下,它将始终保持第一个值。你知道吗
相关问题 更多 >
编程相关推荐