Removing duplicate content within Excel cells using Python
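I am trying to use Python to remove duplicate content inside the cells of an Excel sheet.
The data is in column 1 of the original file. (The names in each cell are separated by ",".)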
Noah, Mason, Emily, Isabella, Emily
Liam, Madison, Mia, Ava, Mia
Jacob, Ethan, Jayden, Mia, Jayden
Mason, Emily, Daniel, Emily, Daniel
Madison, Mia, Sophia, Abigail, Sophia
Ethan, Jayden, Elizabeth, Madison, Elizabeth
Emily, Daniel, Olivia, Elizabeth, Olivia
Mia, Sophia, Isabella, Isabella
Jayden, Elizabeth, Ava, Ava
Daniel, Olivia, Mia, Mia
Sophia, Isabella, Emily, Emily
Elizabeth, Ava, Abigail, Abigail
Olivia, Mia, Madison, Madison
Isabella, Emily, Elizabeth, Elizabeth
The code I have so far is:
from xlrd import open_workbook
import xlwt

old_file = open_workbook('c:\\Book1.xls', formatting_info=True)
old_sheet = old_file.sheet_by_index(0)
new_file = xlwt.Workbook(encoding='utf-8', style_compression=0)
new_sheet = new_file.add_sheet('Result', cell_overwrite_ok=True)
for row_index in range(0, old_sheet.nrows):
    column_con = old_sheet.cell(row_index, 0).value
    aaa = dict.fromkeys(column_con).keys()
    new_sheet.write(row_index, 0, aaa)
new_file.save('c:\\Book New 1.xls')
But when I run it, it removes every duplicate letter instead of the duplicate names I want removed, and the result becomes:
a bEihM,oNmsyenIl
a diMmLo,sAvn
a cbEdihJM,ontye
a EDimM,onsyel
a bdgihM,onpsASl
a bEdihJM,onstyezl
a zEDihm,Olbtvyen
a bIihM,olpSes
a bedihJ,nAtvyEzl
a eDiMlOnv,
a bIihm,olpSyesE
a bEgihl,Atvez
a diM,Olsovn
a beIhml,istyEz
How can I remove the duplicate names? Thanks.
2 Answers
-1
Use a set to store the names you read from each Excel cell:
import xlrd
import xlwt

# Note: xlrd 2.0 and later reads only .xls files; opening .xlsx needs xlrd < 2.0
data = xlrd.open_workbook("C:\\Users\\I307658\\Desktop\\test.xlsx")
old_sheet = data.sheet_by_index(0)
new_file = xlwt.Workbook(encoding='utf-8', style_compression=0)
new_sheet = new_file.add_sheet('Result', cell_overwrite_ok=True)
for row_index in range(0, old_sheet.nrows):
    column_con = old_sheet.cell(row_index, 0).value
    print(column_con)
    # Strip whitespace so that "Emily" and " Emily" count as the same name
    aaa = set(name.strip() for name in column_con.split(","))
    print(', '.join(aaa))
    new_sheet.write(row_index, 0, ', '.join(aaa))
new_file.save("C:\\Users\\I307658\\Desktop\\Book New 1.xls")
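One thing worth knowing about the set-based approach: a set has no defined order, so the names may come out in a different order than they had in the cell. If a repeatable order is enough, sorting the set is one option. A minimal sketch in plain Python 3, with no Excel involved (the variable `cell` is only an illustrative stand-in for one cell's text):

```python
# Illustrative stand-in for the text of one cell
cell = "Liam, Madison, Mia, Ava, Mia"

# Split on commas and strip whitespace so duplicates compare equal
unique = set(name.strip() for name in cell.split(","))

# Set iteration order is arbitrary; sort for a repeatable result
result = ", ".join(sorted(unique))
print(result)  # Ava, Liam, Madison, Mia
```

Sorting gives alphabetical order, not the original order; keeping first-seen order needs an ordered container instead of a set.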
1
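dict.fromkeys()
This method builds a dictionary with one key for each item of the iterable it is given. A string is an iterable of characters, so passing the cell text directly produces one key per distinct character, which is why letters were removed instead of names. The text has to be split into a sequence of names first.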
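To make the difference concrete, here is a small sketch in plain Python 3 (no Excel involved; `cell` is just an illustrative stand-in for one cell's text) comparing what `dict.fromkeys()` does with a raw string and with a list of names:

```python
# Illustrative stand-in for the text of one cell
cell = "Noah, Mason, Emily, Isabella, Emily"

# Passing the string directly: every distinct character becomes a key
by_char = list(dict.fromkeys(cell))

# Splitting first: every distinct name becomes a key
by_name = list(dict.fromkeys(cell.split(", ")))

print(by_char[:4])  # ['N', 'o', 'a', 'h']
print(by_name)      # ['Noah', 'Mason', 'Emily', 'Isabella']
```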
You can try this:
for row_index in range(0, old_sheet.nrows):
    column_con = old_sheet.cell(row_index, 0).value
    # First split the string into individual names
    column_con = tuple(column_con.split(', '))
    # fromkeys() keeps one key per distinct name
    aaa = dict.fromkeys(column_con).keys()
    # aaa is a collection of keys, so join them back into a single string
    aaa = ', '.join(aaa)
    new_sheet.write(row_index, 0, aaa)
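The per-cell logic can also be pulled out into a small helper so it can be checked without opening a workbook. The following is only a sketch: the function name `dedupe_names` and its `sep` parameter are made up for illustration, and it assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
def dedupe_names(cell_text, sep=","):
    """Return cell_text with repeated names removed, keeping first occurrences."""
    # Split on the separator and trim surrounding whitespace from each name
    names = [part.strip() for part in cell_text.split(sep)]
    # Drop empty entries, e.g. those left by a trailing separator
    names = [name for name in names if name]
    # Dict keys are unique and keep insertion order (Python 3.7+)
    return ", ".join(dict.fromkeys(names))


print(dedupe_names("Mason, Emily, Daniel, Emily, Daniel"))
# prints: Mason, Emily, Daniel
```

Inside the loop, this would replace the split and join steps with a single call such as `new_sheet.write(row_index, 0, dedupe_names(column_con))`.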