How do I remove duplicate records from a DataFrame?

Code:

import pandas as pd
import docx

document = docx.Document(path)
table = document.tables[0]

data = []
for row_index, row in enumerate(table.rows):  # Loop through rows
    data.append([])  # Add container list for each row.
    for col_index in range(13):  # Loop through columns
        cell_text = row.cells[col_index].paragraphs[0].text.encode('utf-8')
        cell_decode_text = cell_text.decode('utf-8')
        data[row_index].append(cell_decode_text)

df = pd.DataFrame(data)
df.columns = ["group", "person", "category", "source", "dds", "time", "date", "location", "text", "title", "date_export", "num_export", ""]
df.drop_duplicates()
df.head(20)

Result:

'date_export': {0: 'تاريخ الصادر', 1: '', 2: '2020/8/23', 3: '2020/8/23', 4: '2020/8/23', 5: '2020/8/23', 6: '2020/8/23', 7: '2020/8/23', 8: '2020/8/23', 9: '2020/8/23', 10: '2020/8/23', 11: '2020/8/23', 12: '2020/8/23'}, 'num_export': {0: 'رقم الصادر', 1: 'رقم الصادر', 2: '36015', 3: '36015', 4: '36016', 5: '36016', 6: '36017', 7: '36017', 8: '36018', 9: '36018', 10: '36019', 11: '36019', 12: '36020'},

2 Answers

User

#1 · Edited 2024-04-26 06:29:56

Using the dataset you provided, the example below shows how df.drop_duplicates(inplace=True) accomplishes the task, as @Chinte also mentions in their answer.

Before:

>>> df

    date_export     num_export
0   تاريخ الصادر    رقم الصادر
1       رقم الصادر
2   2020/8/23   36015
3   2020/8/23   36015
4   2020/8/23   36016
5   2020/8/23   36016
6   2020/8/23   36017
7   2020/8/23   36017
8   2020/8/23   36018
9   2020/8/23   36018
10  2020/8/23   36019
11  2020/8/23   36019
12  2020/8/23   36020

After:

>>> df.drop_duplicates(inplace=True)
>>> df

    date_export     num_export
0   تاريخ الصادر    رقم الصادر
1       رقم الصادر
2   2020/8/23   36015
4   2020/8/23   36016
6   2020/8/23   36017
8   2020/8/23   36018
10  2020/8/23   36019
12  2020/8/23   36020
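
The reason the original call appeared to do nothing is that drop_duplicates() returns a new DataFrame by default and leaves the one it was called on untouched. The following is a minimal sketch of that behaviour, rebuilding only the two columns shown in the question's output; the names rows_after_bare_call and deduped are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question's 'date_export' / 'num_export' output.
df = pd.DataFrame({
    "date_export": ["تاريخ الصادر", ""] + ["2020/8/23"] * 11,
    "num_export": [
        "رقم الصادر", "رقم الصادر",
        "36015", "36015", "36016", "36016", "36017", "36017",
        "36018", "36018", "36019", "36019", "36020",
    ],
})

# A bare call builds a de-duplicated copy and discards it; df is unchanged.
df.drop_duplicates()
rows_after_bare_call = len(df)

# Option 1: keep the returned DataFrame under a (hypothetical) new name.
deduped = df.drop_duplicates()

# Option 2: modify df itself, as this answer does.
df.drop_duplicates(inplace=True)

print(rows_after_bare_call, len(deduped), len(df))
```

Both options keep the first occurrence of each duplicated row and preserve the original index labels, which is why the "After" output above skips from 2 to 4 to 6.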

User

#2 · Edited 2024-04-26 06:29:56

You have to apply it in place:

df.drop_duplicates(inplace=True)
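
If modifying the DataFrame in place is not wanted, reassigning the result should have the same effect. The sketch below also shows the subset and keep parameters of drop_duplicates, using the first six rows from the question. Treating num_export alone as the record identifier is an assumption made only for this example, and the names full and by_number are hypothetical.

```python
import pandas as pd

# First six rows of the question's 'date_export' / 'num_export' columns.
df = pd.DataFrame({
    "date_export": ["تاريخ الصادر", "", "2020/8/23", "2020/8/23", "2020/8/23", "2020/8/23"],
    "num_export": ["رقم الصادر", "رقم الصادر", "36015", "36015", "36016", "36016"],
})

# Reassignment instead of inplace=True: rows must match in every column to be dropped.
full = df.drop_duplicates()

# Assumption: num_export alone identifies a record, so compare only that column.
# keep="first" retains the first row of each group of duplicates.
by_number = df.drop_duplicates(subset=["num_export"], keep="first")

# Optionally renumber the index so it runs 0, 1, 2, ... without gaps.
by_number = by_number.reset_index(drop=True)

print(full)
print(by_number)
```

With all columns compared, the two header-like rows at index 0 and 1 both survive because their date_export values differ. With subset=["num_export"], only the first of them is kept.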
