在excel Python中筛选列

2024-03-28 19:43:57 发布

您现在位置:Python中文网/ 问答频道 /正文

我试图通过筛选值来组织一个列。换句话说,有成千上万个重复的名字,我只想从每个“组”中取一个名字,然后复制到另一个列中。

所以A列是当前的情况,be列是我想要得到的结果:

Column A                   Column B

AB Mark Sociedad Ltda      AB Mark Sociedad Ltda
AB Mark Sociedad Ltda      Acosta Acosta Manuel
AB Mark Sociedad Ltda      ALBAGLI, ZALIASNIK 
AB Mark Sociedad Ltda
Acosta Acosta Manuel 
Acosta Acosta Manuel 
Acosta Acosta Manuel
ALBAGLI, ZALIASNIK 
ALBAGLI, ZALIASNIK
ALBAGLI, ZALIASNIK

最后,这是我要使用的脚本:

import openpyxl
from openpyxl import load_workbook
import os

os.chdir('path')
workbook = openpyxl.load_workbook('abc.xlsx')
page_i = workbook.get_sheet_names()
sheet = workbook.get_sheet_by_name('Sheet1')

for a in range(1, 10):
    representativex = sheet['A' + str(a)].value
    tuple(sheet['A1':'A10'])

    for row in sheet['A1':'A10']:
        if representativex in row:
            continue
        else:
            sheet['B' + str(a)].value 
            sheet['B' + str(a)] = representativex

        workbook.save('abc.xlsx')

不幸的是它不起作用。


Tags: inimportabsheetworkbookmarkstropenpyxl
1条回答
网友
1楼 · 发布于 2024-03-28 19:43:57

我并不是真的使用Python来实现这一点,但这里有一个比较简单的方法,我很快就找到了。

import openpyxl

wb = openpyxl.load_workbook('test.xlsx')
ws1 = wb.active

Sample Data

names = []
for row in ws1.columns[0]:
    names.append(row.value)

names = sorted(list(set(names)))

start = 1
for name in names:
    ws1.cell(row = start, column=2).value = name
    start += 1

wb.save('test.xlsx')

Sample Output Data

编辑:显然更新的openpyxl需要稍作修改

更改此:

for row in ws1.columns[0]:
        names.append(row.value)

对此:

for row in ws1.iter_cols(max_col = 1, min_row=1):
    for cell in row:
        names.append(cell.value)

万一你的专栏不一样

iter_cols(min_col=None, max_col=None, min_row=None, max_row=None)[source]

Returns all cells in the worksheet from the first row as columns.

If no boundaries are passed in the cells will start at A1.

If no cells are in the worksheet an empty tuple will be returned.
Parameters: 

    min_col (int) – smallest column index (1-based index)
    min_row (int) – smallest row index (1-based index)
    max_col (int) – largest column index (1-based index)
    max_row (int) – smallest row index (1-based index)

相关问题 更多 >