Python xlrd data extraction

3 votes

3 answers

26476 views

数据工程师

Asked 2025-04-16 04:29

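I'm using Python's xlrd library to read data from an Excel sheet, following this guide: http://scienceoss.com/read-excel-files-from-python/

Suppose I read a row whose first cell is "Employee name",

and the sheet contains another row further down whose first cell is also "Employee name".

How can I read the last column of the last row whose first cell is "Employee name", ignoring the earlier row?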

  import xlrd

  wb = xlrd.open_workbook(file, encoding_override="cp1252")
  wb.sheet_names()
  sh = wb.sheet_by_index(0)
  num_of_rows = sh.nrows
  num_of_cols = sh.ncols
  valid_xl_format = 0
  invalid_xl_format = 0

  if num_of_rows != 0:
      for i in range(num_of_rows):
          questions_dict = {}
          for j in range(num_of_cols):
              xl_data = sh.cell(i, j).value
              if xl_data == "Employee name":
                  # Regardless of how many rows have "Employee name" in their
                  # first cell, read only the last "Employee name" row
                  pass

Tags: data processing, data extraction, excel, xlrd, data analysis, data filtering, row and column operations, table reading

3 Answers

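In my case, I just used the pandas library to read the xls file, and that solved my problem.

import pandas as pd
# read_html returns a list of DataFrames and only works if the ".xls" file is
# really an HTML table; a true Excel workbook needs pd.read_excel instead.
data = pd.read_html('file.xls')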
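
Once the data is in a DataFrame, the question itself (last column of the last row whose first cell is "Employee name") can be sketched as a filter followed by a positional lookup. This is only an illustration: the frame is built in memory as a stand-in for a workbook loaded with `header=None`, the sample values are invented, and the helper name `last_tagged_value` is hypothetical.

```python
import pandas as pd

def last_tagged_value(df, tag="Employee name"):
    """Return the last-column value of the last row whose first column equals tag."""
    matches = df[df.iloc[:, 0] == tag]   # keep only rows tagged in column 0
    if matches.empty:
        return None                      # tag never appears
    return matches.iloc[-1, -1]          # last matching row, last column

# Invented sample data standing in for a sheet read without a header row.
df = pd.DataFrame([
    ["Employee name", "Alice", 10],
    ["Department", "Sales", 20],
    ["Employee name", "Bob", 30],
    ["Department", "IT", 40],
])
result = last_tagged_value(df)
```

Positional indexing with `iloc` avoids depending on column names, which a header-less sheet would not have.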

Answered 2025-04-16 by Python大师


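I'm not sure exactly what you're asking.
Posting some sample data might make your intent clearer.

Have you tried iterating over the dataset in reverse? For example: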

for i in reversed(range(num_of_rows)):
    ...
    if xl_data == "Employee name":
        # do something 
        # then break since you've found the final "Employee Name"
        break
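
A fuller version of that reverse scan might look like the sketch below. It works on a plain list of rows so it runs without a workbook; with xlrd the rows could presumably be gathered with `sh.row_values(i)`. The function name `find_last_tag_row` and the sample rows are made up for illustration.

```python
def find_last_tag_row(rows, tag="Employee name"):
    """Scan rows from the bottom up; return the index of the first match, or None."""
    for i in reversed(range(len(rows))):
        if rows[i] and rows[i][0] == tag:
            return i  # first hit from the bottom is the last occurrence
    return None

# Invented rows standing in for [sh.row_values(i) for i in range(sh.nrows)]
rows = [
    ["Employee name", "Alice", 10],
    ["Department", "Sales", 20],
    ["Employee name", "Bob", 30],
    ["Department", "IT", 40],
]
idx = find_last_tag_row(rows)
last_value = rows[idx][-1] if idx is not None else None
```

Returning from inside the loop plays the role of the `break`: the scan stops at the first match found from the bottom, so earlier "Employee name" rows are never visited.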

Answered 2025-04-16 by Python大师


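"I am using Python's xlrd library to read data from an Excel sheet"

You need to think carefully about what you are doing instead of grabbing some blog code, leaving in irrelevant lines like wb.sheet_names() while omitting parts that are very relevant to your requirement, like first_column = sh.col_values(0).

Here's how to find the row index of the last "whatever" in column A (the first column). This code is untested: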

import xlrd
wb = xlrd.open_workbook(file_name)
# Why do you think that you need to use encoding_override?
sheet0 = wb.sheet_by_index(0)
tag = u"Employee name" # or u"Emp name" or ...
column_0_values = sheet0.col_values(colx=0)
try:
    # Lists have no rindex(), so search a reversed copy and map the index back.
    max_tag_row_index = len(column_0_values) - 1 - column_0_values[::-1].index(tag)
    print("last tag %r found at row_index %d" % (
        tag, max_tag_row_index))
except ValueError:  # list.index raises ValueError when the tag is absent
    print("tag %r not found" % tag)

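Now we need to interpret "how to read the last column, starting from the last row whose first cell is 'Employee name'".

Assuming that "the last column" means the column whose index is sheet0.ncols - 1, then: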

last_colx = sheet0.ncols - 1
required_values = sheet0.col_values(colx=last_colx, start_rowx=max_tag_row_index)
required_cells = sheet0.col_slice(colx=last_colx, start_rowx=max_tag_row_index)
# choose one of the above 2 lines, depending on what you need to do
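
If only a single cell is wanted, namely the last column of the tagged row itself, a `cell_value(rowx, colx)` lookup may be enough. The sketch below is an assumption about what the asker needs, not a confirmed reading of the question. `FakeSheet` is a hypothetical stand-in that mimics only the `ncols`, `col_values` and `cell_value` members of an xlrd sheet so the example runs without a workbook; the sample rows are invented.

```python
class FakeSheet:
    """Hypothetical stand-in mimicking the few xlrd Sheet members used below."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = max((len(r) for r in rows), default=0)

    def col_values(self, colx, start_rowx=0):
        return [row[colx] for row in self._rows[start_rowx:]]

    def cell_value(self, rowx, colx):
        return self._rows[rowx][colx]


def last_column_of_last_tag(sheet, tag="Employee name"):
    """Return the last-column value of the last row tagged in column 0, or None."""
    first_col = sheet.col_values(0)
    # Walk the first column from the bottom to find the last matching row.
    for rowx in range(len(first_col) - 1, -1, -1):
        if first_col[rowx] == tag:
            return sheet.cell_value(rowx, sheet.ncols - 1)
    return None


sheet = FakeSheet([
    ["Employee name", "Alice", 10],
    ["Department", "Sales", 20],
    ["Employee name", "Bob", 30],
    ["Department", "IT", 40],
])
value = last_column_of_last_tag(sheet)
```

Passing a real sheet such as `wb.sheet_by_index(0)` in place of `FakeSheet` should work the same way, since only those three members are touched.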

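If that's not what you mean (which is quite possible, since it ignores a lot of data; why would you want to read only the last column?), please try to explain what you mean with examples.

Perhaps you want to iterate over the remaining cells: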

for rowx in range(max_tag_row_index, sheet0.nrows): # or max_tag_row_index + 1
    for colx in range(0, sheet0.ncols):
        do_something_with_cell_object(sheet0.cell(rowx, colx))
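
When plain values are enough and cell objects are not needed, the same walk can be done a row at a time with `row_values(rowx)`. This is a minimal sketch: `rows_from_tag` is a hypothetical helper, and the `SimpleNamespace` object merely imitates the `nrows` attribute and `row_values` method of an xlrd sheet over invented data.

```python
from types import SimpleNamespace

def rows_from_tag(sheet, start_rowx):
    """Yield each row's values from start_rowx to the end of the sheet."""
    for rowx in range(start_rowx, sheet.nrows):
        yield sheet.row_values(rowx)

# Invented data and a minimal stand-in for an xlrd sheet.
data = [
    ["Employee name", "Alice", 10],
    ["Department", "Sales", 20],
    ["Employee name", "Bob", 30],
    ["Department", "IT", 40],
]
sheet = SimpleNamespace(nrows=len(data), row_values=lambda rowx: data[rowx])

# Start at the last tagged row (index 2 in this sample).
remaining = list(rows_from_tag(sheet, 2))
```

Starting at `max_tag_row_index + 1` instead would skip the tagged row itself, matching the alternative noted in the loop's comment.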

Answered 2025-04-16 by Python大师
