为什么win32com比xlrd慢得多?

2 投票
3 回答
3350 浏览
提问于 2025-04-15 23:31

我有一样的代码,使用了win32com和xlrd这两个库。使用xlrd的时候,算法运行不到一秒钟,而用win32com却要几分钟。

这是win32com的代码:

def makeDict(ws):
"""makes dict with key as header name, 
   value as tuple of column begin and column end (inclusive)"""
wsHeaders = {} # key is header name, value is column begin and end inclusive
for cnum in xrange(9, find_last_col(ws)):
    if ws.Cells(7, cnum).Value:
        wsHeaders[str(ws.Cells(7, cnum).Value)] = (cnum, find_last_col(ws))
        for cend in xrange(cnum + 1, find_last_col(ws)): #finds end column
            if ws.Cells(7, cend).Value:
                wsHeaders[str(ws.Cells(7, cnum).Value)] = (cnum, cend - 1)
                break
return wsHeaders

这是xlrd的代码:

def makeDict(ws):
"""makes dict with key as header name, 
   value as tuple of column begin and column end (inclusive)"""
wsHeaders = {} # key is header name, value is column begin and end inclusive
for cnum in xrange(8, ws.ncols):
    if ws.cell_value(6, cnum):
        wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, ws.ncols)
        for cend in xrange(cnum + 1, ws.ncols):#finds end column
            if ws.cell_value(6, cend):
                wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, cend - 1)
                break
return wsHeaders

3 个回答

0

昨晚我在睡觉前想了想,最后用了这个。这是我最初版本的一个更好的改进:

def makeDict(ws):
"""makes dict with key as header name, 
   value as tuple of column begin and column end (inclusive)"""
wsHeaders = {} # key is header name, value is column begin and end inclusive
last_col = find_last_col(ws)

for cnum in xrange(9, last_col):
    if ws.Cells(7, cnum).Value:
        value = ws.Cells(7, cnum).Value
        cstart = cnum
    if ws.Cells(7, cnum + 1).Value:
        wsHeaders[str(value)] = (cstart, cnum) #cnum is last in range
return wsHeaders
2

COM需要和另一个进程进行沟通,这个进程负责处理请求。而xlrd则是在同一个进程中直接操作数据结构。

12

(0) 你问“为什么win32com比xlrd慢这么多?”这个问题有点像“你停止打你妻子了吗?”——它基于一个可能不真实的假设。win32com是由一个很厉害的程序员用C语言写的,而xlrd是由一个普通程序员用纯Python写的。真正的区别在于,win32com需要调用COM,这涉及到进程间的通信,而xlrd则是直接读取Excel文件。此外,还有一个关键角色:你自己。请继续阅读。

(1) 你没有给我们看你在COM代码中反复使用的find_last_col()函数的源代码。在xlrd的代码中,你总是乐于使用同一个值(ws.ncols)。所以在COM代码中,你应该只调用一次find_last_col(ws),然后使用返回的结果。更新 请查看你另一个问题的回答,了解如何从COM获取xlrd的Sheet.ncols的等价值。

(2) 每个单元格的值被访问了两次,这让两个代码都变慢了。与其这样:

if ws.cell_value(6, cnum):
    wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, ws.ncols)

不如试试

value = ws.cell_value(6, cnum)
if value:
    wsHeaders[str(value)] = (cnum, ws.ncols)

注意:每个代码片段中都有两次这样的情况。

(3) 你的嵌套循环的目的并不明显,但似乎有一些多余的计算,涉及到从COM的多次获取。如果你愿意告诉我们你想要实现什么,并给出一些例子,我们可以帮助你让代码运行得更快。至少,从COM提取值一次,然后在Python中用嵌套循环处理它们应该会更快。你有多少列?

更新 2 与此同时,小精灵们用内窥镜检查了你的代码,并得出了以下脚本:

tests= [
    "A/B/C/D",
    "A//C//",
    "A//C//E",
    "A///D",
    "///D",
    ]
for test in tests:
    print "\nTest:", test
    row = test.split("/")
    ncols = len(row)
    # modelling the OP's code
    # (using xlrd-style 0-relative column indexes)
    d = {}
    for cnum in xrange(ncols):
        if row[cnum]:
            k = row[cnum]
            v = (cnum, ncols) #### BUG; should be ncols - 1 ("inclusive")
            print "outer", cnum, k, '=>', v
            d[k] = v
            for cend in xrange(cnum + 1, ncols):
                if row[cend]:
                    k = row[cnum]
                    v = (cnum, cend - 1)
                    print "inner", cnum, cend, k, '=>', v
                    d[k] = v
                    break
    print d
    # modelling a slightly better algorithm
    d = {}
    prev = None
    for cnum in xrange(ncols):
        key = row[cnum]
        if key:
            d[key] = [cnum, cnum]
            prev = key
        elif prev:
            d[prev][1] = cnum
    print d
    # if tuples are really needed (can't imagine why)
    for k in d:
        d[k] = tuple(d[k])
    print d

它输出这个:

Test: A/B/C/D
outer 0 A => (0, 4)
inner 0 1 A => (0, 0)
outer 1 B => (1, 4)
inner 1 2 B => (1, 1)
outer 2 C => (2, 4)
inner 2 3 C => (2, 2)
outer 3 D => (3, 4)
{'A': (0, 0), 'C': (2, 2), 'B': (1, 1), 'D': (3, 4)}
{'A': [0, 0], 'C': [2, 2], 'B': [1, 1], 'D': [3, 3]}
{'A': (0, 0), 'C': (2, 2), 'B': (1, 1), 'D': (3, 3)}

Test: A//C//
outer 0 A => (0, 5)
inner 0 2 A => (0, 1)
outer 2 C => (2, 5)
{'A': (0, 1), 'C': (2, 5)}
{'A': [0, 1], 'C': [2, 4]}
{'A': (0, 1), 'C': (2, 4)}

Test: A//C//E
outer 0 A => (0, 5)
inner 0 2 A => (0, 1)
outer 2 C => (2, 5)
inner 2 4 C => (2, 3)
outer 4 E => (4, 5)
{'A': (0, 1), 'C': (2, 3), 'E': (4, 5)}
{'A': [0, 1], 'C': [2, 3], 'E': [4, 4]}
{'A': (0, 1), 'C': (2, 3), 'E': (4, 4)}

Test: A///D
outer 0 A => (0, 4)
inner 0 3 A => (0, 2)
outer 3 D => (3, 4)
{'A': (0, 2), 'D': (3, 4)}
{'A': [0, 2], 'D': [3, 3]}
{'A': (0, 2), 'D': (3, 3)}

Test: ///D
outer 3 D => (3, 4)
{'D': (3, 4)}
{'D': [3, 3]}
{'D': (3, 3)}

撰写回答