<p>I think my <a href="https://stackoverflow.com/a/4915359/355230">answer</a> to the question <a href="https://stackoverflow.com/questions/4914008/how-to-efficiently-parse-fixed-width-files">How to efficiently parse fixed width files?</a> could be adapted to do what you want.</p>
<p>The main modification to the code in that answer is to make it also strip any leading and trailing whitespace from each field. Here's Python 3.x code illustrating that:</p>
<pre><code>from __future__ import print_function
import struct

HEADER_LINES = 5

# Indices:     0       1        2       3      4      5      6       7
fieldwidths = (20, -5, 37, -10, 12, -1, 6, -1, 9, -1, 9, -1, 10, -1, 7)

# Convert fieldwidths into a format compatible with struct module.
fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                        for fw in fieldwidths)
fieldstruct = struct.Struct(fmtstring)
#print('fmtstring: {!r}, recsize: {} chars\n'.format(fmtstring, fieldstruct.size))
unpack_from = fieldstruct.unpack_from  # To optimize calls.

def parse(line):
    """ Return unpacked fields in string line, stripped of any leading and
        trailing whitespace.
    """
    return list(s.decode().strip() for s in unpack_from(line.encode()))

def readInventoryFile(filename):
    with open(filename) as invfile:
        for _ in range(HEADER_LINES):
            next(invfile)  # Skip header lines.
        for line in invfile:
            if len(line) < fieldstruct.size:  # Pad line if it's too short.
                line = line + (' ' * (fieldstruct.size-len(line)))
            fields = parse(line)
            if fields[0]:  # First field non-blank?
                print(fields)

readInventoryFile('inventoryFiles_INV.txt')
</code></pre>
<p>Output:</p>
<pre class="lang-none prettyprint-override"><code>['10.5PLC/TLED/26V/27K', '14.5W 4PIN CFL REPL 2700K VERT 458406', '20.00 EA', '0.00', '0', '0', 'I68', 'I68']
['10.5PLC/TLED/26V/30K', '14.5W 4PIN CFL REPL 3000K VERT 458414', '20.00 EA', '0.00', '3', '0', 'PAYOFF I68', 'I68']
['10.5PLC/TLED/26V/35K', '14.5W 4PIN CFL REPL 3500K VERT 458422', '20.00 EA', '0.00', '0', '0', 'I68', 'I68']
['10.5PLC/TLED/26V/40K', '14.5W 4PIN CFL REPL 4000K VERT 458430', '20.00 EA', '0.00', '0', '0', 'I68', 'I68']
['1000PAR64/FFR', '1000W PAR64 HALOGEN GX16D BASE 56217', '50.00 EA', '0.00', '0', '0', 'I10', '']
['1000PAR64/WFL/S', '1000W PAR64 HALOGEN GX16D BASE S4673', '0.00 EA', '0.00', '0', '0', '', 'I105']
['100A/99', '100W A19 EXTENDED SERVICE 229781', '2.62 EA', '0.00', '0', '0', 'W6-2 I70', 'I11']
['100A/CL', '100W A19 130V CLEAR 375279', '0.99 EA', '0.00', '0', '0', 'A2-2 I70', 'I11']
</code></pre>
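<h3>How it works</h3>
<p>In a nutshell, this code uses the capabilities of Python's <a href="https://docs.python.org/3/library/struct.html#module-struct" rel="nofollow noreferrer"><code>struct</code></a> module to split, or "unpack", a "buffer" full of data into fixed-width "fields", each holding a given number of characters.</p>
<p>Although it's more commonly used with binary data, it also works on strings that have been converted to byte arrays (a conversion that isn't needed in Python 2.x). Basically you give it a <a href="https://docs.python.org/3/library/struct.html#format-strings" rel="nofollow noreferrer">format string</a> that specifies the characteristics (type and size) of each field, along with the data to parse (a line of the file in this case), and it unpacks the data accordingly and returns the results as a tuple of values.</p>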
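<p>To make that concrete, here is a minimal sketch on a made-up record. The layout (<code>widths</code>) and the sample <code>line</code> are hypothetical and not taken from the inventory file. It shows how negative widths turn into pad bytes (<code>x</code>) that get skipped, while positive widths become fixed-size byte strings (<code>s</code>):</p>

```python
import struct

# Hypothetical layout: 5-char code, 2 ignored chars, 4-char quantity.
widths = (5, -2, 4)

# Negative widths become pad bytes ('x'); positive widths become 's' fields.
fmt = ' '.join('{}{}'.format(abs(w), 'x' if w < 0 else 's') for w in widths)
record = struct.Struct(fmt)

line = 'AB1  ??  42'  # Made-up sample record, exactly record.size chars long.

# unpack_from() returns a tuple of bytes objects, one per 's' field.
raw_fields = record.unpack_from(line.encode())
fields = [f.decode().strip() for f in raw_fields]

print(fmt)          # 5s 2x 4s
print(record.size)  # 11
print(fields)       # ['AB1', '42']
```

<p>Note that the two <code>?</code> characters fall in the pad region and never appear in the result, and that the padding spaces inside each field are only removed by the explicit <code>strip()</code> call.</p>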