How do I parse this text file?



I have <a href="https://firebasestorage.googleapis.com/v0/b/honeybox-catalogue.appspot.com/o/INV.TXT?alt=media&token=e4c84a24-32de-485a-9943-dea3db5aaa14" rel="nofollow noreferrer">this text file</a>: <pre class="lang-none prettyprint-override"><code>VENDOR ITEM NUMBER WH ITEM DESCRIPTION PRODUCT NUMBER PRICE DISC % MAIN-WH ALT-WH BIN#
--------------- ----- -- ------------------------------ --------------- --------- ------ --------- --------- ------
0.00 EA 0.00 0 0
10.5PLC/TLED/26V/27K 14.5W 4PIN CFL REPL 2700K VERT 458406 20.00 EA 0.00 0 0 I68 I68 (00029 )
10.5PLC/TLED/26V/30K 14.5W 4PIN CFL REPL 3000K VERT 458414 20.00 EA 0.00 3 0 PAYOFF I68 I68 (00029 )
10.5PLC/TLED/26V/35K 14.5W 4PIN CFL REPL 3500K VERT 458422 20.00 EA 0.00 0 0 I68 I68 (00029 )
10.5PLC/TLED/26V/40K 14.5W 4PIN CFL REPL 4000K VERT 458430 20.00 EA 0.00 0 0 I68 I68 (00029 )
</code></pre> I want to read each line item and get the item number, description, vendor product number, and price. I tried this Python code: <pre><code>def readInventoryFile():
    # dataFile = open("inventoryFiles/INV.txt","r")
    with open('inventoryFiles/INV.txt') as dataFile:
        for lineItem in dataFile:
            itemProperties = lineItem.split(" ")
            while("" in itemProperties) :
                itemProperties.remove("")
            print(itemProperties)
            try:
                itemNum = itemProperties[0]
                itemDesc = itemProperties[1]
                partNumb = itemProperties[2]
                price = itemProperties[3]
                itemSummry = {
                    "Name": itemDesc,
                    "Price": price,
                    "PN": partNumb,
                }
                print(lineItem, "\n ",itemProperties,"\n Summary ",itemSummry)
            except Exception as e:
                print(e)
</code></pre> The code partly works, but it is hard to split the lines on spaces or any other delimiter, because the fields themselves contain spaces. How can I get the product properties I need?


1 Answer

Anonymous, 1 day ago

I think my <a href="https://stackoverflow.com/a/4915359/355230">answer</a> to the question <a href="https://stackoverflow.com/questions/4914008/how-to-efficiently-parse-fixed-width-files">How to efficiently parse fixed width files?</a> can be adapted to do what you want. The main modification to the code in that answer is to make it also strip any leading and trailing whitespace from each field. Here is Python 3.x code that illustrates this: <pre><code>from __future__ import print_function
import struct

HEADER_LINES = 5

# Indices      0       1        2       3      4      5      6       7
fieldwidths = (20, -5, 37, -10, 12, -1, 6, -1, 9, -1, 9, -1, 10, -1, 7)

# Convert fieldwidths into a format compatible with struct module.
# Negative widths become pad bytes ('x') that are skipped; positive ones
# become fixed-size byte strings ('s').
fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                        for fw in fieldwidths)
fieldstruct = struct.Struct(fmtstring)
#print('fmtstring: {!r}, recsize: {} chars\n'.format(fmtstring, fieldstruct.size))
unpack_from = fieldstruct.unpack_from  # To optimize calls.

def parse(line):
    """ Return unpacked fields in string line, stripped of any
        leading and trailing whitespace.
    """
    return list(s.decode().strip() for s in unpack_from(line.encode()))

def readInventoryFile(filename):
    with open(filename) as invfile:
        for _ in range(HEADER_LINES):
            next(invfile)  # Skip header lines.

        for line in invfile:
            if len(line) < fieldstruct.size:  # Pad line if it's too short.
                line = line + (' ' * (fieldstruct.size-len(line)))

            fields = parse(line)
            if fields[0]:  # First field non-blank?
                print(fields)

readInventoryFile('inventoryFiles_INV.txt')
</code></pre> Result: <pre class="lang-none prettyprint-override"><code>['10.5PLC/TLED/26V/27K', '14.5W 4PIN CFL REPL 2700K VERT 458406', '20.00 EA', '0.00', '0', '0', 'I68', 'I68']
['10.5PLC/TLED/26V/30K', '14.5W 4PIN CFL REPL 3000K VERT 458414', '20.00 EA', '0.00', '3', '0', 'PAYOFF I68', 'I68']
['10.5PLC/TLED/26V/35K', '14.5W 4PIN CFL REPL 3500K VERT 458422', '20.00 EA', '0.00', '0', '0', 'I68', 'I68']
['10.5PLC/TLED/26V/40K', '14.5W 4PIN CFL REPL 4000K VERT 458430', '20.00 EA', '0.00', '0', '0', 'I68', 'I68']
['1000PAR64/FFR', '1000W PAR64 HALOGEN GX16D BASE 56217', '50.00 EA', '0.00', '0', '0', 'I10', '']
['1000PAR64/WFL/S', '1000W PAR64 HALOGEN GX16D BASE S4673', '0.00 EA', '0.00', '0', '0', '', 'I105']
['100A/99', '100W A19 EXTENDED SERVICE 229781', '2.62 EA', '0.00', '0', '0', 'W6-2 I70', 'I11']
['100A/CL', '100W A19 130V CLEAR 375279', '0.99 EA', '0.00', '0', '0', 'A2-2 I70', 'I11']
</code></pre> <h3>How it works</h3> In short, the code uses Python's <a href="https://docs.python.org/3/library/struct.html#module-struct" rel="nofollow noreferrer"><code>struct</code></a> module to split, or "unpack", a "buffer" full of data into fixed "fields", each containing a set number of characters. Although it is more commonly used on binary data, it also works on strings that have been converted to byte arrays (a conversion that isn't needed in Python 2.x). Basically, you give it a <a href="https://docs.python.org/3/library/struct.html#format-strings" rel="nofollow noreferrer">format string</a> that specifies the characteristics (type and size) of each field, along with the data to parse (in this case a line from the file), and it unpacks the data accordingly and returns the results as a tuple of values, which <code>parse()</code> converts to a list.
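To make the format-string idea concrete, here is a minimal sketch on a much smaller record. The widths in `FIELD_WIDTHS` and the sample line are hypothetical and are not the real INV.TXT layout. The `summarize` helper is also an illustration only: it assumes the product number is always the last whitespace-separated token of the description field and that the price is the first token of the price field, which matches the sample output above but may not hold for every row in the file.

```python
import struct

# Hypothetical layout for illustration only (not the INV.TXT layout):
# positive widths are kept as fields, negative widths are skipped as padding.
FIELD_WIDTHS = (6, -2, 10, -1, 5)

# Builds '6s 2x 10s 1x 5s': 's' = fixed-size bytes field, 'x' = pad bytes.
FMT = ' '.join('{}{}'.format(abs(w), 'x' if w < 0 else 's') for w in FIELD_WIDTHS)
RECORD = struct.Struct(FMT)


def parse(line):
    """Unpack one fixed-width line into a list of stripped strings."""
    # Lines shorter than the record are padded so unpack_from() does not fail.
    padded = line.rstrip('\n').ljust(RECORD.size)
    return [f.decode().strip() for f in RECORD.unpack_from(padded.encode())]


def summarize(fields):
    """Split the combined fields from the output above (assumption: the
    product number is the last token of field 1, the price is the first
    token of field 2)."""
    description, _, product_number = fields[1].rpartition(' ')
    price_tokens = fields[2].split()
    return {
        'Item': fields[0],
        'Name': description.strip(),
        'PN': product_number,
        'Price': price_tokens[0] if price_tokens else '',
    }


# A sample line built to match the hypothetical widths above.
sample = 'AB-12'.ljust(6) + '  ' + 'Desk lamp'.ljust(10) + ' ' + '9.50'.rjust(5)
print(parse(sample))

# First three fields of the first row of the output shown above.
row = ['10.5PLC/TLED/26V/27K', '14.5W 4PIN CFL REPL 2700K VERT 458406', '20.00 EA']
print(summarize(row))
```

If the description and product number always start at fixed columns in the real file, splitting them by adding another entry to `fieldwidths` would likely be more robust than the token-based split sketched here.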
