How do I parse this text file?

Posted 2024-05-16 07:16:12


I have this text file:

                                                    VENDOR
ITEM NUMBER           WH ITEM DESCRIPTION               PRODUCT NUMBER      PRICE    DISC %   MAIN-WH    ALT-WH  BIN#
--------------- ----- -- ------------------------------ --------------- ---------    ------ --------- --------- ------
                                                                             0.00 EA   0.00         0         0

10.5PLC/TLED/26V/27K     14.5W 4PIN CFL REPL 2700K VERT 458406              20.00 EA   0.00         0         0        I68    I68
                                                         (00029  )

10.5PLC/TLED/26V/30K     14.5W 4PIN CFL REPL 3000K VERT 458414              20.00 EA   0.00         3         0 PAYOFF I68    I68
                                                         (00029  )

10.5PLC/TLED/26V/35K     14.5W 4PIN CFL REPL 3500K VERT 458422              20.00 EA   0.00         0         0        I68    I68
                                                         (00029  )

10.5PLC/TLED/26V/40K     14.5W 4PIN CFL REPL 4000K VERT 458430              20.00 EA   0.00         0         0        I68    I68
                                                         (00029  )

I want to read each line item and get its item number, description, vendor product number, and price.

I tried this Python code:

def readInventoryFile():
    # dataFile = open("inventoryFiles/INV.txt","r")
    with open('inventoryFiles/INV.txt') as dataFile:
        for lineItem in dataFile:
            itemProperties = lineItem.split("   ")
            while("" in itemProperties) :
                itemProperties.remove("")
            print(itemProperties)
            try:
                itemNum = itemProperties[0]
                itemDesc = itemProperties[1]
                partNumb = itemProperties[2]
                price = itemProperties[3]

                itemSummry = {
                    "Name": itemDesc,
                    "Price": price,
                    "PN": partNumb,
                }

                print(lineItem, "\n ",itemProperties,"\n Summary ",itemSummry)
            except Exception as e:
                print(e)

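The code partly works, but it is hard to split the lines on spaces or any other delimiter, because the content of each field itself contains spaces. How can I get the item properties I need?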


2 answers

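Because your file format is inconvenient, the only solution I can see is this trick: look at the column titles in the header line and slice every line at the indices where those titles appear, like this: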

with open('/Users/Copo1/Desktop/aaa.txt') as dataFile:
    lines = dataFile.readlines()
headers = ['ITEM NUMBER', 'WH ITEM DESCRIPTION', 'PRODUCT NUMBER', 'PRICE']
# Column start offsets: where each title begins in the header line (lines[1]).
starts = [lines[1].find(h) for h in headers]
# End offset of the last column.
starts.append(len(lines[0]))
# items[i] holds the slice for column i taken from every line of the file.
items = [[line[starts[i]:starts[i+1]] for line in lines] for i in range(len(starts)-1)]

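This produces the following output for the items list (only the first element, the one corresponding to "ITEM NUMBER", is pasted here; the other elements are also correct, as you can check).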

[['                      ',
  'ITEM NUMBER           ',
  '       -   - ',
  '                      ',
  ' \n',
  '10.5PLC/TLED/26V/27K  ',
  '                      ',
  ' \n',
  '10.5PLC/TLED/26V/30K  ',
  '                      ',
  ' \n',
  '10.5PLC/TLED/26V/35K  ',
  '                      ',
  ' \n',
  '10.5PLC/TLED/26V/40K  ',
  '                      ',
  ' \n',
  '1000PAR64/FFR         ',
  '                      ',
  ' \n',
  '1000PAR64/WFL/S       ',
  '                      ',
  ' \n',
  '100A/99               ',
  '                      ',
  ' \n',
  '100A/CL               ',
  '                      ',
  ' '],

After this there may still be some simple cleanup to do (such as removing empty strings and '\n's), but I'm sure you can work that out yourself.
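The cleanup mentioned above could be sketched roughly as follows. This is only a minimal illustration, not part of the original answer: the helper name clean_rows and the sample data are hypothetical, and it assumes the column-wise items layout produced by the slicing code. It transposes the columns back into rows, strips each cell, and keeps only rows whose first cell (the item number) is non-blank.

```python
def clean_rows(columns):
    """Turn column-wise slices into stripped rows.

    Rows whose first cell is blank (continuation lines, empty lines)
    are dropped. `columns` is assumed to look like the `items` list:
    one inner list per column, all of the same length.
    """
    rows = []
    for raw_row in zip(*columns):  # transpose: columns -> rows
        cells = [cell.strip() for cell in raw_row]
        if cells[0]:  # keep only rows that have an item number
            rows.append(cells)
    return rows


# Hypothetical sample shaped like the output shown above (two columns only).
sample_columns = [
    ['                      ', 'ITEM NUMBER           ',
     '10.5PLC/TLED/26V/27K  ', '                      ', ' \n'],
    ['                      ', 'WH ITEM DESCRIPTION   ',
     '14.5W 4PIN CFL REPL   ', '                      ', ''],
]

rows = clean_rows(sample_columns)
print(rows)
```

Working row by row rather than filtering each column separately keeps the fields of one item aligned even when some of its cells are blank. The header row and the dashed separator row still pass this filter, so they would need to be skipped separately.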

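I think my answer to the question "How to efficiently parse fixed width files?" could be adapted to do what you want.

The main modification to the code in that answer is to make it also strip any leading and trailing whitespace from each field. Here is Python 3.x code that illustrates this: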

from __future__ import print_function
import struct


HEADER_LINES = 5

# Indices       0       1        2      3      4      5       6      7
fieldwidths = (20, -5, 37, -10, 12, -1, 6, -1, 9, -1, 9, -1, 10, -1, 7)

# Convert fieldwidths into a format compatible with struct module.
fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                                    for fw in fieldwidths)
fieldstruct = struct.Struct(fmtstring)
#print('fmtstring: {!r}, recsize: {} chars\n'.format(fmtstring, fieldstruct.size))

unpack_from = fieldstruct.unpack_from  # To optimize calls.


def parse(line):
    """ Return unpacked fields in string line, stripped of any leading and
        trailing whitespace.
    """
    return list(s.decode().strip() for s in unpack_from(line.encode()))


def readInventoryFile(filename):
    with open(filename) as invfile:
        for _ in range(HEADER_LINES):
            next(invfile)  # Skip header lines.

        for line in invfile:
            if len(line) < fieldstruct.size:  # Pad line if it's too short.
                line = line + (' ' * (fieldstruct.size-len(line)))
            fields = parse(line)
            if fields[0]:  # First field non-blank?
                print(fields)

readInventoryFile('inventoryFiles_INV.txt')

Results:

['10.5PLC/TLED/26V/27K', '14.5W 4PIN CFL REPL 2700K VERT 458406', '20.00 EA', '0.00', '0', '0', 'I68', 'I68']
['10.5PLC/TLED/26V/30K', '14.5W 4PIN CFL REPL 3000K VERT 458414', '20.00 EA', '0.00', '3', '0', 'PAYOFF I68', 'I68']
['10.5PLC/TLED/26V/35K', '14.5W 4PIN CFL REPL 3500K VERT 458422', '20.00 EA', '0.00', '0', '0', 'I68', 'I68']
['10.5PLC/TLED/26V/40K', '14.5W 4PIN CFL REPL 4000K VERT 458430', '20.00 EA', '0.00', '0', '0', 'I68', 'I68']
['1000PAR64/FFR', '1000W PAR64 HALOGEN GX16D BASE 56217', '50.00 EA', '0.00', '0', '0', 'I10', '']
['1000PAR64/WFL/S', '1000W PAR64 HALOGEN GX16D BASE S4673', '0.00 EA', '0.00', '0', '0', '', 'I105']
['100A/99', '100W A19 EXTENDED SERVICE      229781', '2.62 EA', '0.00', '0', '0', 'W6-2   I70', 'I11']
['100A/CL', '100W A19 130V CLEAR            375279', '0.99 EA', '0.00', '0', '0', 'A2-2   I70', 'I11']

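How it works

In short, this code uses Python's struct module to split, or "unpack", a "buffer" full of data into fixed "fields", each containing a certain number of characters.

Although it is more commonly used for binary data, it also works on strings that have been converted to bytes (a conversion that isn't needed in Python 2.x). Basically, you give it a format string that specifies the characteristics (type and size) of each field, plus the data to parse (here, one line of the file), and it unpacks the data accordingly and returns the results as a tuple of values.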
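To make the format string idea concrete, here is a small standalone sketch. The layout and the sample line are made up for illustration and are not taken from the inventory file: 's' marks a field of that many bytes to keep, and 'x' marks pad bytes to skip.

```python
import struct

# Hypothetical layout: a 5-character field, 2 skipped characters,
# then a 3-character field. Whitespace between format codes is ignored.
fmt = '5s 2x 3s'
record = struct.Struct(fmt)

line = 'AB   --1  '  # exactly record.size (10) characters long

# unpack_from returns a tuple of bytes objects, one per 's' field;
# decode each one back to str and strip the padding spaces.
fields = [raw.decode().strip() for raw in record.unpack_from(line.encode())]

print(record.size)
print(fields)
```

The negative entries in fieldwidths above play the role of the 'x' codes here: they describe gaps between columns that are skipped instead of being returned as fields.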
