Python: multiple searches in the same text file

Posted 2024-04-25 10:59:14


I have a huge text file with data like this:

Name : ABC  
Bank : Bank1    
Account-No : 01234567    
Amount: 123456    
Spouse : CDF    
Name : ABD    
Bank : Bank1    
Account-No : 01234568    
Amount: 12345    
Spouse : BDF    
Name : ABE    
Bank : Bank2    
Account-No : 01234569    
Amount: 12344    
Spouse : CDG    
.
.
.
.
.

I need to extract the Account-No and Amount fields and write them to a new file.

I tried searching the text file with mmap to get the position of the account number, but I cannot get to the next account number this way.

import mmap
fname = input("Enter the file name")
f1 = open(fname)

s  = mmap.mmap(f1.fileno(),0,access=mmap.ACCESS_READ)
if s.find(b'Account-No') != -1:
    r = s.find(b'Account-No')
f1.close()

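In 'r' I have the position of the first account number, but I cannot search from (r+1) onward to get the next account number.

I could put this in a loop, but I cannot get the exact mmap syntax to work.

Can anyone help me with this, using mmap or any other method?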


3 answers

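Solution for large files:

Here is a working example that you can easily customize by adding or removing field names in the required_fields list. This solution can handle very large files, because the whole file is never read into memory at once.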

import tempfile

# Reproduce your input file so that this is a working example.
# mode='w+' opens the temporary file in text mode, so str can be written.
with tempfile.NamedTemporaryFile(mode='w+', delete=False) as f_orig:
    input_filename = f_orig.name
    f_orig.write("""Name : ABC

Bank : Bank1

Account-No : 01234567

Amount: 123456

Spouse : CDF

Name : ABD

Bank : Bank1

Account-No : 01234568

Amount: 12345

Spouse : BDF

Name : ABE

Bank : Bank2

Account-No : 01234569

Amount: 12344

Spouse : CDG""")
    # start looking from the beginning of the file again
    f_orig.seek(0)

    # list the fields you want to keep
    required_fields = [
        'Account-No',
        'Amount',
    ]

    # filter and write, line by line
    with tempfile.NamedTemporaryFile(mode='w+', delete=False) as f_result:
        result_filename = f_result.name
        # iterating over the file reads one line at a time (memory efficient)
        for line in f_orig:
            # write fields of interest to the new file
            if any(field_name in line for field_name in required_fields):
                f_result.write(line)
                f_result.write('\n')  # just for formatting

    # show result
    with open(result_filename, 'r') as f:
        print(f.read())

The result contains only the lines for the required fields, each followed by a blank line.
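The mmap approach from the question can also be made to work, because `find` accepts an optional start offset, so each search can resume just past the previous match. The following is a minimal sketch, not a definitive implementation: the function name `matching_lines` is hypothetical, the field names are passed as bytes, and it assumes the file is non-empty (mapping an empty file raises an error) and decodes as UTF-8.

```python
import mmap


def matching_lines(path, fields):
    """Return, in file order, every line fragment that starts at a field name.

    `fields` is a list of bytes objects such as [b'Account-No', b'Amount'].
    """
    hits = []
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for field in fields:
                pos = mm.find(field)
                while pos != -1:
                    # the match runs to the end of the current line
                    end = mm.find(b'\n', pos)
                    if end == -1:
                        end = len(mm)  # last line has no trailing newline
                    hits.append((pos, mm[pos:end].decode().rstrip()))
                    # resume the search just past the previous match
                    pos = mm.find(field, pos + 1)
    # sort by byte offset so the lines come out in file order
    return [line for _, line in sorted(hits)]
```

Calling `matching_lines(fname, [b'Account-No', b'Amount'])` returns the matching lines as strings, which can then be written to the new file. The list of matches is held in memory, so for an extremely large number of matches the line-by-line approach above may be the better fit.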

With pandas, we can do the following:

import pandas as pd

# Each line of the file becomes one row in a single column (column 0).
rowsOfLines = pd.read_table('my_file.txt', header=None)

with open('output_file.txt', 'w') as outFile:
    for index, row in rowsOfLines.iterrows():
        line = row[0]
        splitLine = line.split()
        # Keep only the account number and amount lines.
        if 'Account-No' in splitLine or 'Amount:' in splitLine:
            outFile.write('{}\n'.format(line))

Code:

listOfAllAccountsAndAmounts = []  # collects the matching account and amount entries
searchTexts = ['Account-No', 'Amount']  # the field names to search for

with open('a.txt', 'r') as inFile:
    allLines = inFile.readlines()  # read all the lines into memory

# indexes of the lines that contain any of the words from searchTexts
indexOfAccounts = [i for i, line in enumerate(allLines) if any(x in line for x in searchTexts)]

for index in indexOfAccounts:
    # drop the trailing newline, then split into [field, value]
    listOfAllAccountsAndAmounts.append(allLines[index].rstrip('\n').split(': '))

print(listOfAllAccountsAndAmounts)

Output:

A list of [field, value] pairs, one per matching line, such as ['Amount', '123456'].

If you do not want to split the lines and would rather keep them as they are:

listOfAllAccountsAndAmounts.append(allLines[index])

Output:

['Account-No : 01234567\n', 'Amount: 123456\n', 'Account-No : 01234568\n', 'Amount: 12345\n', 'Account-No : 01234569\n', 'Amount: 12344\n']

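I collected the results in a list in case you want to process the information further. You can also write the strings straight to the new file without using a list at all, as @Arda showed.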
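Writing straight to the new file, without the intermediate list, might look like the sketch below. The function name `copy_matching_lines` and its parameters are hypothetical, chosen only for illustration; the matching rule (a substring test against each search text) is the same one used in the code above.

```python
def copy_matching_lines(in_path, out_path, search_texts):
    """Copy each line containing any of search_texts to out_path.

    Returns the number of lines written.
    """
    written = 0
    with open(in_path, 'r') as in_file, open(out_path, 'w') as out_file:
        # iterating over the file reads one line at a time
        for line in in_file:
            if any(text in line for text in search_texts):
                # normalize trailing whitespace, then end the line
                out_file.write(line.rstrip() + '\n')
                written += 1
    return written
```

For example, `copy_matching_lines('a.txt', 'out.txt', ['Account-No', 'Amount'])` would copy only the account number and amount lines. Because nothing is accumulated in memory, this variant should also cope with very large input files.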
