How do I extract a specific paragraph from a text file using Python?

Posted 2024-04-27 22:26:18


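I need to extract the specific paragraph that starts with "SUBSTITUTION OF TRUSTEE" and ends with "under said Deed of Trust".

  1. Because the fields are repeated elsewhere in the file, the data should be looked up only within this paragraph.

  2. The data may be things like dates, document numbers, and so on.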

sample.txt
Inst #: 2021
Fees: $42.00

06/24/2021 06:54:48 AM
Receipt #: 4587188

Requestor:
FINANCIAL CORPORATION OF
After recording return to: Src: MAIL

Mail Tax Statements to:

SUBSTITUTION OF TRUSTEE
AND DEED OF RECONVEYANCE

The undersigned, Financial Corporation of Nevada, a Nevada Corporation, as the Owner and
Holder of the Note secured by Deed of Trust dated March 1, 2013 made by Elvia Bello, Trustor, to
Official Records -- HEREBY substitutes Financial Corporation of Nevada, a Nevada Corporation,
as Trustee in lieu of the Trustee therein.


Said Note, together with all other indebtedness secured by said Deed of Trust, has been fully paid 
satisfied; and as successor Trustee, the undersigned does hereby RECONVEY WITHOUT
WARRANTY TO THE PERSON OR PERSONS LEGALLY ENTITLED THERETO, all the estate now
held by it under said Deed of Trust.
This JNO aay of June 2021,
Financial Corporation
wy luo Rtn rae
import re
mylines = []

pattern = re.compile(r"SUBSTITUTION OF TRUSTEE", re.IGNORECASE)
with open(r'sample.txt', 'rt', encoding='utf-8') as myfile:
    for line in myfile:                 
            mylines.append(line)
    for line in mylines:
        if(line == "SUBSTITUTION OF TRUSTEE "):
            print(line)
            break
        else:
            mylines.remove(line)
    
    print("my lines",mylines)

2 Answers

Here is a naive way to do what you want:

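extracted_lines = []
extract = False

# Use a context manager so the file is closed when the loop ends.
with open("sample.txt", encoding="utf-8") as myfile:
    for line in myfile:
        lowered = line.lower()
        # Start collecting at the first line that contains the heading.
        if not extract and "substitution of trustee" in lowered:
            extract = True

        if extract:
            extracted_lines.append(line)
            # Stop collecting after the line that contains the closing phrase.
            if "under said deed of trust" in lowered:
                extract = False  # or break, to stop reading the file here

print("".join(extracted_lines))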
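For context, the attempt in the question seems to fail for two separate reasons, which the small sketch below reproduces. The `lines` list is a hypothetical in-memory stand-in for what reading `sample.txt` would give, not the real file.

```python
# Hypothetical stand-in for the lines read from sample.txt.
lines = [
    "Inst #: 2021\n",
    "Fees: $42.00\n",
    "\n",
    "SUBSTITUTION OF TRUSTEE\n",
    "AND DEED OF RECONVEYANCE\n",
]
heading = "SUBSTITUTION OF TRUSTEE"

# 1) Lines read from a file keep their trailing newline, so an exact
#    comparison against the bare heading never matches.
exact_match = any(line == heading for line in lines)
stripped_match = any(line.strip() == heading for line in lines)

# 2) Removing items from a list while looping over it shifts the
#    remaining items, so every other element is skipped.
survivors = list(lines)
for line in survivors:
    if line.strip() == heading:
        break
    survivors.remove(line)

# A safer alternative: find the index of the heading, then slice.
start = next(i for i, line in enumerate(lines) if line.strip() == heading)
kept = lines[start:]

print(exact_match, stripped_match)
print(survivors)
print(kept)
```

In this sketch the loop never even sees the heading line, because the removals shift it past the iterator. Slicing from the heading's index avoids mutating the list during iteration.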

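You can first check whether each line starts with the substring SUBSTITUTION OF TRUSTEE and, once it is found, set a flag variable to True. While the flag is true, keep appending lines to the mylines list. Then, once you reach the line that contains "under said Deed of Trust", stop adding lines and output the result: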

mylines = []
flag = False
with open(r'sample.txt', 'rt', encoding='utf-8') as myfile:
    for line in myfile:
        # Turn the flag on at the line that starts with the heading.
        if line.strip().upper().startswith("SUBSTITUTION OF TRUSTEE"):
            flag = True
        if flag:
            mylines.append(line)
            # Stop reading after the line that contains the closing phrase.
            if "under said deed of trust" in line.lower():
                break

print("".join(mylines))
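The same flag logic can be wrapped in a small reusable function. This is only a sketch: `extract_section` and its parameter names are hypothetical, and it assumes a case-insensitive substring match on both markers. It accepts any iterable of lines, so it works with an open file as well as with the `io.StringIO` stand-in used here.

```python
import io


def extract_section(lines, start_marker, end_marker):
    """Return the lines from the first one containing start_marker through
    the first later line containing end_marker (case-insensitive)."""
    start_marker = start_marker.lower()
    end_marker = end_marker.lower()
    collected = []
    inside = False
    for line in lines:
        lowered = line.lower()
        # Switch on at the first line that contains the start marker.
        if not inside and start_marker in lowered:
            inside = True
        if inside:
            collected.append(line)
            # Stop right after the line that contains the end marker.
            if end_marker in lowered:
                break
    return collected


# Shortened, hypothetical stand-in for sample.txt.
sample = io.StringIO(
    "Receipt #: 4587188\n"
    "SUBSTITUTION OF TRUSTEE\n"
    "AND DEED OF RECONVEYANCE\n"
    "held by it under said Deed of Trust.\n"
    "Financial Corporation\n"
)
section = extract_section(sample, "SUBSTITUTION OF TRUSTEE", "under said Deed of Trust")
print("".join(section), end="")
```

If the end marker never appears, this sketch returns everything from the start marker to the end of the input, so a caller may want to check the last collected line.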


Output:

SUBSTITUTION OF TRUSTEE
AND DEED OF RECONVEYANCE

The undersigned, Financial Corporation of Nevada, a Nevada Corporation, as the Owner and
Holder of the Note secured by Deed of Trust dated March 1, 2013 made by Elvia Bello, Trustor, to
Official Records -- HEREBY substitutes Financial Corporation of Nevada, a Nevada Corporation,
as Trustee in lieu of the Trustee therein.


Said Note, together with all other indebtedness secured by said Deed of Trust, has been fully paid 
satisfied; and as successor Trustee, the undersigned does hereby RECONVEY WITHOUT
WARRANTY TO THE PERSON OR PERSONS LEGALLY ENTITLED THERETO, all the estate now
held by it under said Deed of Trust.
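
Since the question also asks for values such as dates inside the extracted paragraph, one possible follow-up is to run a regular expression over the extracted text only. The sketch below assumes dates are written like "March 1, 2013" (full English month name, day, four-digit year); the `paragraph` string is a shortened copy of the output above, and `date_pattern` is a hypothetical name. Other formats, such as 06/24/2021, would need their own pattern.

```python
import re
from datetime import datetime

# Shortened copy of the extracted paragraph, used as a stand-in.
paragraph = (
    "The undersigned, Financial Corporation of Nevada, a Nevada Corporation, as the Owner and\n"
    "Holder of the Note secured by Deed of Trust dated March 1, 2013 made by Elvia Bello, Trustor, to\n"
    "held by it under said Deed of Trust.\n"
)

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Month name, one or two digit day, comma, four-digit year.
date_pattern = re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}}\b")

dates = date_pattern.findall(paragraph)

# Optional: convert the matches to datetime objects. "%B" relies on
# English month names, which is the case in Python's default locale.
parsed = [datetime.strptime(d, "%B %d, %Y") for d in dates]

print(dates)
```

Searching only the extracted paragraph keeps values that repeat elsewhere in the file (for example the recording date in the header) out of the results.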
