Python: looping over PDF pages and a list of keywords, output not in the expected order

Posted 2024-05-29 04:00:49


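My code goes through every page of a PDF and prints every keyword match on every page. What I want instead: search for the first keyword, stop iterating over the pages as soon as it is found, then start again from page 1 for the second keyword, stop when that one is found, and so on for each remaining keyword.

Can anyone help me?

Sample input text

Page 1: The weather is nice today. systems are so varied currently. Sustainability is important. systems are present in the company

Page 2: We are not sure about this. biometric data is there. technology is the best. technology is important

Page 3: Today is Monday. silver jewellery is present. The ideal database has this many years of history. silver is nice

The code is below: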


import fitz  # PyMuPDF
import pandas as pd

keywords = ['systems', 'biometric', 'technology', 'silver', 'puppies']
filename = r"myfile"
doc = fitz.open(filename)
lst = []

# go through every page of the PDF
for page in doc:
    text = page.get_text("text")  # getText() is the older, deprecated spelling
    for line in text.split('\n'):
        # add a full stop to short lines
        if 4 <= len(line) <= 20:
            line = line + '.'
        # check each keyword against this line
        for item in keywords:
            # if present then print
            if item in line:
                lst.append(line.split('.'))
                # page.number is 0-based, so add 1 for display
                print('\nKEYWORD:{} \n OUTPUT \n'.format(item), line, page.number + 1)
                break
# intended: print "not found" if a keyword is missing from the whole document
# (as written, this for/else branch runs whenever the page loop finishes,
# because the outer loop never hits a break)
else:
    lst.append('Not found')
    print('not found')

Current output:

systems are so varied currently 1
systems are present in the company 1
biometric data is there 2
technology is the best 2 
technology is important 2
silver jewellery is present 3
silver is nice 3

Expected output:

systems are so varied currently 1
biometric data is there 2
technology is the best 2
silver jewellery is present 3
puppies: 'not found '
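
One possible way to get the expected output is to swap the loops: iterate over the keywords on the outside and over the pages on the inside, so that every keyword restarts from page 1 and a break ends the page scan at the first hit. The following is only a minimal sketch under stated assumptions: `first_hits` is a hypothetical helper name, the pages are passed in as plain strings, sentences are assumed to be separated by full stops, and the sample `pages` list is a reconstruction of the sample input, not the real PDF.

```python
def first_hits(pages, keywords):
    """Map each keyword to (sentence, page_number) of its first match, or None.

    `pages` is a list of page texts in document order (hypothetical input shape).
    """
    results = {}
    for keyword in keywords:              # outer loop: one keyword at a time
        results[keyword] = None           # stays None if no page contains it
        # inner loop: always restart from page 1 for the current keyword
        for page_number, text in enumerate(pages, start=1):
            # assumption: sentences are delimited by full stops
            sentences = [s.strip() for s in text.replace('\n', ' ').split('.')]
            hit = next((s for s in sentences if keyword in s), None)
            if hit is not None:
                results[keyword] = (hit, page_number)
                break                     # stop scanning pages for this keyword
    return results


# Reconstructed sample text standing in for the PDF pages (an assumption).
pages = [
    "The weather is nice today. systems are so varied currently. "
    "Sustainability is important. systems are present in the company",
    "We are not sure about this. biometric data is there. "
    "technology is the best. technology is important",
    "Today is Monday. silver jewellery is present. silver is nice",
]
keywords = ['systems', 'biometric', 'technology', 'silver', 'puppies']

for keyword, hit in first_hits(pages, keywords).items():
    if hit is None:
        print(f"{keyword}: 'not found'")
    else:
        sentence, page_number = hit
        print(sentence, page_number)
```

With a real file, the page texts could be collected first with something like `pages = [page.get_text("text") for page in fitz.open(filename)]` and then passed to the helper. Reading the text once up front also avoids re-extracting each page for every keyword.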


0 answers

No answers yet
