Compare items in a list against an Excel spreadsheet, then extract the matches from the spreadsheet

Posted 2024-05-19 06:25:33


I'm using Python 3.7. Apologies in advance if my code is a bit messy. This is the first project I've worked on, so I'm learning a lot as I go.

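I'm trying to create a program that scans a PDF and parses out specific expressions (using regex), then compares those results against the data in an Excel spreadsheet and identifies the matches.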
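
As a minimal sketch of the matching step on its own, the snippet below applies a pattern of the same shape to a plain string instead of a real PDF. The name `STANDARD_RE` and the sample sentence are made up for illustration; only the three identifier formats come from the code in this post.

```python
import re

# Three identifier formats, written with re.VERBOSE so each can be commented.
# Literal dots are escaped so that "." does not match any character.
STANDARD_RE = re.compile(r'''
    EN\s\d{3}\s\d{3}-\d\sV\d\.\d\.\d   # EN 000 000-1 V0.0.0
    | EN\s\d{5}-\d:\d{4}               # EN 00000-0:0000
    | EN\s\d{5}:\d{4}                  # EN 00000:0000
''', re.VERBOSE)

# Made-up sample text standing in for the text extracted from a PDF.
sample_text = (
    "Complies with EN 301 489-1 V2.2.3 and EN 55032:2015 "
    "as well as EN 62368-1:2014."
)

# The pattern has no capture groups, so findall returns the full matches.
found = STANDARD_RE.findall(sample_text)
print(found)
```

Because whitespace inside a verbose pattern is ignored, the spaces in the identifiers have to be written as `\s`.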

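Currently, the program successfully extracts the correct information from the PDF and compares it against column B in the Excel file to confirm that the data exists and is correct.

What I want it to do is this: when an extracted value matches a particular cell in column B, print the data from the cell next to it in column C.

This is my current code: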

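import pprint
import re
import sys
import time
import tkinter as tk
from tkinter import filedialog

import openpyxl
import PyPDF2

# Open file dialog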
root = tk.Tk()
root.withdraw()

file_path = filedialog.askopenfilename()

# Open DOC and extract text
pdfFile = open(file_path, 'rb')
reader = PyPDF2.PdfFileReader(pdfFile)

pageNum = str(reader.numPages)
print('Your document has ' + pageNum + ' pages' + '\n')

# Collect the text of every page, not just the last one
decCon = ''
for pN in range(reader.numPages):
    decCon += reader.getPage(pN).extractText()

#print(decCon) #to test if extracting worked.


# find the harmonised standards
# EN 000 000-1 V0.0.0, EN 00000:0000, EN 00000-0:0000
# (raw string; the dots are escaped so they match a literal '.')
docRegex = re.compile(r'''
EN\s\d\d\d\s\d\d\d-\d\sV\d\.\d\.\d|

EN\s\d\d\d\d\d:\d\d\d\d|

EN\s\d\d\d\d\d-\d:\d\d\d\d
''', re.VERBOSE)

# extract the harmonised standards
extractedHs = docRegex.findall(decCon)

# DEBUG - to ensure it is collecting correct data
print('It contains the following standards: ' + '\n')
pprint.pprint(extractedHs)
print('\n' + '\n')

# setup progress bar
print('Scanning all ETSI standards...') 
toolbar_width = 10
sys.stdout.write("-" * toolbar_width)

for i in range(toolbar_width):
    time.sleep(0.25)
    sys.stdout.write("-")
    sys.stdout.flush()

sys.stdout.write('\n' + '\n' + 'Printing results now...' + "\n" + '\n')


# extract from etsi spreadsheet
wb = openpyxl.load_workbook('All About Standards.xlsx')
sheet = wb["ETSI Catalog"]

etsi = []
for col in sheet['B']:
    etsi.append(col.value)

#print(etsi) # DEBUG PRINT
extractedEtsi = docRegex.findall(str(etsi))

# comparison code
for item1 in extractedHs:
    for item2 in extractedEtsi:
        if item1 == item2:
            print('Standard found: ' + item2)
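
One possible way to get at the column C value, sketched under assumptions rather than as a definitive solution: read columns B and C together, row by row, and build a dictionary from each standard found in B to the C value in the same row. The helper names `build_lookup` and `load_rows` and the sample rows are hypothetical; the workbook and sheet names in the usage note are the ones from the code above. `iter_rows(..., values_only=True)` assumes openpyxl 2.6 or later.

```python
import re

STANDARD_RE = re.compile(r'''
    EN\s\d{3}\s\d{3}-\d\sV\d\.\d\.\d
    | EN\s\d{5}-\d:\d{4}
    | EN\s\d{5}:\d{4}
''', re.VERBOSE)


def build_lookup(rows):
    """Map each standard found in a column B value to the column C value of that row."""
    lookup = {}
    for b_value, c_value in rows:
        if b_value is None:  # skip empty cells
            continue
        for standard in STANDARD_RE.findall(str(b_value)):
            # Keep the first row in which a standard appears.
            lookup.setdefault(standard, c_value)
    return lookup


def load_rows(path, sheet_name):
    """Return (B, C) value pairs for every row of the given sheet."""
    # Imported here so the matching logic can be tried without openpyxl installed.
    import openpyxl
    sheet = openpyxl.load_workbook(path)[sheet_name]
    return list(sheet.iter_rows(min_col=2, max_col=3, values_only=True))


# Made-up rows standing in for the spreadsheet contents.
sample_rows = [
    ("ETSI EN 301 489-1 V2.2.3", "EMC standard for radio equipment"),
    ("EN 55032:2015", "Emission requirements"),
    (None, None),
]
extracted = ["EN 55032:2015", "EN 62368-1:2014"]  # stands in for extractedHs

lookup = build_lookup(sample_rows)
results = {s: lookup[s] for s in extracted if s in lookup}
for standard, c_value in results.items():
    print('Standard found: ' + standard + ' -> ' + str(c_value))
```

With the real file, the rows would come from something like `load_rows('All About Standards.xlsx', 'ETSI Catalog')`. A dictionary also replaces the nested comparison loop with a single lookup per extracted standard.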

Sorry if my explanation is a bit long-winded. I can explain further or simplify it if needed.

Thanks in advance.

