How do I rename PDF files using extracted text?

1 vote
3 answers
7866 views
Asked 2025-04-18 14:04

I am trying to use Python to rename a PDF file based on part of its contents. Here is the situation:

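The PDF is a commercial invoice that contains the labels "Commercial Invoice" and "Department". I want to rename the file to the commercial invoice number and the department that follow those labels, for example "353624 HR".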

This is my code so far:

from StringIO import StringIO
import pyPdf
import os

# a function here
def getPDFContent(path):
    content = ""
    num_pages = 10
    p = file(path, "rb")
    pdf = pyPdf.PdfFileReader(p)
    for i in range(0, num_pages):
        content += pdf.getPage(i).extractText() + "\n"
        content = " ".join(content.replace(u"\xa0", " ").strip().split())     
        return content 

# name of the source PDF file
PDF_name = '222'

# picking texts from the PDF file
pdfContent = StringIO(getPDFContent("C:\\" + PDF_name + ".pdf").encode("ascii", "ignore"))
for line in pdfContent:
    aaa = line.find(' Commercial Invoice ')
    CIN = line[aaa + 28: aaa + 38]
    bbb = line.find('Department')
    Dpt = line [bbb+20 : bbb+26]

    final_name = str(CIN + " " + Dpt)
    
print final_name

f = open("C:\\" + PDF_name + ".pdf")
f.close()

os.rename("C:\\" + PDF_name + ".pdf", "C:\\" + final_name + ".pdf")

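The code works up to the text-extraction part, and 'print final_name' prints the expected value. But the final rename fails with this error: "WindowsError: [Error 32] The process cannot access the file because it is being used by another process".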

What is going wrong here? It looks like the file was not closed properly earlier?

3 Answers

-1

This can be done with mouse events and cursor positions. Here is the code:

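' Win32 declarations needed by the SetCursorPos and mouse_event calls below.
' Assumes VBA7 (Office 2010 or later); on older versions drop PtrSafe and use Long instead of LongPtr.
Private Declare PtrSafe Function SetCursorPos Lib "user32" (ByVal x As Long, ByVal y As Long) As Long
Private Declare PtrSafe Sub mouse_event Lib "user32" (ByVal dwFlags As Long, ByVal dx As Long, ByVal dy As Long, ByVal cButtons As Long, ByVal dwExtraInfo As LongPtr)
Private Const MOUSEEVENTF_LEFTDOWN As Long = &H2
Private Const MOUSEEVENTF_LEFTUP As Long = &H4

Sub Run_report1()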

'
' Run_report Macro
'
' Keyboard Shortcut: Ctrl+Shift+G
'
Application.Wait Now + TimeValue("0:00:01")
SendKeys "%{Tab}", True
Application.Wait Now + TimeValue("0:00:01")


Dim i As Integer
i = 1
Do Until i > 8
Application.Wait Now + TimeValue("0:00:01")


SetCursorPos 309, 253

mouse_event MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0
mouse_event MOUSEEVENTF_LEFTUP, 0, 0, 0, 0

Application.Wait Now + TimeValue("0:00:01")

SendKeys "{Enter}", True


Application.Wait Now + TimeValue("0:00:03")

SetCursorPos 794, 771

mouse_event MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0
mouse_event MOUSEEVENTF_LEFTUP, 0, 0, 0, 0
mouse_event MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0
mouse_event MOUSEEVENTF_LEFTUP, 0, 0, 0, 0

Application.Wait Now + TimeValue("0:00:01")


SetCursorPos 1068, 728

mouse_event MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0
mouse_event MOUSEEVENTF_LEFTUP, 0, 0, 0, 0

Application.Wait Now + TimeValue("0:00:01")

SetCursorPos 746, 94

mouse_event MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0
mouse_event MOUSEEVENTF_LEFTUP, 0, 0, 0, 0

Application.Wait Now + TimeValue("0:00:01")


SendKeys "%{Tab}", True

Application.Wait Now + TimeValue("0:00:01")

SetCursorPos 309, 253

mouse_event MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0
mouse_event MOUSEEVENTF_LEFTUP, 0, 0, 0, 0

Application.Wait Now + TimeValue("0:00:01")

SendKeys "^V", True
SendKeys "{Enter}", True

Application.Wait Now + TimeValue("0:00:01")

SendKeys "{F5}", True

' Press Page Up 15 times ({key number} is the SendKeys repeat syntax)
SendKeys "{PGUP 15}", True

i = i + 1
Loop

MsgBox "Task Completed"

End Sub
-1

In the last line, add C:\\ in front of PDF_name as well.
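
Both arguments to os.rename need to be full paths, since a bare file name is resolved against the current working directory. A minimal sketch of building the destination path from the source path, assuming the renamed file should stay in the same folder and keep its extension (the helper name renamed_path is hypothetical, not part of the question's code):

```python
import os

def renamed_path(src_path, new_stem):
    """Return a path in the same folder as src_path, named new_stem plus the original extension."""
    folder = os.path.dirname(src_path)      # e.g. "C:\\" for "C:\\222.pdf" on Windows
    ext = os.path.splitext(src_path)[1]     # e.g. ".pdf"
    return os.path.join(folder, new_stem + ext)
```

With this, the last line could be written as os.rename(src, renamed_path(src, final_name)), where src is the full path of the original file.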

1

In the function def getPDFContent(path),

after p = file(path, "rb") has run and the content has been copied,

you need to close the file.

p.close()

Put this close call after the loop, but still inside the function.
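
A with-block is another way to get the same effect, and it closes the handle even if extraction raises. This also matters because, in the code as posted, return content sits inside the loop, so a p.close() placed after the loop would never be reached. The following is only a sketch in Python 3 under stated assumptions: read_and_rename, extract_text and make_name are hypothetical names, and extract_text stands in for the pyPdf reading code from the question.

```python
import os

def read_and_rename(src_path, make_name, extract_text):
    """Extract text while the file is open, then rename it once the handle is closed.

    extract_text(file_obj) -> str and make_name(text) -> str are hypothetical
    callables standing in for the PDF parsing and the name-building steps.
    """
    # The with-block closes the handle on exit, even if extraction raises.
    with open(src_path, "rb") as fh:
        text = extract_text(fh)

    # This code no longer holds an open handle, so the rename is not blocked by it.
    folder, old_name = os.path.split(src_path)
    ext = os.path.splitext(old_name)[1]
    dst_path = os.path.join(folder, make_name(text) + ext)
    os.rename(src_path, dst_path)
    return dst_path
```

The point of the design is ordering: everything that needs the open file happens inside the with-block, and os.rename runs only after it.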
