<p>I just solved this in a simpler way, by adding <code>*</code> to the path so the pattern matches every subdirectory directly under the folder:</p>
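<pre><code>import glob

import pytesseract
from pdf2image import convert_from_path

# The first "*" matches each subdirectory directly under K:\pdf_files
pdfs = glob.glob(r"K:\pdf_files\*\*.pdf")

for pdf_path in pdfs:
    # Render every page of the PDF as an image at 500 DPI
    pages = convert_from_path(pdf_path, 500)
    for pageNum, imgBlob in enumerate(pages):
        text = pytesseract.image_to_string(imgBlob, lang='eng')
        # Append each page's text to a .txt file next to the PDF
        with open(f'{pdf_path}.txt', 'a') as the_file:
            the_file.write(text)
</code></pre>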
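<p>One thing to keep in mind: a single <code>*</code> only goes one level down. If your PDFs are nested deeper, <code>glob</code> also supports <code>**</code> together with <code>recursive=True</code>. Here is a minimal sketch comparing the two patterns on a throwaway directory tree; the helper name <code>find_pdfs</code> and the sample file names are made up for illustration and are not part of my original script.</p>

```python
import glob
import os
import tempfile


def find_pdfs(root, recursive=False):
    """Return sorted PDF paths under root (hypothetical helper).

    recursive=False: PDFs in immediate subdirectories only (root/*/*.pdf).
    recursive=True:  PDFs at any depth, root included (root/**/*.pdf).
    """
    middle = "**" if recursive else "*"
    pattern = os.path.join(root, middle, "*.pdf")
    return sorted(glob.glob(pattern, recursive=recursive))


# Build a small temporary tree: one PDF at the top, one a level down,
# and one two levels down (empty placeholder files, not real PDFs).
with tempfile.TemporaryDirectory() as root:
    samples = (
        "top.pdf",
        os.path.join("a", "one.pdf"),
        os.path.join("a", "b", "two.pdf"),
    )
    for rel in samples:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    one_level = [os.path.relpath(p, root) for p in find_pdfs(root)]
    any_depth = [os.path.relpath(p, root) for p in find_pdfs(root, recursive=True)]

print("one level:", one_level)
print("any depth:", any_depth)
```

<p>In the OCR loop you would then iterate over <code>find_pdfs(r"K:\pdf_files", recursive=True)</code> instead of the single-level pattern, assuming you actually want the deeper files picked up too.</p>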