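<p>As mentioned in the comments, what you need is <a href="https://docs.python.org/3/library/os.html#os.walk" rel="nofollow noreferrer"><code>os.walk</code></a>, not <code>glob.glob</code>. <code>os.walk</code> walks the directory tree recursively for you. On each iteration, <code>pdf_path</code> is the directory currently being listed, <code>dirs</code> is the list of subdirectories in it, and <code>files</code> is the list of files in it.</p>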
<p>Use <a href="https://docs.python.org/3/library/os.path.html#os.path.join" rel="nofollow noreferrer"><code>os.path.join</code></a> to build the full path from the parent folder and the file name.</p>
<p>Also, rather than repeatedly appending to the txt file, open it once outside the page-to-text loop.</p>
<pre class="lang-py prettyprint-override"><code>import os

import pytesseract
from pdf2image import convert_from_path

pdfs_dir = r"K:\pdf_files"
for pdf_path, dirs, files in os.walk(pdfs_dir):
    for file in files:
        if not file.lower().endswith('.pdf'):
            # skip non-PDF files
            continue
        file_path = os.path.join(pdf_path, file)
        # render each page as an image at 500 DPI
        pages = convert_from_path(file_path, 500)
        # swap the extension for .txt; splitext also handles upper-case
        # extensions such as .PDF, which str.replace(".pdf", ...) would miss
        txt_path = os.path.splitext(file_path)[0] + '.txt'
        # write mode: the file is opened once per PDF, outside the page loop
        with open(txt_path, 'w') as the_file:
            for page_num, img_blob in enumerate(pages):
                text = pytesseract.image_to_string(img_blob, lang='eng')
                the_file.write(text)
</code></pre>
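<p>If you want to check the traversal on its own before running OCR, here is a minimal sketch that needs only the standard library. The helper name <code>find_pdfs</code> and the temporary directory layout are made up for illustration and are not part of the original code.</p>

```python
import os
import tempfile


def find_pdfs(root_dir):
    """Yield (pdf_path, txt_path) pairs for every PDF under root_dir.

    find_pdfs is a hypothetical helper, used here only for illustration.
    """
    for parent, dirs, files in os.walk(root_dir):
        for name in files:
            base, ext = os.path.splitext(name)
            if ext.lower() != ".pdf":
                # skip non-PDF files
                continue
            # full path of the PDF and of the .txt file that would sit next to it
            yield os.path.join(parent, name), os.path.join(parent, base + ".txt")


# Build a throwaway directory tree (assumed layout) to exercise the traversal.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.pdf", "notes.md", os.path.join("sub", "B.PDF")):
        open(os.path.join(root, rel), "w").close()

    # Store paths relative to the root so the result is easy to read.
    found = sorted(
        (os.path.relpath(pdf, root), os.path.relpath(txt, root))
        for pdf, txt in find_pdfs(root)
    )
    for pair in found:
        print(pair)
```

<p>Only the two PDF files are reported, including the one in the subfolder with an upper-case extension; <code>notes.md</code> is skipped.</p>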