How can I apply Python code to every file in a folder, and how do I give each output file its own new name?

User

Reply 1 · edited 2024-04-27 19:45:47

The following script solves your problem:

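import os

sourcedir = 'pdfdir'

# getPDFContent comes from the question and is assumed to return a str (Python 3).
for f in os.listdir(sourcedir):
    # splitext copes with names that have no dot or more than one dot.
    root, ext = os.path.splitext(f)
    if ext.lower() == ".pdf":
        # Join with the directory name, not with the list of entries.
        path_in = os.path.join(sourcedir, f)
        content = getPDFContent(path_in)
        path_out = os.path.join(sourcedir, root + ".txt")
        # Let open() do the UTF-8 encoding; the file is closed automatically.
        with open(path_out, 'w', encoding='utf-8') as text_file:
            text_file.write(content)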
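The same loop can also be written with pathlib, which makes the renaming step explicit. This is only a sketch: convert_folder and its extract_text parameter are hypothetical names introduced here, and extract_text stands in for the asker's getPDFContent, assumed to take a path and return a str.

```python
from pathlib import Path
from typing import Callable, List


def convert_folder(source_dir: str, extract_text: Callable[[str], str]) -> List[Path]:
    """Write one .txt file next to every PDF in source_dir and return the new paths."""
    written = []
    for pdf_path in sorted(Path(source_dir).glob("*.pdf")):
        # report.pdf -> report.txt, so every input gets its own output name.
        txt_path = pdf_path.with_suffix(".txt")
        txt_path.write_text(extract_text(str(pdf_path)), encoding="utf-8")
        written.append(txt_path)
    return written
```

It would be called as convert_folder('pdfdir', getPDFContent). Because with_suffix only replaces the last extension, a file such as b.c.pdf becomes b.c.txt rather than b.txt.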

User

Reply 2 · edited 2024-04-27 19:45:47

One way to operate on all PDF files in a directory is to call glob.glob() and iterate over the results:

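import glob
import os.path

# getPDFContent comes from the question and is assumed to return a str (Python 3).
for path in glob.glob('*.pdf'):
    content = getPDFContent(path)
    # Name the output after the input so each PDF gets its own file
    # instead of overwriting a single Output.txt.
    root, _ = os.path.splitext(path)
    with open(root + ".txt", "w", encoding="utf-8") as text_file:
        text_file.write(content)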

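Another approach is to let the user specify the files on the command line: the script loops over sys.argv[1:] instead of the glob.glob() result.

The user then runs the script as python foo.py *.pdf.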
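A minimal sketch of reading the file list from the command line is shown below. The helper name collect_paths is hypothetical. It expands patterns itself because Unix shells replace *.pdf with the matching names before Python starts, while the Windows cmd.exe shell passes the pattern through unchanged.

```python
import glob
import sys
from typing import List


def collect_paths(args: List[str]) -> List[str]:
    """Expand each command-line argument into the files it matches."""
    paths = []
    for arg in args:
        # glob.glob returns [] when nothing matches; keep the literal argument
        # in that case so a mistyped name is still reported later on.
        matches = sorted(glob.glob(arg))
        paths.extend(matches if matches else [arg])
    return paths


if __name__ == "__main__":
    for path in collect_paths(sys.argv[1:]):
        print(path)  # replace this with the per-file conversion
```

Saved as foo.py, this can be run as python foo.py *.pdf or with explicit names such as python foo.py a.pdf b.pdf.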

User

Reply 3 · edited 2024-04-27 19:45:47

Create a function that encapsulates the work done on each file.

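import os.path

def parse_pdf(filename):
    """Parse a PDF into text and save it next to the PDF as a .txt file."""
    # getPDFContent comes from the question and is assumed to return a str (Python 3).
    content = getPDFContent(filename)
    # Split off the .pdf extension to add .txt instead.
    root, _ = os.path.splitext(filename)
    # Let open() do the UTF-8 encoding; the file is closed automatically.
    with open(root + ".txt", "w", encoding="utf-8") as text_file:
        text_file.write(content)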

Then call this function once for each name in the list of files to process.
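One possible way to apply a per-file function to a whole list is sketched below. The name convert_all is hypothetical, and collecting errors instead of stopping at the first one is a design choice made here, not something the reply prescribes.

```python
from typing import Callable, Dict, Iterable


def convert_all(filenames: Iterable[str], convert_one: Callable[[str], None]) -> Dict[str, str]:
    """Call convert_one on every file name; return {file name: error message} for failures."""
    failures = {}
    for filename in filenames:
        try:
            convert_one(filename)
        except Exception as exc:
            # Keep going so that one unreadable PDF does not stop the whole batch.
            failures[filename] = str(exc)
    return failures
```

With the function from this reply it could be used as failures = convert_all(glob.glob('*.pdf'), parse_pdf), after which failures lists any files that could not be converted.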
