How can I use the Python pdf2image module (and therefore poppler) on Google Cloud Functions?

import requests
from pdf2image import convert_from_path


def process_image(event, context):
    # Download sample pdf file
    url = 'https://www.adobe.com/support/products/enterprise/knowledgecenter/media/c4611_sample_explain.pdf'
    r = requests.get(url, allow_redirects=True)
    open('/tmp/sample.pdf', 'wb').write(r.content)

    # Error occurs on this line
    pages = convert_from_path('/tmp/sample.pdf')

    # Save pages to /tmp
    for idx, page in enumerate(pages):
        output_file_path = f"/tmp/{str(idx)}.jpg"
        page.save(output_file_path, 'JPEG')
        # To be saved to cloud storage

Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 441, in pdfinfo_from_path
    proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
  File "/opt/python3.8/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/python3.8/lib/python3.8/subprocess.py", line 1706, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'pdfinfo'

Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/functions_framework/__init__.py", line 149, in view_func
    function(data, context)
  File "/workspace/main.py", line 11, in process_image
    pages = convert_from_path('/tmp/sample.pdf')
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 97, in convert_from_path
    page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 467, in pdfinfo_from_path
    raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

2 Answers

User

Answer 1 · edited 2024-04-26 18:31:21

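Cloud Functions does not support installing custom system-level packages, even though it does support third-party libraries for the function's programming language through package managers such as npm and pip. As the list at https://cloud.google.com/functions/docs/reference/system-packages shows, there is no "poppler" package in the runtime.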
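If you want to confirm this from inside a function, one option is to check whether the poppler command-line tools that pdf2image shells out to are on PATH. The sketch below uses only the standard library. The helper name `missing_binaries` is made up for illustration; `pdfinfo` is the binary named in the traceback above, and `pdftoppm` is the poppler tool pdf2image uses for rendering by default.

```python
import shutil


def missing_binaries(names):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # pdf2image calls these poppler tools through subprocess.
    missing = missing_binaries(["pdfinfo", "pdftoppm"])
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("poppler tools found")
```

Logging the result once at cold start should be enough to tell whether the runtime image contains the tools.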

However, you can still use the other preinstalled packages: ghostscript is available and can convert a PDF to images.

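First, save the PDF file inside the Cloud Function (for example, by downloading it from Cloud Storage). You only have disk write access to /tmp (https://cloud.google.com/functions/docs/concepts/exec#file_system).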
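As a minimal sketch of that step, assuming the PDF bytes are already in memory, you could write them to the temp directory and delete them when you are done. The helper names `save_to_tmp` and `remove_quietly` are hypothetical. `tempfile.gettempdir()` normally resolves to /tmp on Linux unless an environment variable such as TMPDIR overrides it.

```python
import os
import tempfile


def save_to_tmp(data: bytes, name: str) -> str:
    """Write `data` to a file in the temp directory and return its path."""
    path = os.path.join(tempfile.gettempdir(), name)
    with open(path, "wb") as f:
        f.write(data)
    return path


def remove_quietly(path: str) -> None:
    """Delete `path` if it exists; a missing file is not an error."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

Cleaning up is worth the extra call because, as far as the file system documentation linked above describes it, files left in /tmp count against the function's memory.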

An example terminal command that converts a PDF to JPEG:

gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=jpeg -dJPEGQ=100 -r300 -sOutputFile=output/file/path input/file/path

Sample code that runs this command from Python:

import logging
import os
import subprocess

from google.cloud import storage

# download the file from google cloud storage
# (bucket_name, file_name and both file paths are defined by the surrounding function)
gcs = storage.Client(project=os.environ['GCP_PROJECT'])
bucket = gcs.bucket(bucket_name)
blob = bucket.blob(file_name)
blob.download_to_filename(input_file_path)

# run ghostscript; the arguments are passed as a list so that no shell quoting is needed
cmd = [
    'gs', '-dSAFER', '-dNOPAUSE', '-dBATCH', '-sDEVICE=jpeg', '-dJPEGQ=100', '-r300',
    f'-sOutputFile={output_file_path}', input_file_path,
]
p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
stdout, stderr = p.communicate()
error = stderr.decode('utf8')
if error:
    logging.error(error)
    return
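One thing the command above does not address is multi-page PDFs: with a fixed output name, Ghostscript writes every page to the same file, so you typically end up with a single image. A sketch of one way around this is to put a page-number placeholder such as `%03d` in `-sOutputFile`. The function names `build_gs_command` and `render_pdf` and their parameters are hypothetical; the flags are the ones from the command above plus Ghostscript's `-dFirstPage` and `-dLastPage` options.

```python
import subprocess


def build_gs_command(input_path, output_pattern, dpi=300,
                     first_page=None, last_page=None):
    """Build a Ghostscript argument list that renders a PDF to JPEG files.

    `output_pattern` should contain a placeholder such as %03d so that
    each page goes to its own file, e.g. /tmp/page-%03d.jpg.
    """
    cmd = ["gs", "-dSAFER", "-dNOPAUSE", "-dBATCH",
           "-sDEVICE=jpeg", "-dJPEGQ=100", f"-r{dpi}"]
    if first_page is not None:
        cmd.append(f"-dFirstPage={first_page}")
    if last_page is not None:
        cmd.append(f"-dLastPage={last_page}")
    cmd += [f"-sOutputFile={output_pattern}", input_path]
    return cmd


def render_pdf(input_path, output_pattern, **kwargs):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit code."""
    cmd = build_gs_command(input_path, output_pattern, **kwargs)
    return subprocess.run(cmd, capture_output=True, check=True)
```

Calling `render_pdf('/tmp/sample.pdf', '/tmp/page-%03d.jpg')` would then write /tmp/page-001.jpg, /tmp/page-002.jpg and so on, provided `gs` is available in the runtime. Checking the exit code rather than stderr is a deliberate choice here, on the assumption that Ghostscript may print warnings to stderr even when the conversion succeeds.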

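Note: you may be tempted to use the imagemagick package instead, which itself relies on ghostscript. However, as described in "Can't load PDF with Wand/ImageMagick in Google Cloud Function", ImageMagick's PDF reading has been disabled because of security vulnerabilities in Ghostscript, as of this writing (2021-07-12). The solution given there is essentially another way of running ghostscript.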

Reference: https://www.the-swamp.info/blog/google-cloud-functions-system-packages/

User

Answer 2 · edited 2024-04-26 18:31:21

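This error occurs because the poppler package does not work in Cloud Functions: it has to be installed at the system level, which means writing files to the system. Unfortunately, serverless products such as Cloud Functions do not let you write to the file system outside of /tmp.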

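You may want to try the approach described in another thread, "Cloud Functions for Firebase - Converting PDF to image", or consider using GCP Compute Engine, which gives you access to the whole system.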
