Running the code below to convert a PDF to JPG images raises a PolicyError

Posted 2022-12-01 05:04:58


import os
import io
from PIL import Image
import pytesseract
from wand.image import Image as wi
import gc

pdfim=wi(filename="salem-father.pdf",resolution=300)

PolicyError: not authorized `salem-father.pdf' @ error/constitute.c/ReadImage/412
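The "not authorized" message is raised by ImageMagick, which Wand calls into, and not by the Python code itself. ImageMagick reads a security policy file (policy.xml), and many Linux distributions ship a rule in it that disables the PDF coder. The sketch below, a simplified check written for illustration, scans the text of such a file for that kind of rule. The function name pdf_blocked is hypothetical, and the check ignores glob patterns and rule ordering.

```python
# Minimal sketch: look for an ImageMagick policy rule that disables the PDF coder.
# Simplified: only domain="coder" rules with rights="none" are considered;
# glob patterns and the order of rules are not evaluated.
import xml.etree.ElementTree as ET


def pdf_blocked(policy_xml: str) -> bool:
    """Return True if policy_xml contains a coder rule with rights="none" covering PDF."""
    root = ET.fromstring(policy_xml)
    for policy in root.iter("policy"):
        if policy.get("domain") != "coder" or policy.get("rights") != "none":
            continue
        # pattern may be a single name ("PDF") or a set ("{PS,PDF,XPS}")
        pattern = policy.get("pattern", "")
        names = [n.strip().upper() for n in pattern.strip("{}").split(",")]
        if "PDF" in names:
            return True
    return False


# Hypothetical policy text containing the kind of rule some distributions ship.
sample = """<policymap>
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="coder" rights="none" pattern="PDF" />
</policymap>"""

print(pdf_blocked(sample))  # True
```

To check a real installation, pass the file's contents instead of sample. The location varies by distribution and ImageMagick version (for example /etc/ImageMagick-6/policy.xml on Debian/Ubuntu). Relaxing such a rule requires admin rights and re-enables a coder that was disabled for security reasons, which is one motivation for avoiding ImageMagick altogether as the answer does.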


1 answer
Anonymous user
#1 · posted 2022-12-01 05:04:58

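Instead of Wand/ImageMagick, you can use PyMuPDF (imported as fitz) to extract the images embedded in a PDF page and save them in their original format or as JPG. Note that this extracts the embedded images of one page; it does not render whole pages the way the Wand code in the question does.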

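requirements.txt:

PyMuPDF==1.16.5
Pillow
python-dateutil==2.8.0
pytz==2019.3
six==1.12.0

Code:

import io
import random
import string

import fitz  # PyMuPDF
from PIL import Image

# camelCase names (loadPage, getText) match the pinned PyMuPDF 1.16.5;
# newer PyMuPDF releases use load_page / get_text instead
pdf_path = "mypdf.pdf"  # path to the PDF file
doc = fitz.open(pdf_path)
page = doc.loadPage(4)  # zero-based page index: 4 is the fifth page
page_dict = page.getText("dict")  # page content as a dict of blocks
imgblocks = [b for b in page_dict["blocks"] if b["type"] == 1]  # type 1 = image block

# random 16-character prefix so repeated runs don't overwrite earlier files
prefix = "".join(random.choice(string.ascii_letters + string.digits) for _ in range(16))

for index, img in enumerate(imgblocks):
    # save the image bytes unchanged, in their original format (e.g. png)
    img_name1 = "%s-%s.%s" % (prefix, index, img["ext"])
    with open(img_name1, "wb") as f:
        f.write(img["image"])

    # re-encode as a real JPEG; writing the same bytes under a .jpg name
    # would only rename the file, not convert it
    img_name2 = "%s-%s.jpg" % (prefix, index)
    Image.open(io.BytesIO(img["image"])).convert("RGB").save(img_name2, "JPEG")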
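Giving a file a .jpg extension does not change its contents: if the extracted bytes are PNG, the file still starts with the PNG signature. The sketch below works on a dict shaped like the "blocks" list that the answer reads from getText("dict"), keeps only the image blocks (type 1), and names each output after the format its leading bytes actually indicate. The sample page and the helper names (sniff_format, image_blocks, output_names) are hypothetical, and real blocks carry more keys than shown.

```python
# Minimal sketch: pick image blocks from a page dict and name files by real format.
# The sample data is made up; only the "type", "ext" and "image" keys are used.

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"      # JPEG SOI marker followed by the next marker byte


def sniff_format(data: bytes) -> str:
    """Return 'png', 'jpeg', or 'unknown' based on the leading magic bytes."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return "unknown"


def image_blocks(page_dict: dict) -> list:
    """Keep only image blocks (type 1); text blocks have type 0."""
    return [b for b in page_dict["blocks"] if b["type"] == 1]


def output_names(prefix: str, page_dict: dict) -> list:
    """Build '<prefix>-<index>.<ext>' names from the sniffed format."""
    names = []
    for index, block in enumerate(image_blocks(page_dict)):
        fmt = sniff_format(block["image"])
        # fall back to the block's declared extension if the format is unrecognised
        ext = {"png": "png", "jpeg": "jpg"}.get(fmt, block["ext"])
        names.append(f"{prefix}-{index}.{ext}")
    return names


# Hypothetical page: one text block and two image blocks.
sample_page = {
    "blocks": [
        {"type": 0, "lines": []},
        {"type": 1, "ext": "png", "image": PNG_MAGIC + b"..."},
        {"type": 1, "ext": "jpeg", "image": JPEG_MAGIC + b"..."},
    ]
}

print(output_names("demo", sample_page))  # ['demo-0.png', 'demo-1.jpg']
print(sniff_format(sample_page["blocks"][1]["image"]))  # png
```

The same sniff_format check can be run on a file produced by writing raw PNG bytes under a .jpg name; it will still report png, which is why the image has to be re-encoded to get a real JPEG.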