What bytes are PDF pages stored as?

Posted 2024-05-23 15:19:13

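I'm trying to write some scripts for work and I'm having trouble researching one specific problem. I assumed that every PDF page is an image, such as a JPG, but reading the file shows that this is not the case. So my question is: what is a PDF page stored as, if it is not an image?

Here is the code I'm working with: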

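    pdf = user_file.file.read()  # raw bytes of the uploaded PDF
    startmark = b"\xff\xd8"  # JPEG start-of-image marker
    startfix = 0
    endmark = b"\xff\xd9"  # JPEG end-of-image marker
    endfix = 2
    i = 0

    njpg = 0
    while True:
        # pdf is a bytes object, so the search patterns must be bytes too
        istream = pdf.find(b"stream", i)
        if istream < 0:
            break
        # only treat the stream as a JPEG if the marker is near its start
        istart = pdf.find(startmark, istream, istream + 20)
        if istart < 0:
            i = istream + 20
            continue
        iend = pdf.find(b"endstream", istart)
        if iend < 0:
            raise Exception("Didn't find end of stream!")
        iend = pdf.find(endmark, iend - 20)
        if iend < 0:
            raise Exception("Didn't find end of JPG!")

        istart += startfix
        iend += endfix
        print("JPG %d from %d to %d" % (njpg, istart, iend))

        # move past this image; without this the loop never ends
        njpg += 1
        i = iend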


1 answer

#1 · Posted 2024-05-23 15:19:13

I think a PDF is stored as bytes. When I parsed PDFs, I used a library called pypdf.
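To make "stored as bytes" a bit more concrete: in the PDF format a page is a dictionary object, and what it draws lives in stream objects whose encoding is named by a `/Filter` entry (`/FlateDecode` is zlib compression, `/DCTDecode` is JPEG data). A page is therefore usually a list of drawing operators rather than an image, which is why scanning for JPEG markers can come back empty. Below is a rough sketch that lists the streams in a file and decompresses the zlib ones. The names `inspect_streams`, `OBJ_RE` and `FILTER_RE` are made up for this example, and the sample bytes are synthetic. It assumes a simple, unencrypted file with no compressed object streams. For real documents a proper parser such as pypdf is the safer choice.

```python
import re
import zlib

# Hypothetical helper patterns, written for this sketch only.
# An indirect object looks like "12 0 obj ... endobj".
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)
# /Filter is followed by a single name or by an array of names.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/\w+)")


def inspect_streams(pdf):
    """Return one dict per stream object found in the raw PDF bytes."""
    results = []
    for match in OBJ_RE.finditer(pdf):
        # Assumption: the word "stream" does not occur inside the dictionary.
        head, keyword, rest = match.group(1).partition(b"stream")
        end = rest.rfind(b"endstream")
        if not keyword or end < 0:
            continue  # a plain object, such as a page dictionary
        payload = rest[:end]
        # Drop the end-of-line that follows the "stream" keyword.
        if payload.startswith(b"\r\n"):
            payload = payload[2:]
        elif payload.startswith(b"\n"):
            payload = payload[1:]

        found = FILTER_RE.search(head)
        names = re.findall(rb"/(\w+)", found.group(1)) if found else []
        filters = [name.decode("ascii") for name in names]

        decoded = None
        if filters == ["FlateDecode"]:
            try:
                # decompressobj tolerates the trailing newline before "endstream"
                decoded = zlib.decompressobj().decompress(payload)
            except zlib.error:
                decoded = None  # not plain zlib data after all

        results.append({
            "filters": filters,
            "decoded": decoded,
            "is_jpeg": payload.startswith(b"\xff\xd8"),
        })
    return results


if __name__ == "__main__":
    # Synthetic fragment: a page dictionary, a compressed content stream
    # and a fake four-byte "JPEG" image stream.
    content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
    packed = zlib.compress(content)
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Page /Contents 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /Filter /FlateDecode >>\n"
        b"stream\n" + packed + b"\nendstream\nendobj\n"
        b"3 0 obj\n<< /Subtype /Image /Filter /DCTDecode >>\n"
        b"stream\n\xff\xd8\xff\xd9\nendstream\nendobj\n"
    )
    for entry in inspect_streams(sample):
        print(entry)
```

In the question's code you could call `inspect_streams(pdf)` on the bytes you already read. If no entry reports `DCTDecode`, the file simply contains no embedded JPEGs to extract.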
