Creating a PDF from Python

Posted 2024-04-28 12:11:58


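I want to generate PDFs from a Python application. They start off relatively simple, but some may become more complex (essentially letter-like documents, but they will include watermarks later, for example).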

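I've worked in raw PostScript before, and provided I can generate the correct headers etc., I'd like to avoid complex libs that may not do entirely what I need. Some seem to have suffered bitrot and are no longer supported (pypdf and pypdf2). That is especially frustrating when I know PDF/PostScript can do exactly what I need. PDF content really isn't that complex.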

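I can generate EPS (Encapsulated PostScript) fine just by writing out the appropriate text headers and my PostScript code. But inspecting PDFs, there is a lil binary header that I'm not sure how to generate.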

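I could generate an EPS and convert it. I'm not overly happy with that, because the production environment is Windows 2008 Server (dev is Ubuntu 12.04), and making something only to convert it seems very silly.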

Has anyone done this before? Am I being pedantic by not wanting to use a library?


Tags: document, text, application, content, pdf, header, lib, letter
2 Answers

borrowed from ask.yahoo

A PDF file starts with "%PDF-1.1" if it is a version 1.1 type of PDF file. You can read PDF files ok when they don't have binary data objects stored in them, and you could even make one using Notepad if you didn't need to store a binary object like a Paint bitmap in it.

But after seeing the "%PDF-1.1" you ignore what's after that (Adobe Reader does, too) and go straight to the end of the file, where there is a line that says "%%EOF". That's always the last thing in the file, and if that's there you know that just a few characters before it there's the word "startxref" followed by a number. This number is the byte offset of the cross-reference table ("xref"), which in turn lists the byte offset of every object in the file. The objects themselves can be page objects, dictionary objects, or stream objects (like the binary data of a bitmap), and each one has "obj" and "endobj" marking out where its description starts and ends.
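To make the reading order concrete, here is a rough Python 3 sketch of what a reader does first: check the header, look for "%%EOF" near the end, and pull out the number after "startxref". The function name `find_startxref` and the 1024-byte tail window are my own choices for illustration, not anything from a library, and the sample bytes are a toy file, not a displayable PDF.

```python
import re

# A toy file with one empty object, just enough structure to exercise the lookup.
SAMPLE = (
    b"%PDF-1.1\n"
    b"1 0 obj\n<< >>\nendobj\n"
    b"xref\n0 1\n0000000000 65535 f \n"
    b"trailer\n<< /Size 1 >>\n"
    b"startxref\n30\n%%EOF\n"
)


def find_startxref(data, tail_size=1024):
    """Return the byte offset written after the last 'startxref' keyword."""
    if not data.startswith(b"%PDF-"):
        raise ValueError("missing %PDF- header")
    # Only the end of the file matters here, so inspect a small tail window.
    tail = data[-tail_size:]
    if b"%%EOF" not in tail:
        raise ValueError("missing %%EOF marker")
    matches = re.findall(br"startxref\s+(\d+)", tail)
    if not matches:
        raise ValueError("missing startxref keyword")
    # An incrementally updated file can contain several; the last one wins.
    return int(matches[-1])
```

For a real file you would read it in binary mode (`open(path, "rb").read()`) and pass the bytes in; the returned offset should point at the "xref" keyword.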

For fairly simple PDF files, you might be able to type the text in just like you did with Notepad to make a working PDF file that Adobe Reader and other PDF viewer programs could read and display correctly.

Doing something like this is a challenge, even for a simple file, and you'd really have to know what you're doing to get any binary data into the file where it's supposed to go; but for character data, you'd just be able to type it in. And all of the commands used in the PDF are in the form of strings that you could type in. The hardest part is calculating those numbers that give the file offsets for items in the file (such as the number following "startxref").
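Since the offsets are the fiddly part, it is easier to let a script count them than to do it by hand. Below is a minimal Python 3 sketch that assembles a one-page "hello world" PDF and records each object's byte offset as it goes. The function name `build_minimal_pdf`, the US Letter page size, and the choice of Helvetica at 24 pt are all assumptions for the example; it handles only Latin-1 text and does no compression.

```python
def build_minimal_pdf(text="Hello, world"):
    """Return the bytes of a one-page PDF that shows `text` in Helvetica."""
    # Backslash and parentheses are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = ("BT /F1 24 Tf 72 720 Td (%s) Tj ET" % escaped).encode("latin-1")

    # Object bodies, numbered 1..5 in this order.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        ("<< /Length %d >>\nstream\n" % len(content)).encode("ascii")
        + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    # Comment line with bytes >= 128: the "binary header" that tells tools
    # to treat the file as binary. It carries no other meaning.
    out += b"%\xe2\xe3\xcf\xd3\n"

    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += ("%d 0 obj\n" % number).encode("ascii") + body + b"\nendobj\n"

    xref_offset = len(out)
    out += ("xref\n0 %d\n" % (len(objects) + 1)).encode("ascii")
    # Every xref entry is exactly 20 bytes, including the end-of-line.
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += ("%010d 00000 n \n" % offset).encode("ascii")
    out += ("trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_offset)).encode("ascii")
    return bytes(out)
```

Writing the result with `open("hello.pdf", "wb")` should give a file that ordinary viewers open. Note the file must be written in binary mode, otherwise newline translation on Windows would shift every offset.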

If the way the file format is laid out intrigues you, go ahead and read the PDF manual, which tells the whole story. http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf

But really, you should use a library.

Thanks to @LukasGraf for this link, http://www.gnupdf.org/Introduction_to_PDF, which shows how to create a simple hello-world PDF from scratch.

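As long as you're on Python 2.7, Reportlab seems to be the best solution at the moment. It's quite full-featured, and it can be a bit complex depending on what you're doing with it, but since you seem familiar with PDF internals, hopefully the learning curve won't be too steep.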
