Getting TypeError: ord() expected string of length 1, but int found

Posted 2024-05-23 21:21:29


The code is:

from PyPDF2 import PdfFileReader
with open('HTTP_Book.pdf','rb') as file:
    pdf=PdfFileReader(file)
    pagedd=pdf.getPage(0)
    print(pagedd.extractText())

This code raises the following error:

TypeError: ord() expected string of length 1, but int found

I searched online and found Troubleshooting "TypeError: ord() expected string of length 1, but int found", but it did not help much. I understand the background of this error, but I am not sure how it applies here.

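I tried a different PDF file and it worked fine. So where does the problem lie: in the PDF file itself, or in PyPDF2 being unable to handle it? I know that, according to the documentation, this method is not very reliable: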

This works well for some PDF files, but poorly for others, depending on the generator used

How should I handle this?

Traceback:

Traceback (most recent call last):
  File "pdf_reader.py", line 71, in <module>
    print(pagedd.extractText())
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\pdf.py", line 2595, in extractText
    content = ContentStream(content, self.pdf)
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\pdf.py", line 2673, in __init__
    stream = BytesIO(b_(stream.getData()))
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\generic.py", line 841, in getData
    decoded._data = filters.decodeStreamData(self)
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 350, in decodeStreamData
    data = LZWDecode.decode(data, stream.get("/DecodeParms"))
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 255, in decode
    return LZWDecode.decoder(data).decode()
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 228, in decode
    cW = self.nextCode();
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 205, in nextCode
    nextbits=ord(self.data[self.bytepos])
TypeError: ord() expected string of length 1, but int found


1 Answer

User

#1 · Posted 2024-05-23 21:21:29

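I figured it out. This is simply a limitation of PyPDF2. I used tika and BeautifulSoup to parse the file and extract the text instead, and that works well, although it takes a little more work.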

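from tika import parser
from bs4 import BeautifulSoup

# xmlContent=True asks Tika for XHTML output instead of plain text
raw = parser.from_file('HTTP_Book.pdf', xmlContent=True)['content']
data = BeautifulSoup(raw, 'lxml')
message = data.find(class_='page')  # first element with class "page", i.e. the first page
print(message.text)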
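
As for why the error appears at all: the last frame of the traceback calls ord(self.data[self.bytepos]). Here is a minimal sketch of what seems to go wrong, assuming self.data is a bytes object under Python 3 (the Python37 path in the traceback suggests Python 3, but I have not checked the PyPDF2 internals beyond that). The sample bytes below are made up for illustration and do not come from the PDF.

```python
# Assumption: the data reaching the LZW decoder is a bytes object.
data = b"\x80\x0b"  # made-up bytes, purely illustrative

# In Python 3, indexing a bytes object yields an int, not a 1-character string.
first = data[0]
print(type(first).__name__)

# Calling ord() on that int is what the failing line effectively does.
message = None
try:
    ord(first)
except TypeError as exc:
    message = str(exc)
    print(message)

# A one-byte slice is still a bytes object, so ord() accepts it
# and returns the same integer that indexing produced.
value = ord(data[0:1])
print(value == first)
```

So the int is already the value that ord() was meant to produce, which fits this being a Python 3 compatibility gap in that decoder rather than something wrong with the script in the question.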
