Kernel crashes when importing unstructured.partition.pdf

Asked 2025-04-14 17:54

I tried the import below, but my kernel keeps crashing. How can I fix this?

from unstructured.partition.pdf import partition_pdf

path = 'data/llama.pdf'
raw_pdf_elements = partition_pdf(
    filename=path,
    extract_images_in_pdf=True,
    infer_table_structure=True,
    chunking_strategy="by_title",
    max_characters=4000,
    new_after_n_chars=3800,
    combine_text_under_n_chars=2000,
    image_output_dir_path='images/',  # directory where extracted images are written
)

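The import on the first line is where the problem is, but I need it in order to run the raw_pdf_elements call. I then ran into an error about the tesseract path, so I installed the following: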

pip install tesseract
pip install tesseract-ocr

After that, my kernel started crashing.
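Since the original error was about the tesseract path, one thing worth checking is whether a `tesseract` executable is actually visible on PATH from the same environment the kernel uses. The sketch below is only a diagnostic, assuming the OCR step looks for a binary named `tesseract`; the helper names `find_tesseract` and `tesseract_version` are made up for this example and are not part of unstructured.

```python
import shutil
import subprocess


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def tesseract_version(path):
    """Return the first line printed by `<path> --version`, or None if nothing is printed."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Depending on the build, the version banner may go to stdout or stderr.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    location = find_tesseract()
    if location is None:
        print("No 'tesseract' executable found on PATH")
    else:
        print("Found:", location)
        print("Version:", tesseract_version(location))
```

If this reports that nothing was found, the path error probably has not been resolved by the pip installs, because the Tesseract engine itself is a native program that is normally installed through the operating system's package manager or installer.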

> 00:01:01.922 [error] Disposing session as kernel process died
> ExitCode: undefined, Reason:  00:01:01.922 [info] Dispose Kernel
> process 35807. 00:01:01.945 [info] End cell 98 execution after
> -1709672459.206s, completed @ undefined, started @ 1709672459206
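The notebook log only says that the kernel process died, which hides the underlying error. A possible way to see it, sketched here under the assumption that the crash also happens outside the notebook, is to run the import in a separate Python process so the failure does not take the kernel down with it. `run_isolated` is a hypothetical helper written for this example.

```python
import subprocess
import sys


def run_isolated(statement, timeout=300):
    """Run one Python statement in a child interpreter.

    Returns (exit code, captured stderr) so the parent process survives
    even if the child crashes.
    """
    result = subprocess.run(
        [sys.executable, "-c", statement],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    code, err = run_isolated("from unstructured.partition.pdf import partition_pdf")
    print("exit code:", code)
    # Show only the tail of stderr, where a traceback usually ends.
    print(err[-2000:])
```

An exit code of 0 means the import worked in a clean process. A traceback in stderr points to an ordinary Python error such as a missing module. On Linux and macOS a negative exit code means the child was killed by a signal, which usually indicates a crash in native code rather than in Python.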
