Python: retrieving footnotes from a docx document or its HTML conversion (docx2python not working)

Posted 2024-06-01 00:30:06


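I need some help retrieving footnotes from a docx document in Python, because the docx file contains a large number of footnotes.

Below is the code I am currently having trouble with: docx2python cannot read Word documents beyond a certain number of pages.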

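from docx2python import docx2python


# filepath is the path to the .docx file
docx_temp = docx2python(filepath)
# footnotes is a nested list; take the paragraphs of the first cell
footnotes = docx_temp.footnotes[0][0][0]
# strip the tab characters from each footnote
footnotes = [i.replace("\t", "") for i in footnotes]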

So I tried the other approaches below, but because I am not familiar with XML I cannot tell whether the code works, and I am stuck:

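import re
import mammoth


with open(filepath, 'rb') as file:
    html = mammoth.convert_to_html(file).value

# html = re.sub(r'"(.+?)"', r'"<em>\1</em>"', html)
# "<number>" was a literal placeholder; \d+ matches the footnote number.
# Assumes mammoth emits each footnote as <li id="footnote-N">...</li>.
fnotes = re.findall(r'<li id="footnote-\d+">(.*?)</li>', html, flags=re.DOTALL)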
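A regex over HTML tends to be fragile, so here is a minimal sketch of the same idea using the standard library's HTML parser. It assumes mammoth renders each footnote as an `<li id="footnote-N">` element that ends with a back-link `<a href="#footnote-ref-N">`; that layout should be checked against the actual output. The names `FootnoteCollector` and `footnotes_from_html` are made up for this example.

```python
from html.parser import HTMLParser


class FootnoteCollector(HTMLParser):
    """Collect the text of every <li id="footnote-N"> element (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.footnotes = {}        # element id -> accumulated text
        self._current = None       # id of the footnote being read, if any
        self._in_backlink = False  # True while inside the back-link <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        element_id = attrs.get("id") or ""
        href = attrs.get("href") or ""
        if tag == "li" and element_id.startswith("footnote-"):
            self._current = element_id
            self.footnotes[element_id] = ""
        elif tag == "a" and self._current and href.startswith("#footnote-ref-"):
            # Skip the arrow that links back to the reference in the body
            self._in_backlink = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_backlink = False
        elif tag == "li":
            # Simplification: a list nested inside a footnote would end it early
            self._current = None

    def handle_data(self, data):
        if self._current and not self._in_backlink:
            self.footnotes[self._current] += data


def footnotes_from_html(html):
    """Return {element id: footnote text} for an HTML string."""
    parser = FootnoteCollector()
    parser.feed(html)
    parser.close()
    return {key: text.strip() for key, text in parser.footnotes.items()}
```

With the `html` string from the mammoth call above, `footnotes_from_html(html)` would return a dictionary keyed by element id, provided the assumed layout holds.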

and

import zipfile
import xml.etree.ElementTree as ET


# A .docx file is a zip archive; footnotes live in word/footnotes.xml
with zipfile.ZipFile(filepath) as docxfile:
    xml_bytes = docxfile.read('word/footnotes.xml')
# ET.parse() expects a filename or file object, so parse the bytes with fromstring()
root = ET.fromstring(xml_bytes)
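Once `word/footnotes.xml` is parsed, the footnote text can be collected by walking the `w:footnote` elements. The following is only a sketch under a few assumptions: the function name `extract_footnotes` is made up for this example, formatting is discarded, and entries whose `w:type` marks them as separators are skipped because they are not real footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, used by the w: prefix in footnotes.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
SKIPPED_TYPES = {"separator", "continuationSeparator", "continuationNotice"}


def extract_footnotes(docx_file):
    """Return {footnote id: plain text}; docx_file is a path or binary file object."""
    with zipfile.ZipFile(docx_file) as archive:
        # Raises KeyError if the document has no footnotes part
        root = ET.fromstring(archive.read("word/footnotes.xml"))

    notes = {}
    for note in root.findall("w:footnote", NS):
        # Separator entries carry a w:type attribute and hold no footnote text
        if note.get(f"{{{W_NS}}}type") in SKIPPED_TYPES:
            continue
        note_id = int(note.get(f"{{{W_NS}}}id"))
        paragraphs = []
        for para in note.findall("w:p", NS):
            # Text lives in w:t elements inside the runs of each paragraph
            paragraphs.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
        notes[note_id] = "\n".join(paragraphs).strip()
    return notes
```

Calling `extract_footnotes(filepath)` reads the archive directly, so it does not depend on docx2python or mammoth.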

Could you tell me how to correctly write code that extracts the footnotes from a docx/HTML file? Thanks for your help.

Tags: file, code, from, document, import, re, html, xml

0 answers

No answers yet
