How can I match two lists and change only the second element of each pair?

// on chapter.xhtml
Footnote 1 <a id="fn1" href="../Text/chapter.xhtml#rfn1">[1]</a>
Footnote 2 <a id="fn2" href="../Text/chapter.xhtml#rfn2">[2]</a>

1. <a id="rfn1" href="../Text/chapter.xhtml#fn1">1.</a> Footnote 1
2. <a id="rfn2" href="../Text/chapter.xhtml#fn2">2.</a> Footnote 2

Desired result (role="doc-backlink" added to the second link of each pair):
Footnote 1 <a id="fn1" href="../Text/chapter.xhtml#rfn1">[1]</a>
Footnote 2 <a id="fn2" href="../Text/chapter.xhtml#rfn2">[2]</a>

1. <a id="rfn1" href="../Text/chapter.xhtml#fn1" role="doc-backlink">1.</a> Footnote 1
2. <a id="rfn2" href="../Text/chapter.xhtml#fn2" role="doc-backlink">2.</a> Footnote 2

1 answer

Anonymous user

Answer #1 · Posted on 2024-06-16 18:13:02

Assumption: the footnote always comes second, i.e. after the link that references it.

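We iterate over all links on the page and check whether each link's href attribute contains a fragment identifier (the part after #). If it does, we use it to look up the matching link.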
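If you prefer not to split the href on '#' by hand, the standard library can extract the fragment for you. This is a minimal sketch using urllib.parse.urldefrag; the helper name fragment_of is hypothetical, and it returns an empty string when the href has no fragment:

```python
from urllib.parse import urldefrag


def fragment_of(href: str) -> str:
    """Return the part after '#', or '' when the href has no fragment."""
    # urldefrag splits a URL into (url_without_fragment, fragment)
    return urldefrag(href).fragment


print(fragment_of('../Text/chapter.xhtml#rfn1'))        # rfn1
print(repr(fragment_of('../Text/chapter.xhtml')))       # ''
```

An empty result can then be treated the same way as the "no '#' found" case: skip that link.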

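We use find_next rather than find. Called on the soup, find returns a matching tag from anywhere in the document (called on a tag, it searches only that tag's descendants), whereas find_next looks only at elements that come after the element it is called on. An example: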

some_link['href']
# ../Text/chapter.xhtml#rfn1

soup.find('a', {'id': 'rfn1'})
# <a id="rfn1" href="../Text/chapter.xhtml#fn1">1.</a>

With find, we can't tell whether the link we found appears before or after the original link. With find_next, however...

footnote_link = some_link.find_next('a', {'id': 'rfn1'})
footnote_link
# <a id="rfn1" href="../Text/chapter.xhtml#fn1">1.</a>

footnote_link.find_next('a', {'id': 'fn1'})
# None

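...we can be sure that this link comes second (and is therefore the footnote): find_next searches only from the position of the element it is called on onward, and returns None when it finds no match there.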
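To make the difference concrete without Beautiful Soup, here is a simplified, hypothetical model: the four links from chapter.xhtml as a list in document order, with plain-Python stand-ins named find_anywhere (mimicking soup.find) and find_after (mimicking tag.find_next). These are not the real bs4 methods, only an illustration of whole-document versus forward-only search:

```python
# Hypothetical model: the <a> tags from chapter.xhtml as (id, href) pairs,
# listed in the order they appear in the document.
LINKS = [
    ('fn1', '../Text/chapter.xhtml#rfn1'),
    ('fn2', '../Text/chapter.xhtml#rfn2'),
    ('rfn1', '../Text/chapter.xhtml#fn1'),
    ('rfn2', '../Text/chapter.xhtml#fn2'),
]


def find_anywhere(target_id):
    """Stand-in for soup.find: first match anywhere in the document."""
    for pos, (link_id, _href) in enumerate(LINKS):
        if link_id == target_id:
            return pos
    return None


def find_after(start, target_id):
    """Stand-in for tag.find_next: only searches positions after `start`."""
    for pos in range(start + 1, len(LINKS)):
        if LINKS[pos][0] == target_id:
            return pos
    return None


print(find_anywhere('fn1'))   # 0: found, but it sits *before* rfn1
print(find_after(2, 'fn1'))   # None: nothing after rfn1 has id="fn1"
print(find_after(0, 'rfn1'))  # 2: rfn1 follows fn1, so it is the footnote
```

The None from find_after(2, 'fn1') is exactly the signal the answer relies on: the back link points to something earlier, so it is the second member of its pair.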

Here is what the complete code could look like:

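from bs4 import BeautifulSoup

with open('chapter.xhtml', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'html.parser')

# href=True skips <a> tags that have no href attribute
for link in soup.find_all('a', href=True):
    try:
        fragment_id = link['href'].rsplit('#', maxsplit=1)[1]
    except IndexError:
        # the `rsplit` returned only one string, meaning '#' wasn't found in the string
        continue

    footnote = link.find_next('a', {'id': fragment_id})
    if footnote:
        # a matching footnote has been found: add the attribute to it
        footnote['role'] = 'doc-backlink'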
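If you would rather avoid the Beautiful Soup dependency, the same "search forward from this link" idea can be sketched with the standard library's xml.etree.ElementTree, since XHTML is XML. This is an alternative technique, not the answer's method; it assumes the file parses as well-formed XHTML in the usual XHTML namespace, and the <p> wrappers in the sample and the names local_name and mark_backlinks are hypothetical:

```python
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def mark_backlinks(root):
    """Add role="doc-backlink" to the second <a> of each matching pair."""
    # iter() walks the tree in document order; skip non-element nodes
    links = [el for el in root.iter()
             if isinstance(el.tag, str) and local_name(el.tag) == 'a']
    for i, link in enumerate(links):
        href = link.get('href', '')
        if '#' not in href:
            continue
        fragment_id = href.rsplit('#', 1)[1]
        # only look at links that come AFTER this one (the find_next idea)
        for later in links[i + 1:]:
            if later.get('id') == fragment_id:
                later.set('role', 'doc-backlink')
                break


# Hypothetical sample: the question's links wrapped in <p> elements.
SAMPLE = f"""<body xmlns="{XHTML_NS}">
<p>Footnote 1 <a id="fn1" href="../Text/chapter.xhtml#rfn1">[1]</a></p>
<p>Footnote 2 <a id="fn2" href="../Text/chapter.xhtml#rfn2">[2]</a></p>
<p>1. <a id="rfn1" href="../Text/chapter.xhtml#fn1">1.</a> Footnote 1</p>
<p>2. <a id="rfn2" href="../Text/chapter.xhtml#fn2">2.</a> Footnote 2</p>
</body>"""

root = ET.fromstring(SAMPLE)
mark_backlinks(root)
for a in root.iter(f'{{{XHTML_NS}}}a'):
    print(a.get('id'), a.get('role'))
```

This should print None for fn1 and fn2 and doc-backlink for rfn1 and rfn2. If you write the tree back out with ElementTree, call ET.register_namespace('', XHTML_NS) before serializing; otherwise the elements come out with an ns0: prefix.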
