How do I get bullet numbers from HTML using Python/BeautifulSoup?

Published 2024-04-25 08:37:10


I am trying to extract the nested bullet numbers (1.1, 1.2, a, b, and so on) from a web page that renders like this:

1 Definition
1.1 Abcdef.
1.2 Ghijkl:
a. abcd
b. efgh
c. etc.

The HTML looks like this:

<ol>
    <li>
        <h3 style="padding-left: 43pt;text-indent: -36pt;text-align: justify;">Definition</h3>
        <p style="text-indent: 0pt;text-align: left;"><br /></p>
        <ol id="l3">
            <li>
                <p style="padding-left: 43pt;text-indent: -36pt;text-align: justify;">In this Notice:</p>
                <p style="text-indent: 0pt;text-align: left;"><br /></p>
                <ol id="l4">
                    <li>
                        <p style="padding-left: 7pt;text-indent: 0pt;text-align: justify;">Code of conduct.</p>
                        <p style="text-indent: 0pt;text-align: left;"><br /></p>
                    </li>
                    <li>
                        <p style="padding-left: 7pt;text-indent: 0pt;text-align: justify;">The following:</p>
                        <p style="text-indent: 0pt;text-align: left;"><br /></p>
                        <ol id="l5">
                            <li>
                                <p style="padding-left: 79pt;text-indent: -36pt;line-height: 16pt;text-align: left;">trains</p>
                            </li>
                            <li>
                                <p style="padding-left: 79pt;text-indent: -36pt;text-align: left;">Buses</p>
                            </li>
                            <li>
                                <p style="padding-left: 79pt;text-indent: -36pt;line-height: 16pt;text-align: left;">PLanes</p>

I have searched everywhere but could not find an elegant solution. Thanks.
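One possible approach, sketched under assumptions: the numbers are not stored in the markup, but each item's number is its 1-based position among the `<li>` children of its `<ol>`, so the labels can be recomputed by walking the nested lists. The sample markup below is a shortened, well-formed stand-in for the page, and `first_text`, `walk`, and `label` are hypothetical helper names. The label style per depth (dotted decimals for the first two levels, lowercase letters below) is an assumption taken from the desired output above, not something read from the page.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed stand-in for the page in the question.
HTML = """
<ol>
  <li><h3>Definition</h3>
    <ol>
      <li><p>Abcdef.</p></li>
      <li><p>Ghijkl:</p><p><br /></p>
        <ol>
          <li><p>abcd</p></li>
          <li><p>efgh</p></li>
        </ol>
      </li>
    </ol>
  </li>
</ol>
"""


def first_text(li):
    """Text of the first non-empty child element of <li> that is not a nested list."""
    for child in li.find_all(True, recursive=False):
        if child.name != "ol":
            text = child.get_text(strip=True)
            if text:
                return text
    return ""


def walk(ol, path=()):
    """Yield (path, text) per <li>; path holds the 1-based position at each level."""
    for index, li in enumerate(ol.find_all("li", recursive=False), start=1):
        here = path + (index,)
        yield here, first_text(li)
        for child_ol in li.find_all("ol", recursive=False):
            yield from walk(child_ol, here)


def label(path):
    """Assumed styling: '1', '1.2' for depth 1-2, 'a.', 'b.' for deeper levels."""
    if len(path) <= 2:
        return ".".join(str(n) for n in path)
    return chr(ord("a") + path[-1] - 1) + "."  # only valid for up to 26 items


soup = BeautifulSoup(HTML, "html.parser")
items = [(label(path), text) for path, text in walk(soup.find("ol"))]
for number, text in items:
    print(number, text)
```

Using `recursive=False` keeps each level's count limited to direct children, so items of a nested list are never counted as siblings of their parent.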

--Update--

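Adding the CSS section, which uses counter functions to increment the bullet numbers. I have a hunch that a possible solution should make use of these counters. Thanks.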

<style type="text/css">
#l1 {
            padding-left: 0pt;
            counter-reset: c1 1;
        }

        #l1>li>*:first-child:before {
            counter-increment: c1;
            content: counter(c1, decimal)" ";
            color: black;
            font-family: "Times New Roman", serif;
            font-style: normal;
            font-weight: bold;
            text-decoration: none;
            font-size: 14pt;
        }

        #l1>li:first-child>*:first-child:before {
            counter-increment: c1 0;
        }

        #l2 {
            padding-left: 0pt;
            counter-reset: c2 1;
        }

        #l2>li>*:first-child:before {
            counter-increment: c2;
            content: counter(c1, decimal)"."counter(c2, decimal)" ";
            color: black;
            font-family: "Times New Roman", serif;
            font-style: normal;
            font-weight: normal;
            text-decoration: none;
            font-size: 14pt;
        }

        #l2>li:first-child>*:first-child:before {
            counter-increment: c2 0;
        }
</style>
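If the labels should follow the stylesheet rather than a hard-coded style, a rough sketch is to read each `content:` declaration and use it as a template. Everything here is an assumption for illustration: the sample markup with ids `l1` and `l2` is hypothetical, the regular expressions only cover rules shaped like the ones above, and `parse_rules`, `render`, and `label_items` are made-up helper names. It also relies on the observation that `counter-reset: cN 1` combined with a zero increment on the first child makes the counter equal the item's 1-based position, instead of simulating CSS counter scoping in general.

```python
import re
from bs4 import BeautifulSoup

CSS = """
#l1 { counter-reset: c1 1; }
#l1>li>*:first-child:before {
    counter-increment: c1;
    content: counter(c1, decimal)" ";
}
#l2 { counter-reset: c2 1; }
#l2>li>*:first-child:before {
    counter-increment: c2;
    content: counter(c1, decimal)"."counter(c2, decimal)" ";
}
"""

# Hypothetical markup whose ids match the rules above.
HTML = """
<ol id="l1">
  <li><h3>Definition</h3>
    <ol id="l2">
      <li><p>Abcdef.</p></li>
      <li><p>Ghijkl:</p></li>
    </ol>
  </li>
</ol>
"""

RULE = re.compile(r"#(\w+)>li>\*:first-child:before\s*\{([^}]*)\}")
INCREMENT = re.compile(r"counter-increment:\s*([\w-]+)\s*;")
CONTENT = re.compile(r"content:\s*([^;]*);")
TOKEN = re.compile(r'counter\(\s*([\w-]+)\s*(?:,\s*([\w-]+)\s*)?\)|"([^"]*)"')


def parse_rules(css):
    """Map each list id to (its own counter name, tokens of its content template)."""
    rules = {}
    for list_id, body in RULE.findall(css):
        inc, content = INCREMENT.search(body), CONTENT.search(body)
        if not inc or not content:
            continue
        tokens = []
        for name, style, text in TOKEN.findall(content.group(1)):
            if name:
                tokens.append(("counter", name, style or "decimal"))
            else:
                tokens.append(("text", text))
        rules[list_id] = (inc.group(1), tokens)
    return rules


def render(value, style):
    """Format one counter value; unrecognised styles fall back to decimal."""
    if style in ("lower-alpha", "lower-latin"):
        return chr(ord("a") + value - 1)
    return str(value)


def label_items(ol, rules, counters=None):
    """Yield (label, text) for each <li> of lists that have a matching rule."""
    counters = dict(counters or {})
    rule = rules.get(ol.get("id"))
    for index, li in enumerate(ol.find_all("li", recursive=False), start=1):
        if rule:
            name, tokens = rule
            counters[name] = index  # position stands in for the CSS counter
            parts = [
                render(counters.get(tok[1], 0), tok[2]) if tok[0] == "counter" else tok[1]
                for tok in tokens
            ]
            first = li.find(True, recursive=False)
            text = first.get_text(strip=True) if first else ""
            yield "".join(parts).strip(), text
        for child in li.find_all("ol", recursive=False):
            yield from label_items(child, rules, counters)


soup = BeautifulSoup(HTML, "html.parser")
labelled = list(label_items(soup.find("ol"), parse_rules(CSS)))
for number, text in labelled:
    print(number, text)
```

A dedicated CSS parser would be more robust than regular expressions here; the regex version is only meant to show how the `content:` templates and the list ids fit together.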
