Python: Beautiful Soup fails to parse an entire unordered list

Posted 2024-05-18 23:33:18


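I'm trying to scrape a website, and one section has me stumped. It contains an unordered list of the locations an organization serves, and I can't seem to parse the entire list.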

Here is a sample of what the HTML looks like:

<div id="current_tab">

                <p class="view_label_type_geoserved" id="view_label_field_geoserved">Geographies Served</p>
                <ul>
                    <li class="view_type_geoserved" id="view_field_geoserved">
                        <p style="font-weight: bold; border-bottom: 1px dotted #CCC; font-size: .9em;">North Carolina (NC)<span style="float: right; font-size: 0.8em;">North Carolina (NC)</span></p>
                        <p style="margin: 5px 0 3px 8px; border-bottom: 1px dotted #DDD; font-size:1em">Durham (serves entire county)<span style="float: right; font-size: 0.8em;">Durham</span></p>
                    </li>
                        <p style="margin: 5px 0 3px 8px; border-bottom: 1px dotted #DDD; font-size:1em">Franklin (serves entire county)<span style="float: right; font-size: 0.8em;">Franklin</span></p>
                    </li>
                        <p style="margin: 5px 0 3px 8px; border-bottom: 1px dotted #DDD; font-size:1em">Granville (serves entire county)<span style="float: right; font-size: 0.8em;">Granville</span>
                        </p>
                    </li>
                        <p style="margin: 5px 0 3px 8px; border-bottom: 1px dotted #DDD; font-size:1em">Orange (serves entire county)<span style="float: right; font-size: 0.8em;">Orange</span></p>
                    </li>
                        <p style="margin: 5px 0 3px 8px; border-bottom: 1px dotted #DDD; font-size:1em">Person (serves entire county)<span style="float: right; font-size: 0.8em;">Person</span></p>
                    </li>
                        <p style="margin: 5px 0 3px 8px; border-bottom: 1px dotted #DDD; font-size:1em">Vance (serves entire county)<span style="float: right; font-size: 0.8em;">Vance</span></p>
                    </li>
                        <p style="margin: 5px 0 3px 8px; border-bottom: 1px dotted #DDD; font-size:1em">Wake (serves entire county)<span style="float: right; font-size: 0.8em;">Wake</span></p>
                    </li>
                    <p style="margin: 5px 0 3px 8px; border-bottom: 1px dotted #DDD; font-size:1em">Warren (serves entire county)<span style="float: right; font-size: 0.8em;">Warren</span></p>
                    </li>
            </ul>            
</div>

Here is what I'm using to parse the elements:

[code not preserved in the source]

Here is the result I get; note that it contains only the beginning of the list:

<p class="view_label_type_geoserved" id="view_label_field_geoserved">Geographies Served</p>
<p style="font-weight: bold; border-bottom: 1px dotted #CCC; font-size: .9em;">North Carolina (NC)<span style="float: right; font-size: 0.8em;">North Carolina (NC)</span></p>
<p style="margin: 5px 0 3px 8px; border-bottom: 1px dotted #DDD; font-size:1em">Durham (serves entire county)<span style="float: right; font-size: 0.8em;">Durham</span></p>
<p style="margin: 5px 0 3px 8px; border-bottom: 1px dotted #DDD; font-size:1em">Franklin (serves entire county)<span style="float: right; font-size: 0.8em;">Franklin</span></p>

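Once I have the HTML, I have some functions that strip out the text with regex and then join the pieces into a single string, but suggestions on that part would be appreciated too.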


1 answer


Answer 1 · Posted 2024-05-18 23:33:18

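The problem is that the HTML you are dealing with is malformed (note the stray closing </li> tags), so it needs a lenient parser.

Use lxml or html5lib: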

from bs4 import BeautifulSoup

# data holds the HTML source of the page
soup = BeautifulSoup(data, 'html5lib')  # or BeautifulSoup(data, 'lxml')
for p in soup.select('div#current_tab p'):
    print(p.text)

This works for me and prints:

[output not preserved in the source]
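On the regex part of the question: the text can be collected without regex at all. The following is a minimal, standard-library-only sketch of that idea. It is not the approach from the answer above but a swapped-in alternative based on `html.parser.HTMLParser`, which reports tags as events and never builds a tree, so the stray </li> tags are simply ignored. The class name `ParagraphText`, the helper `paragraph_texts`, the shortened sample markup, and the choice to skip the right-floated <span> (which only repeats the name) are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of every <p> inside the <div> with a given id.

    Hypothetical helper: only <div>, <p> and <span> events are tracked,
    so unmatched </li> tags have no effect on the result.
    """

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.div_depth = 0    # > 0 while inside the target <div>
        self.in_p = False
        self.in_span = False
        self.current = []     # text fragments of the <p> being read
        self.paragraphs = []  # finished paragraph strings

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # Enter the target <div>, or track a <div> nested inside it.
            if self.div_depth or dict(attrs).get("id") == self.container_id:
                self.div_depth += 1
        elif tag == "p" and self.div_depth:
            self.in_p = True
            self.current = []
        elif tag == "span":
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "p" and self.in_p:
            self.paragraphs.append("".join(self.current).strip())
            self.in_p = False
        elif tag == "span":
            self.in_span = False

    def handle_data(self, data):
        # Assumption: the <span> duplicates the name, so its text is skipped.
        if self.in_p and not self.in_span:
            self.current.append(data)


def paragraph_texts(html, container_id="current_tab"):
    """Return the stripped text of each <p> inside the given container."""
    parser = ParagraphText(container_id)
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Shortened version of the markup from the question, stray </li> tags included.
sample = """
<div id="current_tab">
  <p id="view_label_field_geoserved">Geographies Served</p>
  <ul>
    <li>
      <p>North Carolina (NC)<span>North Carolina (NC)</span></p>
      <p>Durham (serves entire county)<span>Durham</span></p>
    </li>
      <p>Franklin (serves entire county)<span>Franklin</span></p>
    </li>
      <p>Granville (serves entire county)<span>Granville</span>
      </p>
    </li>
  </ul>
</div>
"""

# Join the collected paragraphs into a single string, as the question asks.
print(", ".join(paragraph_texts(sample)))
```

If you stay with Beautiful Soup and a lenient parser, `p.get_text()` (optionally with a separator and `strip=True`) covers the same need, so the regex step can likely be dropped either way.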
