How to get text from HTML using Beautiful Soup

Published 2024-04-29 10:23:08


I would like to know how to get the text A1 Pawn from the following HTML:

<tr id="overview-summary-current">
<th scope="row">
    <span class="edit-tools">
        <a href="#background-experience" class="edit-section" id="control_gen_4">Edit experience</a>
        <script id="controlinit-dust-server-65573249-4" type="text/javascript+initialized" class="li-control">LI.Controls.addControl("control-dust-server-65573249-4","IntraScroller",{tracking:'top-card-edit-experience',paddingTop:-20})</script>
        <script type="text/javascript">if(dust&&dust.jsControl){if(!dust.jsControl.flushControlIds){dust.jsControl.flushControlIds="";}else{dust.jsControl.flushControlIds+=",";}dust.jsControl.flushControlIds+="control-dust-server-65573249-4";}</script>
    </span>
    <a href="#background-experience" data-trk="prof-0-ovw-curr_pos">Current</a>
</th>
<td>
    <ol>
        <li>
            <span data-tracking="mcp_profile_sum" class="new-miniprofile-container /biz/miniprofile/8241336?pathWildcard=8241336" data-li-url="/biz/miniprofile/8241336?pathWildcard=8241336" data-li-getjs="https://static.licdn.com/scds/concat/common/js?h=40vfeoewuurexnhvi1o2qiknu&amp;fc=2" data-li-miniprofile-id="LI-2326069">
                <strong>
                    <a href="/company/8241336?trk=prof-0-ovw-curr_pos" dir="auto">A1 Pawn</a>
                </strong>
            </span>
        </li>
    </ol>
</td>

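I have tried both a CSS selector and an XPath expression to get the text.

The CSS selector attempt, which called .find_element_by_css_selector, does not work (that snippet is missing here).

The XPath attempt does not work either: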

str(profilePageSource.find_element_by_xpath("//*[@id=\"overview-summary-current\"]/td/ol/li/span/strong/a").get_text().encode("utf-8"))[2:-1]

3 Answers

You can also get the result this way:

soup.find('a', {'dir': "auto"}).text
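
As a minimal sketch of this answer, assuming the page source has already been loaded as a string and parsed with BeautifulSoup (the trimmed `html` sample and the names `link` and `text` are illustrative, not from the question):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (illustrative sample).
html = """
<tr id="overview-summary-current">
<th scope="row"><a href="#background-experience">Current</a></th>
<td><ol><li><span><strong>
<a href="/company/8241336?trk=prof-0-ovw-curr_pos" dir="auto">A1 Pawn</a>
</strong></span></li></ol></td>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first <a> whose dir attribute equals "auto", or None.
link = soup.find("a", {"dir": "auto"})
text = link.text if link is not None else None
print(text)
```

Matching on dir="auto" skips the "Current" link in the header cell, but it relies on no earlier link on the full page carrying the same attribute.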

soup.find(id='overview-summary-current').td.a.text should give the result.
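
A small sketch of how this chained lookup behaves, assuming the same markup parsed with the built-in html.parser (the `html` sample and the names `row`, `cell`, `link`, and `company` are illustrative). Each attribute step returns the first matching descendant, or None if there is none, so the guarded form below avoids an AttributeError on pages where the row is missing:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (illustrative sample).
html = """
<tr id="overview-summary-current">
<th scope="row"><a href="#background-experience">Current</a></th>
<td><ol><li><span><strong>
<a href="/company/8241336?trk=prof-0-ovw-curr_pos" dir="auto">A1 Pawn</a>
</strong></span></li></ol></td>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

# Step through the lookup one level at a time, checking for None.
row = soup.find(id="overview-summary-current")
cell = row.td if row is not None else None    # first <td> inside the row
link = cell.a if cell is not None else None   # first <a> inside that <td>
company = link.get_text(strip=True) if link is not None else None
print(company)
```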

For a CSS selector, use the .select() method rather than .find_element_by_css_selector. For example:

elems = profilePageSource.select("#overview-summary-current > td > ol > li > span > strong > a")
if elems:
    print(str(elems[0].get_text().encode("utf-8"))[2:-1])
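
A self-contained sketch of the same selector, assuming the markup from the question is available as a string (the `html` sample and the names `elem` and `name` are illustrative). It uses select_one(), which returns the first match or None, and prints the text directly; on Python 3 get_text() already returns a str, so the encode-and-slice step may not be needed:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (illustrative sample).
html = """
<tr id="overview-summary-current">
<th scope="row"><a href="#background-experience">Current</a></th>
<td><ol><li><span><strong>
<a href="/company/8241336?trk=prof-0-ovw-curr_pos" dir="auto">A1 Pawn</a>
</strong></span></li></ol></td>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinators (>) mirror the nesting tr > td > ol > li > span > strong > a.
selector = "#overview-summary-current > td > ol > li > span > strong > a"
elem = soup.select_one(selector)
name = elem.get_text(strip=True) if elem is not None else None
print(name)
```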

