Collecting values for a specific search using selectors

Posted 2024-04-26 22:18:38


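When I run the script I wrote in Python, I get the names perfectly. For the phone number and the email address, however, I get the labels "Ph." and "Email" instead of their values, as in the output below. How can I get the values that follow "Ph." and "Email" using selectors?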

The result I get:

arkLAB Architecture Ph. Email
Conrad Gargett Ph. Email
MONDO ARCHITECTS Ph. Email

The script I am using:

import requests 
from lxml import html

main_url = "http://www.findanarchitect.com.au/index.php"

def get_content(link):

    payload = {'action':'show_search_result','action_spam':'dDfgEr','txtSearchType':5,'txtPracName':'','optSstate':3,'optRegions':23,'txtPcode':'','txtShowBuildingType':0,'optBuildingType':1,'optHomeType':1,'optBudget':''}
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36'}
    tree = html.fromstring(requests.post(link, data = payload, headers = headers).text)

    for title in tree.cssselect("div#searchresultaplus"):
        names = title.cssselect("h2")[0].text
        phone = title.cssselect("div p > strong:contains('ph.')")[0].text
        email = title.cssselect("div p > strong:contains('Email')")[0].text
        print(names, phone, email)

get_content(main_url)

The element that contains the values:

<div id="searchresultsapluscont">    
        <h2>Hugh Gordon Architect P/L</h2>
            <div id="archdetails">
            <div style="float:left">
                <p>
                    Unit 5/6 Lonsdale Street <br>
                    BRADDON ACT 2612
                </p>
                <p>
                    <strong>Ph.</strong> 02 6253 4448<br>
                     <strong>Email</strong> info@hughgordon.com.au
                </p>
            </div>
            <div style="float:right" class="yogi_v"><div class="img_box">
    <img src="/img/aplusprofile.png" alt="aplus logo">
</div></div>    
            <div class="clearboth">
                        <div><img src="/img/fe_img/resultline.png"></div>
            <p><br>Our company has been designing homes, apartments &amp; townhouses for the past two decades in the A.C.T. and N.S.W. This experience has allowed us to become a leading architecture firm, with great focus on the Multi-Residential sector. Due to our diverse team of designers, town planners, lawyers and Architects we are able to provide sophisticated and complex design solutions for all sectors of the Built Environment. With our head office based in Canberra, A.C.T. we are centrally located and conveniently placed to service both the Sydney, South Coast and Victorian regions.</p></div>

        </div>
        <div style="float:right">
        <a href="javascript:void(0);" onclick="js_show_profile('3796')"><img src="/files/profile_img/3796/4342_4_preview.jpg" alt="Feature Image"></a>
        </div>  
        <div class="clearboth">
            <div style="float:left;"><input type="image" src="/img/fe_img/btn_profileaplus.png" value="View profile" onclick="return js_show_profile('3796')" class="nopad">&nbsp;&nbsp;&nbsp;</div>
            <div style="float:left;"><input type="image" src="/img/fe_img/btn_awardsaplus.png" value="Awards" onclick="return js_show_awards('3796')" class="nopad">&nbsp;&nbsp;&nbsp;</div>
            <div id="idFavBtn_3796" style="padding-top:1px;"><a href="javascript: void(0)" onclick="js_addto_fav('3796','Hugh Gordon Architect P/L','1')"><img src="/img/addtofavaplus.png"></a></div>
        </div>
    </div>

By the way, I don't want to use XPath here. Thanks in advance.


1 answer

Answer #1 · Posted 2024-04-26 22:18:38

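Use the tail attribute. The text attribute of the strong element holds only the label ("Ph." or "Email"), whereas tail holds the text that directly follows the element, up to the next element.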

names = title.cssselect("h2")[0].text
phone = title.cssselect("div p > strong:contains('ph.')")[0].tail.strip()
email = title.cssselect("div p > strong:contains('Email')")[0].tail.strip()
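
To see the difference between text and tail without hitting the live site, here is a minimal offline sketch. It makes several assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in (lxml follows the same text/tail model), the markup is a trimmed copy of the element from the question with <br> rewritten as <br/> so that the XML parser accepts it, and value_after_label is a hypothetical helper name, not part of lxml or cssselect.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup; <br> is written as <br/> so the
# standard-library XML parser accepts it (an assumption for this sketch).
SNIPPET = """
<div id="searchresultsapluscont">
  <h2>Hugh Gordon Architect P/L</h2>
  <p>
    <strong>Ph.</strong> 02 6253 4448<br/>
    <strong>Email</strong> info@hughgordon.com.au
  </p>
</div>
"""


def value_after_label(root, label):
    """Return the text following the <strong> whose text contains label.

    Hypothetical helper: returns None when no matching label exists.
    """
    for strong in root.iter("strong"):
        if label.lower() in (strong.text or "").lower():
            # .text is the label itself; .tail is the text after </strong>,
            # up to the next element (here <br/> or the end of <p>).
            return (strong.tail or "").strip()
    return None


root = ET.fromstring(SNIPPET)
name = root.find("h2").text
phone = value_after_label(root, "ph.")
email = value_after_label(root, "email")
print(name, phone, email)
```

Guarding with `or ""` matters because tail is None when nothing follows the element, in which case calling .strip() directly on it would raise an AttributeError.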
