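I want to get information about architects from this website:
https://www.sia.ch/en/membership/member-directory/m/207778/
In particular, I want to extract the name, address, telephone number and e-mail address.
I would like an output like the following:
person = ['Pierluigi A Marca', 'Sihlquai 244', '8005 Zürich', '+41 442734340', 'info@bamarch.ch']
This is what I tried, but I cannot extract that information: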
import requests
from bs4 import BeautifulSoup

URL = 'https://www.sia.ch/en/membership/member-directory/m/207778/'

# Fetch the member page and parse the returned HTML
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

# The member details sit inside the element with id="content"
results = soup.find(id='content')
print(results.prettify())

The script prints the following HTML:
<div class="pagewidth clearfix" id="content">
<div class="textheader">
</div>
<ul class="headlineicon clearfix">
<li class="print">
<a href="javascript:print();">
</a>
</li>
<li class="bookmark">
<a class="addthis_button_favorites" href="javascript:;">
<span>
</span>
</a>
</li>
<li class="share">
<li class="mail_widget">
<a class="addthis_button_email">
<img alt="" src="/fileadmin/templates/img/transp.gif"/>
</a>
</li>
<li class="googleplus">
<a class="addthis_button_google_plusone_share">
<img alt="" src="/fileadmin/templates/img/transp.gif"/>
</a>
</li>
<li class="twitter">
<a class="addthis_button_twitter">
<img alt="" src="/fileadmin/templates/img/transp.gif"/>
</a>
</li>
<li class="facebook">
<a class="addthis_button_facebook">
<img alt="" src="/fileadmin/templates/img/transp.gif"/>
</a>
</li>
<script type="text/javascript">
var addthis_config = { data_track_clickback: false }
</script>
</li>
</ul>
<div class="clearfix spec-height-theme">
<div class="narrowcolumnLeft">
<ul class="clearfix" id="subNavigation">
<li>
<a href="/en/membership/membership/" onfocus="blurLink(this);">
membership
</a>
<span>
</span>
</li>
<li class="active">
<a href="/en/membership/member-directory/" onfocus="blurLink(this);">
member directory
</a>
<span>
</span>
<ul>
<li>
<a href="/en/membership/member-directory/honorary-members/" onfocus="blurLink(this);">
honorary members
</a>
</li>
<li>
<a href="/en/membership/member-directory/individual-members/" onfocus="blurLink(this);">
individual members
</a>
</li>
<li>
<a href="/en/membership/member-directory/corporate-members/" onfocus="blurLink(this);">
corporate members
</a>
</li>
<li>
<a href="/en/membership/member-directory/student-members/" onfocus="blurLink(this);">
student members
</a>
</li>
<li>
<a href="/en/membership/member-directory/partner/" onfocus="blurLink(this);">
partner
</a>
</li>
</ul>
</li>
</ul>
</div>
<div class="widecolumn">
<!--TYPO3SEARCH_begin-->
<div class="csc-default" id="c303">
<div class="tx-updsiafeuseradmin-pi1">
<div class="tx-updsiafeuseradmin-pi1-singleView">
<div class="secr" data-secr="09d93fcfd5cf0f0b68e11bba96f6312c4023c72d">
</div>
<h1 class="mitgliederprofil">
Individual Member
</h1>
<table>
<tr>
<th colspan="2" valign="top">
Address
</th>
</tr>
<tr>
<td colspan="2" valign="top">
<!-- -->
<!--Dipl. Arch. ETH/SIA<br />-->
Mr
<br/>
Pierluigi A Marca
<br/>
Dipl. Arch. ETH/SIA
<br/>
Sihlquai 244
<br/>
8005 Zürich
<br/>
</td>
</tr>
<tr>
<th colspan="2" valign="top">
Contact
</th>
</tr>
<tr>
<td class="col1" valign="top">
Telephone number
<br/>
E-mail
<br/>
</td>
<td valign="top">
<div class="contact-data" data-contact="ggFeglggKF42DCpZz2iOI3EgcsZxN14vIYlhSGFLtORrpHZtgSiJ8tWDNuNxus03JD60nZu+g1FVPIdMiCp/bZMsSL45/+3xu9MMEZLnhH/Y67evbMdMICVsZaULHgIpA+S50ZdTg3glRtCa9CTX/zfXOfgyDaarW44HMYeW6pTMqImejlSubQXjCiPKzS0jgiZHBGspcnBZW/99X0ORYNaEUvOkjJDmozv9yld9A1x4jdyXAqHoDMMx0IICMsJiWcKADTFWKfI0OHHORhv7kvVW3KtbnX5PJjyilH0=">
needs javascript
</div>
</td>
</tr>
<tr>
<th colspan="2" valign="top">
Details
</th>
</tr>
<tr>
<td class="tx-updsiafeuseradmin-pi1-singleView-2cols" valign="top">
Profession
</td>
<td valign="top">
Diploma in Architecture
<br/>
</td>
</tr>
<tr>
<td class="tx-updsiafeuseradmin-pi1-singleView-2cols" valign="top">
Area of activity
<br/>
</td>
<td valign="top">
Architecture
<br/>
</td>
</tr>
<tr>
<td class="tx-updsiafeuseradmin-pi1-singleView-2cols" valign="top">
Professional group
</td>
<td valign="top">
Architecture
</td>
</tr>
<tr>
<td class="tx-updsiafeuseradmin-pi1-singleView-2cols" valign="top">
Section
</td>
<td valign="top">
Zurich
<br/>
</td>
</tr>
<tr>
<td colspan="2" valign="top">
</td>
</tr>
</table>
<!--<div class="tx-updsiafeuseradmin-pi1-singleView-footer lightbox-close-link"><a href="javascript:;">Close</a></div>-->
<div class="tx-updsiafeuseradmin-pi1-singleView-footer" style="display:none;">
<span>
</span>
<a href="javascript:history.back()">
back to results list
</a>
</div>
<script type="text/javascript">
jQuery(document).ready(function() {
if (document.referrer.split( "/" )[2] == "www.sia.ch") {
jQuery(".tx-updsiafeuseradmin-pi1-singleView-footer").show();
}
});
</script>
</div>
</div>
</div>
<!--TYPO3SEARCH_end-->
</div>
</div>
</div>
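You have to use Selenium so that the JavaScript can render some of the details, and then do some manipulation of the extracted text. Note that the result includes the person's title ('Mr.').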
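A minimal sketch of that approach, under stated assumptions: it assumes that once the page's JavaScript has run, the div with class contact-data shows the telephone number and e-mail as visible text in place of the "needs javascript" placeholder, and that Chrome is available to Selenium. The function names fetch_rendered_cells, to_lines and build_person are made up for this illustration, and the CSS selectors are taken from the HTML printed in the question.

```python
def fetch_rendered_cells(url, timeout=10):
    """Load the page in a real browser and return (address_text, contact_text)."""
    # Imported here so the text helpers below work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    view = "div.tx-updsiafeuseradmin-pi1-singleView"
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Assumption: the placeholder text disappears once the contact
        # details have been rendered by the page's JavaScript.
        WebDriverWait(driver, timeout).until(
            lambda d: "needs javascript"
            not in d.find_element(By.CSS_SELECTOR, "div.contact-data").text
        )
        # In the printed HTML, the first <td colspan="2"> is the address cell.
        address = driver.find_element(By.CSS_SELECTOR, view + " td[colspan='2']").text
        contact = driver.find_element(By.CSS_SELECTOR, "div.contact-data").text
        return address, contact
    finally:
        driver.quit()


def to_lines(text):
    """Split rendered text into stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_person(address_text, contact_text):
    """Combine the address lines and contact lines into one flat list."""
    return to_lines(address_text) + to_lines(contact_text)
```

Calling build_person(*fetch_rendered_cells(URL)) would then give one list per member. The title and the degree line are kept, so they have to be filtered out afterwards if only the name and address are wanted.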
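You can do it without Selenium. I won't provide the code for decoding (for legal reasons), but here are some notes on how it can be done:
The encoded contact data is here: //div[@class='contact-data']/@data-contact
The AES key is here: //div[@class='secr']/@data-secr
A new key is generated for each request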
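To illustrate only the lookup step from the notes above (not the decoding), here is a rough sketch that reads the two attributes using nothing but the standard library. The names EncryptedContactFinder and find_encrypted_contact are invented for this example, and the class matching is by token rather than the exact string comparison the XPath expressions perform.

```python
from html.parser import HTMLParser


class EncryptedContactFinder(HTMLParser):
    """Collects the data-contact and data-secr attribute values from a page."""

    def __init__(self):
        super().__init__()
        self.data_contact = None
        self.data_secr = None

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        # Roughly //div[@class='contact-data']/@data-contact
        if "contact-data" in classes and self.data_contact is None:
            self.data_contact = attributes.get("data-contact")
        # Roughly //div[@class='secr']/@data-secr
        elif "secr" in classes and self.data_secr is None:
            self.data_secr = attributes.get("data-secr")


def find_encrypted_contact(html_text):
    """Return (data_contact, data_secr); either is None if it was not found."""
    finder = EncryptedContactFinder()
    finder.feed(html_text)
    return finder.data_contact, finder.data_secr
```

Since the key changes with every request, both values should be taken from the same response, for example by passing page.text from a single requests.get call.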
Good luck