I have the following HTML data:
<!DOCTYPE html>
<html>
<head>
<script type="text/blzscript">
</script>
<title></title>
</head>
<body>
<p class="status-box">In some countries, this medicine may only be approved for veterinary use.</p>
<h3>Scheme</h3>
<p>Rec.INN</p>
<h3>CAS registry number (Chemical Abstracts Service)</h3>
<p>0000850-52-2</p>
<h3>Chemical Formula</h3>
<p>C21-H26-O2</p>
<h3>Molecular Weight</h3>
<p>310</p>
<h3>Therapeutic Category</h3>
<p>Progestin</p>
<h3>Chemical Names</h3>
<p>17α-Allyl-17-hydroxyesta-4,9,11-trien-3-one (WHO)</p>
<p>Estra-4,9,11-trien-3-one, 17β-hydroxy-17-(2-propenyl)- (USAN)</p>
<h3>Foreign Names</h3>
<ul>
<li>Altrenogestum (Latin)</li>
<li>Altrenogest (German)</li>
<li>
<a href="altr%C3%A9nogest.html">Altrénogest</a> (French)
</li>
<li>Altrenogest (Spanish)</li>
</ul>
<h3>Generic Names</h3>
<ul>
<li>Altrenogest (OS: BAN, USAN)</li>
<li>
<a href="altr%C3%A9nogest.html">Altrénogest</a> (OS: DCF)
</li>
<li>A 35957 (IS)</li>
<li>A 41300 (IS)</li>
<li>RH 2267 (IS)</li>
<li>RU 2267 (IS: RousselUclaf)</li>
</ul>
<h3>Brand Names</h3>
<div class='contentAdRight' id='third_ad_unit'>
<div class='adsense-ad adsense-ad-text-image-flash-html adsense-ad-300 adsense-ad-300x600 adsense-ad-international'>
<script type="text/blzscript">
google_ad_client="pub-3964816748264478";google_ad_channel="";google_ad_format="300x600_pas_abgc";google_ad_width="300";google_ad_height="600";google_ad_type="text,image,flash,html";google_color_border="FFFFFF";google_color_bg="FFFFFF";google_color_link="0000FF";google_color_text="000000";google_color_url="008000";google_analytics_domain_name="drugs.com";
</script>
<h1></h1>
</div>
</div>
</body>
</html>
I want to extract the Foreign Names, Generic Names and Brand Names. I tried:
test = soup.select('h1')[0].text.strip()
print(test)
but this does not give me what I want. I also tried extracting the script, but neither approach gave me the result I need.
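A minimal sketch of one way to do this with BeautifulSoup. It assumes the structure shown above holds on the real page: each section starts with an `<h3>` heading, and its names, if any, are the `<li>` items of a `<ul>` that is the very next sibling element. The helper name `names_under` and the trimmed `HTML` string are illustrative and not part of the original question. The `h1` attempt prints nothing because the only `<h1>` in this page is the empty one inside the ad block. In this HTML the Brand Names heading is followed only by that ad `<div>`, so the sketch returns an empty list for it. The brand names do not appear to be present in this markup at all.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (head, scripts and ad config omitted).
HTML = """
<h3>Foreign Names</h3>
<ul>
<li>Altrenogestum (Latin)</li>
<li>Altrenogest (German)</li>
<li>
<a href="altr%C3%A9nogest.html">Altrénogest</a> (French)
</li>
<li>Altrenogest (Spanish)</li>
</ul>
<h3>Generic Names</h3>
<ul>
<li>Altrenogest (OS: BAN, USAN)</li>
<li>
<a href="altr%C3%A9nogest.html">Altrénogest</a> (OS: DCF)
</li>
<li>A 35957 (IS)</li>
<li>A 41300 (IS)</li>
<li>RH 2267 (IS)</li>
<li>RU 2267 (IS: RousselUclaf)</li>
</ul>
<h3>Brand Names</h3>
<div class='contentAdRight' id='third_ad_unit'>
<h1></h1>
</div>
"""


def names_under(soup, heading):
    """Return the <li> texts of the <ul> directly after <h3>heading</h3>.

    Hypothetical helper. Returns an empty list if the heading is missing or
    if the next sibling element is not a <ul> (as with "Brand Names" here).
    """
    h3 = soup.find("h3", string=heading)
    if h3 is None:
        return []
    # find_next_sibling(True) skips whitespace text nodes and returns the next tag.
    nxt = h3.find_next_sibling(True)
    if nxt is None or nxt.name != "ul":
        return []
    # get_text(" ", strip=True) turns "<a>Altrénogest</a> (French)" into one clean string.
    return [li.get_text(" ", strip=True) for li in nxt.find_all("li")]


if __name__ == "__main__":
    soup = BeautifulSoup(HTML, "html.parser")
    for heading in ("Foreign Names", "Generic Names", "Brand Names"):
        print(heading, names_under(soup, heading))
```

For the real page, pass the complete HTML string to `BeautifulSoup` instead of the trimmed `HTML` constant. Stopping at the first sibling element, instead of searching for the next `<ul>` anywhere, keeps a section with no list (like Brand Names here) from picking up a list that belongs to a later section.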