I want to retrieve all of the row data from this site: https://www.dibbs.bsm.dla.mil/Awards/AwdRecs.aspx?Category=awddt&TypeSrch=cq&Value=02-06-2018 Here is sample HTML for the rows:
<tr class="BgWhite" style="border-color:Gray;border-width:1px;border-style:Solid;">
<td align="left" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl43_lblAwardBasicNumber" style="display:inline-block;width:150px;"><a href="https://dibbs2.bsm.dla.mil/Downloads/Awards/03JAN17/SP450017D0005.PDF" title="Link To Award/Basic Document" target="DIBBSDocuments"><img src="https://www.dibbs.bsm.dla.mil/app_themes/images/icons/IconPdf.gif" alt="PDF Document" width="16" height="16" hspace="2" border="0"></a><a href="https://dibbs2.bsm.dla.mil/Downloads/Awards/03JAN17/SP450017D0005.PDF" title="Link To Award/Basic Document" target="DIBBSDocuments">SP450017D0005</a></span>
</td>
<td align="center" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl43_lblCage"><a href="javascript:void(0);" onclick="return openNewWindow("https://www.dibbs.bsm.dla.mil/Refs/cage.aspx?Cage=0ZE15", "CAGE", 475, 300)" title="Click to perform a CAGE Search">0ZE15</a></span>
</td>
<td align="right" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl43_lblTotalContactPrice"> $2,341.94</span>
</td>
</tr>
<tr class="BgSilver" style="border-color:Gray;border-width:1px;border-style:Solid;">
<td align="left" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl44_lblDeliveryOrder" style="display:inline-block;width:175px;"><a href="https://dibbs2.bsm.dla.mil/Downloads/Awards/06FEB18/SP450017D0005SP450018F2293.PDF" title="Link To Delivery Order Document" target="DIBBSDocuments"><img src="https://www.dibbs.bsm.dla.mil/app_themes/images/icons/IconPdf.gif" alt="PDF Document" width="16" height="16" hspace="2" border="0"></a><a href="https://dibbs2.bsm.dla.mil/Downloads/Awards/06FEB18/SP450017D0005SP450018F2293.PDF" title="Link To Delivery Order Document" target="DIBBSDocuments">SP450018F2293</a> <br><img src="https://www.dibbs.bsm.dla.mil/app_themes/images/common/space.gif" width="16" height="16" hspace="1" border="0" alt="-spacer-"><span style="font-size: 9px;">» <a href="https://www.dibbs.bsm.dla.mil/Awards/AwdRec.aspx?contract=SP450017D0005&dlv=SP450018F2293&cnt=108" title="Delivery Order Package View" target="DIBBS">Delivery Order Package View</a></span></span>
</td>
<td align="right" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl44_lblDeliveryOrderCounter" style="display:inline-block;width:50px;">108</span>
</td>
<td align="right" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl44_lblTotalContactPrice"> $2,341.94</span>
</td>
I want to extract the award ids SP450017D0005 and SP450018F2293 from the HTML, so I tried this:
from bs4 import BeautifulSoup

# main_page and row come from earlier in the script (not shown here)
dibbssoup = BeautifulSoup(main_page.content, 'html5lib')
containers1 = dibbssoup.find_all("tr", {"class": "BgWhite"})
containers2 = dibbssoup.find_all("tr", {"class": "BgSilver"})
containers = containers1 + containers2
for container1 in containers:
    for page in range(row)[3:]:
        containerid = "ctl00_cph1_grdAwardSearch_ctl" + str(page) + "_lblAwardBasicNumber"
        awardid = container1.find("td", {"align": "left"}).find("span", {"id": containerid})
        print(page)
        print(containerid)
        print(awardid)
        print(" ")
The page increment works and containerid is correct, but awardid prints as None. What am I doing wrong, and how can I fix it?
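I don't see any major flaw in your code at the moment. When working with nested HTML tags like these, it is often useful to split the find calls apart and print the result of each one. While debugging, you can then see clearly which find call fails. Once the problem is solved, you can still recombine them and clean up the code.
To get rid of the page and containerid variables, you can pass a function as the argument to find. You can find more information here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#a-function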
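A function passed to find might look like the sketch below. It is only an illustration under a few assumptions: SAMPLE_HTML is a trimmed-down copy of the two sample rows from the question, the helper name is_award_span is made up for this example, and html.parser is used instead of html5lib purely to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Assumption: a trimmed-down copy of the two sample rows from the question,
# keeping only the spans that matter for the lookup.
SAMPLE_HTML = """
<table>
<tr class="BgWhite">
  <td align="left" valign="top">
    <span id="ctl00_cph1_grdAwardSearch_ctl43_lblAwardBasicNumber"><a href="#">SP450017D0005</a></span>
  </td>
</tr>
<tr class="BgSilver">
  <td align="left" valign="top">
    <span id="ctl00_cph1_grdAwardSearch_ctl44_lblDeliveryOrder"><a href="#">SP450018F2293</a></span>
  </td>
</tr>
</table>
"""


def is_award_span(tag):
    """Hypothetical helper: True for <span> tags whose id ends in _lblAwardBasicNumber."""
    return tag.name == "span" and tag.get("id", "").endswith("_lblAwardBasicNumber")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# A list of class names matches rows carrying either class, in document order.
rows = soup.find_all("tr", class_=["BgWhite", "BgSilver"])

award_ids = []
for row in rows:
    # Keep the find calls separate so a failing step is easy to spot.
    cell = row.find("td", {"align": "left"})
    span = cell.find(is_award_span) if cell is not None else None
    award_ids.append(span.get_text(strip=True) if span is not None else None)

print(award_ids)
```

Because the function only checks how the id ends, no counter is needed to rebuild the ctlNN part of the id.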
When I run this code against the sample HTML you provided, the second awardid is None, because the second <tr> does not contain a <span> with an id like ctl00_cph1_grdAwardSearch_ctl43_lblAwardBasicNumber.
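Since the second row carries its id in a span ending in _lblDeliveryOrder, one way to collect both identifiers from the question is to match either suffix with a regular expression. This is a sketch rather than a tested scraper for the live site: SAMPLE_HTML is a simplified copy of the sample rows, and ID_SUFFIX and first_link_text are names introduced here for illustration.

```python
import re

from bs4 import BeautifulSoup

# Assumption: simplified copy of the sample rows, keeping the icon link,
# the text link, and the nested "Package View" link.
SAMPLE_HTML = """
<table>
<tr class="BgWhite">
  <td align="left" valign="top">
    <span id="ctl00_cph1_grdAwardSearch_ctl43_lblAwardBasicNumber"><a href="#"><img alt="PDF Document"></a><a href="#">SP450017D0005</a></span>
  </td>
</tr>
<tr class="BgSilver">
  <td align="left" valign="top">
    <span id="ctl00_cph1_grdAwardSearch_ctl44_lblDeliveryOrder"><a href="#"><img alt="PDF Document"></a><a href="#">SP450018F2293</a> <br><span style="font-size: 9px;"><a href="#">Delivery Order Package View</a></span></span>
  </td>
  <td align="right" valign="top">
    <span id="ctl00_cph1_grdAwardSearch_ctl44_lblDeliveryOrderCounter">108</span>
  </td>
</tr>
</table>
"""

# The trailing $ keeps _lblDeliveryOrderCounter from matching.
ID_SUFFIX = re.compile(r"_lbl(AwardBasicNumber|DeliveryOrder)$")


def first_link_text(span):
    """Hypothetical helper: text of the first <a> that has visible text."""
    for link in span.find_all("a"):
        text = link.get_text(strip=True)
        if text:
            return text
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
found = {}
for row in soup.find_all("tr", class_=["BgWhite", "BgSilver"]):
    cell = row.find("td", {"align": "left"})
    if cell is None:
        print("row has no left-aligned <td>")
        continue
    span = cell.find("span", id=ID_SUFFIX)
    if span is None:
        print("row has no matching <span>")
        continue
    kind = ID_SUFFIX.search(span["id"]).group(1)
    found[kind] = first_link_text(span)

print(found)
```

Checking each find result for None before chaining the next call also avoids an AttributeError on rows that lack the expected cell or span.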