How do I use BeautifulSoup to get the information from a table?

import requests
from bs4 import BeautifulSoup

url = "http://indiawater.gov.in/IMISReports/Reports/WaterQuality/rpt_WQM_LaboratoryInformation_S.aspx?Rep=0&RP=Y"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
for tr in soup.find_all('tr', {'class': 'oddrowcolor'}):
    print(tr)

1 answer

User

#1 · Posted 2024-04-16 18:21:43

You can get the table using its id, but classes such as oddrowcolor are added dynamically, so they are not in the page source:

import requests
from bs4 import BeautifulSoup
url = "http://indiawater.gov.in/IMISReports/Reports/WaterQuality/rpt_WQM_LaboratoryInformation_S.aspx?Rep=0&RP=Y"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")
table = soup.select_one("#tableReportTable")

for tr in table.find_all("tr"):
    print(tr)
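
One way to see why the class-based search comes back empty is to run both searches against HTML that has no class on its rows. The sketch below uses a small made-up string (SAMPLE_HTML is an assumption standing in for r.content, not the real page); only the table id is taken from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for r.content: the rows have no class attribute,
# which is how the server sends them before any JavaScript runs.
SAMPLE_HTML = """
<table id="tableReportTable">
  <tr><th>S.No.</th><th>State</th></tr>
  <tr><td>1</td><td><a href="#">GOA</a></td></tr>
  <tr><td>2</td><td><a href="#">ASSAM</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Searching by the dynamically added class finds nothing in the static HTML.
rows_by_class = soup.find_all("tr", {"class": "oddrowcolor"})

# Searching by the table id works, because the id is part of the source.
table = soup.select_one("#tableReportTable")
rows_by_id = table.find_all("tr")

print(len(rows_by_class), len(rows_by_id))
```

On this sample the class search returns no rows while the id-based search returns all three.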

To extract the table data, you can do the following:

soup = BeautifulSoup(r.content, "html.parser")

# gets the table using the table id
table = soup.select_one("#tableReportTable")
# column names
print(", ".join([th.text.strip() for th in table.select_one("tr").find_all("th")]))

#  tr + tr -> gets all the tr tags after the first 
for tr in table.select("tr + tr"):
    # tr.select("td a") -> get all the anchor tags inside the row tds
    # then get the text from each anchor.
    print(",".join([a.text for a in tr.select("td a")]))

This gives you:

S.No., State, State Labs (without mobile labs), District Labs (without mobile labs), Block Labs/Total Blocks (without mobile labs), SubDivision Labs (without mobile labs), Mobile Labs (State/ District/ Block/ Sub-division Level), Total Labs   (State/ District/ Block/ Sub-division Level)

ANDAMAN and NICOBAR,1,0,NA / 9,0,2,3
ANDHRA PRADESH,1,32,NA / 662,73,0,106
ARUNACHAL PRADESH,1,17,NA / 100,31,0,49
ASSAM,1,29,NA / 242,53,20,103
BIHAR,1,41,NA / 536,0,0,42
CHANDIGARH,0,0,NA / 1,0,0,0
CHATTISGARH,1,27,NA / 146,20,5,53
DADRA & NAGAR HAVELI,0,0,NA / 10,0,0,0
DAMAN & DIU,0,0,NA / 1,0,0,0
DELHI,0,0,NA / 0,0,0,0
GOA,1,0,1 / 11,9,0,11
GUJARAT,1,34,50 / 246,0,6,91
HARYANA,0,21,NA / 126,21,0,42
HIMACHAL PRADESH,1,14,NA / 77,28,0,43
JAMMU AND KASHMIR,0,22,2 / 148,74,0,98
JHARKHAND,1,24,NA / 259,3,5,33
KARNATAKA,1,44,39 / 176,106,46,236
KERALA,1,14,NA / 148,33,0,48
LAKSHADWEEP,0,9,NA / 9,0,0,9
MADHYA PRADESH,1,51,3 / 313,106,0,161
MAHARASHTRA,1,44,2 / 351,139,0,186
MANIPUR,1,9,NA / 38,2,0,12
MEGHALAYA,1,7,NA / 42,22,0,30
MIZORAM,1,8,NA / 26,18,0,27
NAGALAND,0,11,NA / 74,1,2,14
ODISHA,1,32,NA / 314,42,0,75
PUDUCHERRY,0,2,NA / 3,0,0,2
PUNJAB,3,22,8 / 145,0,1,34
RAJASTHAN,1,33,163 / 295,0,0,197
SIKKIM,0,2,NA / 9,0,0,2
TAMIL NADU,1,34,NA / 385,49,0,84
TELANGANA,1,19,NA / 438,56,0,76
TRIPURA,1,8,7 / 58,6,0,22
UTTAR PRADESH,1,76,3 / 820,2,0,82
UTTARAKHAND,0,28,1 / 95,14,0,43
WEST BENGAL,1,18,NA / 341,201,0,220

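This seems to match what I see in the browser. The totals are in th tags inside the last tr, so add the following after the loop (tr still refers to the last row at that point):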

print(",".join([a.text.strip() for a in tr.select("th")]))

This will give you:

Total,27,732,279,1109,87,2234
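
Joining cells with "," breaks as soon as a cell contains a comma, so if the goal is a CSV file, the standard-library csv module is probably a safer way to write the rows. A minimal sketch, again on a made-up string (SAMPLE_HTML and every value in it are illustrative assumptions, not data from the real page), reading the text of every cell rather than only the anchors:

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample shaped like the report: a header row, data rows,
# and a final row of totals held in th tags. All values are invented.
SAMPLE_HTML = """
<table id="tableReportTable">
  <tr><th>State</th><th>State Labs</th><th>Block Labs/Total Blocks</th></tr>
  <tr><td><a href="#">GOA</a></td><td><a href="#">1</a></td><td><a href="#">1 / 11</a></td></tr>
  <tr><td><a href="#">DELHI, NCT</a></td><td><a href="#">0</a></td><td><a href="#">NA / 0</a></td></tr>
  <tr><th>Total</th><th>1</th><th>1</th></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.select_one("#tableReportTable")

# Each row becomes a list of stripped cell texts, whether the cells are th or td.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]

# csv.writer quotes any cell that contains a comma.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with newline="" to csv.writer.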
