BeautifulSoup HTML table parsing

from mechanize import Browser
from BeautifulSoup import BeautifulSoup

mech = Browser()
url = "http://www.511virginia.org/RoadConditions.aspx?j=All&r=1"
page = mech.open(url)
html = page.read()

soup = BeautifulSoup(html)
table = soup.find("table")
rows = table.findAll('tr')[3]  # a single <tr>, despite the plural name
cols = rows.findAll('td')

roadtype = cols[0].string
start = cols[1].string
end = cols[2].string
condition = cols[3].string
reason = cols[4].string
update = cols[5].string

entry = (roadtype, start, end, condition, reason, update)
print entry

<td headers="road-type" class="ConditionsCellText">Rt. 613N (Giles County)</td>
<td headers="start" class="ConditionsCellText"><a href="conditions.aspx?lat=37.43036753&long=-80.51118005#viewmap">Big Stony Ck Rd; Rt. 635E/W (Giles County)</a></td>
<td headers="end" class="ConditionsCellText"><a href="conditions.aspx?lat=37.43036753&long=-80.51118005#viewmap">Cabin Ln; Rocky Mount Rd; Rt. 721E/W (Giles County)</a></td>
<td headers="condition" class="ConditionsCellText">Moderate</td>
<td headers="reason" class="ConditionsCellText">snow or ice</td>
<td headers="update" class="ConditionsCellText">01/13/2010 10:50 AM</td>

2 Answers

User

Answer 1 · edited 2024-04-20 10:02:51

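I tried to reproduce your error, but the source HTML page has changed.

As for the error itself, I ran into a similar problem while trying to reproduce the example here.

I changed the suggested URL to that of a Wikipedia table.

I moved the code to BeautifulSoup 4: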

from bs4 import BeautifulSoup

and replaced .string with .get_text():

start = cols[1].get_text()

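I could not test this against your example (as I said, I could not reproduce the error), but I think it may be useful to people looking for a solution to this problem.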
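The following is a minimal sketch of that approach, not the answerer's exact code. It assumes Python 3 with the bs4 package installed, uses the built-in "html.parser" backend, and parses an inline copy of the row from the question (with the href values shortened) instead of fetching the live page, which has since changed.

```python
from bs4 import BeautifulSoup

# Inline copy of the row from the question; hrefs are shortened placeholders.
html = """
<table><tr>
<td headers="road-type" class="ConditionsCellText">Rt. 613N (Giles County)</td>
<td headers="start" class="ConditionsCellText"><a href="#viewmap">Big Stony Ck Rd; Rt. 635E/W (Giles County)</a></td>
<td headers="end" class="ConditionsCellText"><a href="#viewmap">Cabin Ln; Rocky Mount Rd; Rt. 721E/W (Giles County)</a></td>
<td headers="condition" class="ConditionsCellText">Moderate</td>
<td headers="reason" class="ConditionsCellText">snow or ice</td>
<td headers="update" class="ConditionsCellText">01/13/2010 10:50 AM</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("table").find("tr")
cols = row.find_all("td")

# get_text() concatenates all text inside a cell, including text nested in <a>.
entry = tuple(col.get_text(strip=True) for col in cols)
print(entry)
```

Because get_text() collects text from every descendant, the same call works whether or not a cell wraps its text in a link.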

User

Answer 2 · edited 2024-04-20 10:02:51

start = cols[1].find('a').string

Or, more simply:

start = cols[1].a.string

Or, better still:

start = str(cols[1].find(text=True))

And likewise:

# cols is a list of <td> tags, so call find() on each cell
entry = [str(col.find(text=True)) for col in cols]
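The variants above can be compared side by side in a short sketch. It assumes Python 3 with bs4 (where the text= argument is spelled string=) and a shortened, hypothetical three-cell version of the row from the question. It also shows why the per-cell find() is the more general choice: cells without a link have no a child.

```python
from bs4 import BeautifulSoup

# Shortened, illustrative row: one plain cell, one linked cell, one plain cell.
row_html = """
<table><tr>
<td headers="road-type">Rt. 613N (Giles County)</td>
<td headers="start"><a href="#viewmap">Big Stony Ck Rd; Rt. 635E/W (Giles County)</a></td>
<td headers="condition">Moderate</td>
</tr></table>
"""

soup = BeautifulSoup(row_html, "html.parser")
cols = soup.find("tr").find_all("td")

# Both forms reach the <a> inside the cell and read its text.
via_find = cols[1].find("a").string
via_attr = cols[1].a.string

# find(string=True) returns the first text node at any depth inside the cell.
first_text = str(cols[1].find(string=True))

# cols[0] contains no <a>, so cols[0].a is None; searching for text per cell
# avoids having to special-case linked and unlinked cells.
entry = [str(col.find(string=True)) for col in cols]
print(entry)
```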
