How do I parse nested tables with Beautiful Soup?
I tried this: s = soup.findAll("table", {"class": "view"})
But it gives me the outer table. I need the inner one.
<table class="view" >
<tr>
<td width="46%" valign="top">
<table>
<tr>
<td>
<div style="adasdasd">
<div class="abc">dasdsadasdasdas</div>
</div>
<div>
<span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span> </span>
<b>My Face</b><br />
Hello This is me,
</div>
<div class="abc"">
Dec 6, 2010 by Alis
</div>
</td>
</tr>
</table>
</tr>
</table>
What I want to scrape is:
Hello This is me,
My Face
Dec 6, 2010 by Alis
2 Answers
2
s = soup.findAll("table", {"class": "view"})[0].find("table")
If there is only one such table, you can also use .find to get the first match, so the [0] is no longer needed.
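A minimal sketch of the chained lookup described above, assuming the bs4 package (Beautiful Soup 4) and the built-in "html.parser" backend. The trimmed-down HTML string here is made up for illustration and is not the markup from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup: an outer table with class "view"
# that wraps one inner table without a class.
html = """
<table class="view">
  <tr>
    <td>
      <table>
        <tr><td><b>My Face</b></td></tr>
      </table>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# .find returns the first match directly, so no [0] is needed.
outer = soup.find("table", {"class": "view"})

# Calling .find on a tag searches only that tag's descendants,
# so this returns the nested table rather than the outer one.
inner = outer.find("table")

inner_has_class = inner.has_attr("class")
inner_bold_text = inner.find("b").get_text()
```

Because the second .find starts from the outer table, it cannot return the outer table itself, which is what makes the chain reach the nested one.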
2
Here is the HTML, formatted more cleanly:
<table class="view" >
<tr>
<td width="46%" valign="top">
<table>
<tr>
<td>
<div style="adasdasd">
<div class="abc">dasdsadasdasdas</div>
</div>
<div>
<span>
<span class="aaaaaaa " title="aaaaaaaaaaa">
<span>aaaaaaaaaaaaa</span>
</span>
</span>
<b>My Face</b>
<br />
Hello This is me,
</div>
<div class="abc">
Dec 6, 2010 by Alis
</div>
</td>
</tr>
</table>
</td>
</tr>
</table>
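Note: I added a closing </td> tag (after the inner </table>) because one was missing.
# Get the table in the first cell of the first row.
innerTable = soup.find("table", {"class": "view"}).tr.td.table
# Get the div that holds all of your content. A plain .nextSibling would
# return the whitespace text node between the two divs, so ask for the next <div>.
innerDiv = innerTable.find("div", {"style": "adasdasd"}).findNextSibling("div")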
That gives you the tag that holds all of your content. From there, a little more processing extracts the pieces you actually need.
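The remaining "little more processing" might look like the sketch below. It assumes Beautiful Soup 4 (the bs4 package) with the "html.parser" backend and uses the cleaned-up HTML from this answer. The variable names title, greeting and byline are only illustrative, and find_next_sibling is the bs4 spelling of the sibling search.

```python
from bs4 import BeautifulSoup

html = """<table class="view">
<tr>
<td width="46%" valign="top">
<table>
<tr>
<td>
<div style="adasdasd">
<div class="abc">dasdsadasdasdas</div>
</div>
<div>
<span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span></span>
<b>My Face</b>
<br />
Hello This is me,
</div>
<div class="abc">
Dec 6, 2010 by Alis
</div>
</td>
</tr>
</table>
</td>
</tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# Step into the nested table, then move from the styled div to the
# next <div> sibling, skipping the whitespace text node in between.
inner_table = soup.find("table", {"class": "view"}).find("table")
content_div = inner_table.find("div", {"style": "adasdasd"}).find_next_sibling("div")

# "My Face" is the text of the <b> tag.
title = content_div.find("b").get_text(strip=True)

# "Hello This is me," is a bare text node right after the <br /> tag.
greeting = content_div.find("br").next_sibling.strip()

# The date line lives in the following sibling div with class "abc".
byline = content_div.find_next_sibling("div", {"class": "abc"}).get_text(strip=True)

print(greeting)
print(title)
print(byline)
```

Anchoring on the br and b tags keeps the lookups short, but it ties the code to this exact layout, so a page with a different structure would need different anchors.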