Code:
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div><p>p_string</p><div>div_string</div></div>', 'html.parser')

# First pass: extract every child of the outer div (or so I expected)
for m in soup.div:
    print("extract(first loop): ", m.extract())
print("current soup.div(first loop): ", soup.div)  # it still contains the inner div block
print('___________________________________________________________')

# I have to run another for loop to purge the remaining div block, why?
for m in soup.div:
    print("extract(second loop): ", m.extract())
print("current soup.div(second loop): ", soup.div)  # removed
Result:
^{pr2}$
Why doesn't the first for loop extract all the elements (both p and div)?
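This happens because you call extract() inside the loop. extract() removes a tag from the tree, so you are deleting the tag's children while iterating over them. It is essentially the same as iterating over a list and removing items from it in the loop. Instead, use ^{}: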
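One way to avoid mutating the child list during iteration is to take a snapshot of it first. The sketch below is an illustration, not necessarily the approach the answer originally named; it assumes the same markup as the question and the built-in 'html.parser' backend, and the variable name `extracted` is made up for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div><p>p_string</p><div>div_string</div></div>', 'html.parser')

# list(...) copies the current children, so extract() can shrink
# soup.div.contents without affecting the sequence being iterated.
extracted = [child.extract() for child in list(soup.div.contents)]

print([str(tag) for tag in extracted])  # both the p and the inner div
print(soup.div)                         # the outer div, now empty
```

If the removed nodes are not needed afterwards, calling soup.div.clear() empties the tag in a single step.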