Parsing HTML with lxml in Python

    import lxml.etree as ET

    html = """
    <p class="footer">[[footer]] - <a href="/rss">feed</a> if you want.</p>
    """
    elem = ET.fromstring(html)
    infos = elem.xpath('/p')
    for info in infos:
        print(1, info.text)
        print(2, ET.tostring(elem))

1 Answer

User
#1 · Posted on 2024-04-24 14:43:57

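You can't get the exact original string back, because lxml parses the HTML into its own internal data structure, and you then have to use the tostring() method to convert it back into a string. This means attributes, nesting, and so on may come out in a slightly different order or format, and whitespace is not guaranteed to be preserved. For example: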
    for info in infos:
        # Check for some string in the displayed text
        # (info.text is None when the element has no leading text)
        if "search string" in (info.text or ""):
            print(ET.tostring(info))
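Since it sounds like the string could be anywhere on the page, you may want to turn this check into a function and call it recursively as you walk through all the elements.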
Edit based on your comment:
You could do it like this:
    for info in infos:
        # Check for some string in the displayed text
        if "search string" in (info.text or ""):
            output_str = info.text
            for child in info:
                # encoding="unicode" returns str rather than bytes
                output_str += ET.tostring(child, encoding="unicode")
            print(output_str)
