Removing HTML tags in the AppEngine Python environment (equivalent of Ruby's Sanitize)

User

Answer 1 · edited 2024-04-19 23:46:49

Use lxml:

from lxml.html import fromstring

htmlstring = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'

# Parse the HTML fragment into an element tree
mySearchTree = fromstring(htmlstring)

# Print the text of every <a> element
for item in mySearchTree.cssselect('a'):
    print(item.text)

User

Answer 2 · edited 2024-04-19 23:46:49

>>> import BeautifulSoup
>>> html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
>>> bs = BeautifulSoup.BeautifulSoup(html)  
>>> bs.findAll(text=True)
[u'foo']

This gives you a list of (Unicode) strings. To turn it into a single string, use ''.join(thatlist).
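Where BeautifulSoup is not available, the same collect-then-join idea can be sketched with Python 3's standard html.parser module. This is only a minimal sketch, not part of the answer above: the names TextCollector and collect_text are hypothetical, and it drops tags without any whitelisting, so it is not a full sanitizer.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes and ignore every tag (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []  # text nodes, in document order

    def handle_data(self, data):
        # Called by the parser for the text found between tags
        self.chunks.append(data)


def collect_text(html):
    """Return the list of text nodes found in an HTML string."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
text_nodes = collect_text(html)  # a list of strings, like findAll(text=True)
print(''.join(text_nodes))       # join the list into a single string
```

Note that the contents of script and style elements also arrive through handle_data, so they would need separate handling if they must be dropped.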

User

Answer 3 · edited 2024-04-19 23:46:49

If you don't want to use a separate library, you can import the standard Django utility. For example:

from django.utils.html import strip_tags

html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
stripped = strip_tags(html)
print(stripped)
# prints: foo

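It is also already built into Django templates, so you don't need anything else; just apply the striptags filter, as follows (value is a placeholder for whichever template variable holds the HTML):

{{ value|striptags }}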

By the way, this is one of the fastest ways to do it.
