Where can I find a list of all possible HTML tags?
Is there a standard Python module that lists all HTML tags?
For example, I'd like to do something like this:
if is_valid_html_tag('div'):
    print('div is a valid tag')
if not is_valid_html_tag('boda'):
    print('boda is not a valid tag')
To do this, I need a list of all the tags. Has someone already compiled one, or is it included in Python's xml module or some other HTML module?
Thanks, Boda Cydo.
3 Answers
0
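You can download this JSON file directly, or fetch it with wget (for wget, use the raw.githubusercontent.com version of the URL, since this link opens GitHub's HTML view of the file):
https://github.com/sindresorhus/html-tags/blob/main/html-tags.json
Or load it from Python: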
import json
from urllib.request import urlopen

# Use the raw file URL; the github.com/.../blob/... page returns HTML, not JSON
url = "https://raw.githubusercontent.com/sindresorhus/html-tags/main/html-tags.json"

# Download the file and parse it into a Python list of tag names
with urlopen(url) as response:
    data_json = json.loads(response.read())

print(data_json)
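To get the check the question asks for, the parsed list can be wrapped in a set. A minimal sketch: make_tag_validator is a hypothetical helper name, and the three-element sample only stands in for the real downloaded list.

```python
import json


def make_tag_validator(tag_names):
    """Build an is_valid_html_tag-style check from an iterable of tag names."""
    # A frozenset gives fast membership tests and cannot be mutated by accident
    valid = frozenset(name.lower() for name in tag_names)

    def is_valid_html_tag(tag_name):
        # HTML tag names are case-insensitive, so compare in lowercase
        return tag_name.lower() in valid

    return is_valid_html_tag


# Stand-in for the downloaded data_json (a JSON array of tag-name strings)
sample = json.loads('["a", "div", "span"]')
is_valid_html_tag = make_tag_validator(sample)

print(is_valid_html_tag("div"))   # True
print(is_valid_html_tag("boda"))  # False
```

With the real data, pass data_json instead of sample.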
3
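Use the element whitelist from html5lib's sanitizer: https://github.com/html5lib/html5lib-python/blob/master/html5lib/sanitizer.py
# Only works with older html5lib releases that still ship the html5lib.sanitizer module
from html5lib.sanitizer import HTMLSanitizerMixin
print(HTMLSanitizerMixin.acceptable_elements)
Note that this is the list of elements the sanitizer considers safe, not every valid HTML tag (script, for example, is not in it). In newer html5lib releases this module was replaced by html5lib.filters.sanitizer, whose allowed_elements holds (namespace, name) pairs rather than plain tag names.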
8
I don't know of an existing module that does this. I'd suggest finding a list of tags first and then writing a function like this:
# XHTML 1.0 Strict element names (from htmldog); "DOCTYPE" is a declaration, not an element, so it is left out
VALID_TAGS = frozenset(["a","abbr","acronym","address","area","b","base","bdo","big","blockquote","body","br","button","caption","cite","code","col","colgroup","dd","del","dfn","div","dl","dt","em","fieldset","form","h1","h2","h3","h4","h5","h6","head","html","hr","i","img","input","ins","kbd","label","legend","li","link","map","meta","noscript","object","ol","optgroup","option","p","param","pre","q","samp","script","select","small","span","strong","style","sub","sup","table","tbody","td","textarea","tfoot","th","thead","title","tr","tt","ul","var"])

def is_valid_html_tag(tag_name):
    return tag_name in VALID_TAGS
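As far as I know, which tags are valid depends on your doctype. These tags are taken from http://htmldog.com/reference/htmltags/, which says the list is for XHTML Strict.
That said, there may be a better way to do what you're trying to do. If you can give more details about your goal, the friendly people here will be happy to help.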
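Since the valid set depends on the doctype, one way to reflect that is to keep one set per doctype. This is a sketch under stated assumptions: the keys "xhtml1-strict" and "html5" are hypothetical labels, the HTML5 additions are only a few well-known examples rather than the complete set, and acronym, big and tt are dropped because HTML5 treats them as obsolete.

```python
# XHTML 1.0 Strict element names from the htmldog-based list (DOCTYPE omitted)
XHTML_STRICT = frozenset("""
a abbr acronym address area b base bdo big blockquote body br button caption
cite code col colgroup dd del dfn div dl dt em fieldset form h1 h2 h3 h4 h5 h6
head html hr i img input ins kbd label legend li link map meta noscript object
ol optgroup option p param pre q samp script select small span strong style
sub sup table tbody td textarea tfoot th thead title tr tt ul var
""".split())

# Illustrative only, NOT exhaustive: a few elements that HTML5 added
HTML5_ADDED = frozenset({
    "article", "aside", "audio", "canvas", "figure",
    "footer", "header", "nav", "section", "video",
})

# Elements from the XHTML list that HTML5 treats as obsolete
HTML5_OBSOLETE = frozenset({"acronym", "big", "tt"})

# Hypothetical doctype keys mapped to the tag set each one allows
TAGS_BY_DOCTYPE = {
    "xhtml1-strict": XHTML_STRICT,
    "html5": (XHTML_STRICT - HTML5_OBSOLETE) | HTML5_ADDED,
}


def is_valid_html_tag(tag_name, doctype="html5"):
    """Return True if tag_name is in the (partial) tag set for doctype."""
    return tag_name in TAGS_BY_DOCTYPE[doctype]
```

For example, is_valid_html_tag("section") is True, while is_valid_html_tag("section", "xhtml1-strict") is False. A real version would need a complete HTML5 element list, such as the JSON file from the first answer.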
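If the underlying goal is to check the tags used in an actual document (an assumption about what you want), the standard library's html.parser can report every start tag, which you can compare against whichever list you settle on. A minimal sketch; UnknownTagFinder and find_unknown_tags are hypothetical names, and the known-tag set is whatever you supply.

```python
from html.parser import HTMLParser


class UnknownTagFinder(HTMLParser):
    """Collect start-tag names that are not in a given set of known tags."""

    def __init__(self, known_tags):
        super().__init__()
        self.known_tags = frozenset(known_tags)
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased; record each unknown one once
        if tag not in self.known_tags and tag not in self.unknown:
            self.unknown.append(tag)


def find_unknown_tags(html_text, known_tags):
    """Return unknown start-tag names in order of first appearance."""
    finder = UnknownTagFinder(known_tags)
    finder.feed(html_text)
    finder.close()
    return finder.unknown
```

For example, find_unknown_tags("<div><boda>hi</boda><br/></div>", {"div", "br"}) returns ['boda']. Self-closing tags such as <br/> are still seen, because HTMLParser's default handle_startendtag calls handle_starttag.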