Where can I find a list of all possible HTML tags?

5 votes
3 answers
3560 views
Asked 2025-04-15 20:23

Is there a standard module in Python that lists all HTML tags?

For example, I would like to do something like this:

if is_valid_html_tag('div'):
    print('div is a valid tag')

if is_not_valid_html_tag('boda'):
    print('boda is not a valid tag')

To do that, I need a list of all the tags. Has someone already compiled one, or is it included in Python's xml package or one of the other HTML modules?

Thanks, Boda Cydo.

3 Answers

0

You can fetch this JSON file with wget, or simply download it:

https://github.com/sindresorhus/html-tags/blob/main/html-tags.json

Or load it from Python:

import json
from urllib.request import urlopen

# Raw URL of the JSON file that lists the HTML tag names
url = "https://raw.githubusercontent.com/sindresorhus/html-tags/main/html-tags.json"

# Fetch the file and parse the JSON response body
with urlopen(url) as response:
    data_json = json.loads(response.read())

# Print the parsed data
print(data_json)
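
Once the data is loaded, the check from the question comes down to a membership test. The following is only a minimal sketch: it assumes the JSON file parses to a flat list of lowercase tag names, which is not verified here. `build_tag_checker` and `sample_tags` are hypothetical names, and the sample list stands in for `data_json` so the example runs without network access.

```python
def build_tag_checker(tag_names):
    """Return a function that tests membership in the given tag names."""
    # Normalize once; HTML tag names are matched case-insensitively here
    known = frozenset(name.lower() for name in tag_names)

    def is_valid_html_tag(tag_name):
        return tag_name.lower() in known

    return is_valid_html_tag


# Hypothetical stand-in for the parsed JSON (data_json in the snippet above)
sample_tags = ["a", "div", "p", "span"]
is_valid_html_tag = build_tag_checker(sample_tags)

print(is_valid_html_tag("div"))
print(is_valid_html_tag("boda"))
```

Building the frozenset once keeps each lookup cheap, and passing `data_json` instead of `sample_tags` should work the same way if the file has the assumed shape.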
3

Use this link: https://github.com/html5lib/html5lib-python/blob/master/html5lib/sanitizer.py

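# Note: this import targets older html5lib releases; later releases moved
# the sanitizer to html5lib.filters.sanitizer, so check your installed version.
from html5lib.sanitizer import HTMLSanitizerMixin
print(HTMLSanitizerMixin.acceptable_elements)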
8

I don't know of a ready-made module that does this. I suggest finding a list of tags first and then writing a function like this:

# Tag names taken from the htmldog.com reference (strict XHTML)
VALID_HTML_TAGS = frozenset(["a","abbr","acronym","address","area","b","base","bdo","big","blockquote","body","br","button","caption","cite","code","col","colgroup","dd","del","dfn","div","dl","DOCTYPE","dt","em","fieldset","form","h1","h2","h3","h4","h5","h6","head","html","hr","i","img","input","ins","kbd","label","legend","li","link","map","meta","noscript","object","ol","optgroup","option","p","param","pre","q","samp","script","select","small","span","strong","style","sub","sup","table","tbody","td","textarea","tfoot","th","thead","title","tr","tt","ul","var"])

def is_valid_html_tag(tag_name):
    return tag_name in VALID_HTML_TAGS

I think the list of valid tags depends on your document type. These tags come from http://htmldog.com/reference/htmltags/, which says the list applies to strict XHTML.
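
To illustrate the doctype dependence, here is a rough sketch that keys the tag set on the document type. The doctype keys, `COMMON_TAGS`, and `TAGS_BY_DOCTYPE` are hypothetical names, and the sets are deliberately tiny subsets chosen for illustration, not complete tag lists.

```python
# Illustrative, deliberately incomplete subsets; not full tag lists
COMMON_TAGS = frozenset({"a", "div", "p", "span", "table"})

TAGS_BY_DOCTYPE = {
    # acronym, big and tt are in the strict XHTML list but obsolete in HTML5
    "xhtml-strict": COMMON_TAGS | {"acronym", "big", "tt"},
    # section, article and nav were introduced with HTML5
    "html5": COMMON_TAGS | {"section", "article", "nav"},
}


def is_valid_html_tag(tag_name, doctype="html5"):
    """Check tag_name against the tag set for the given doctype key."""
    return tag_name in TAGS_BY_DOCTYPE[doctype]


print(is_valid_html_tag("acronym", "xhtml-strict"))
print(is_valid_html_tag("acronym", "html5"))
```

The same tag name can pass under one document type and fail under another, so a real validator would need a full list per doctype from a reference you trust.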

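That said, there may be a better way to accomplish what you are trying to do. If you can share more details about your goal, the friendly people here will be happy to help.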
