Why doesn't Beautiful Soup correctly parse an element named "area"?

#!/usr/bin/python3.5

from bs4 import BeautifulSoup

xml = """""
<?xml version = '1.0' encoding = 'UTF-8' standalone = 'yes'?>
<root>
<areax> foo </areax>
<area> bar </area>
</root>
"""""

soup = BeautifulSoup (xml, "lxml")

print ("\n#### soup ####\n")
print (soup)

print ("\n#### areax ####\n")
areaxs = soup.find_all ("areax")
for areax in areaxs:
    print (areax)

print ("\n### area ###\n")
areas = soup.find_all ("area")
for area in areas:
    print (area)

Output:

#### soup ####

<html><body><p>""
<?xml version = '1.0' encoding = 'UTF-8' standalone = 'yes'?>
<root>
<areax> foo </areax>
<area/> bar
</root>
</p></body></html>

#### areax ####

<areax> foo </areax>

### area ###

<area/>

1 answer

User

#1 · Posted on 2024-06-16 11:12:07

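The document is being parsed as HTML, and <area> is a void HTML element (it cannot have any children). That is why the parser closes it immediately as <area/> and leaves the text "bar" outside it.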

To parse it as XML instead, use BeautifulSoup(xml, "xml"). From the Beautiful Soup docs:

By default, Beautiful Soup parses documents as HTML. To parse a document as XML, pass in “xml” as the second argument to the BeautifulSoup constructor:
soup = BeautifulSoup(markup, "xml")
You’ll need to have lxml installed.
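
As a minimal sketch of that fix, the snippet below reuses the document from the question, wrapped in exactly three quotes, and switches the parser argument to "xml". It assumes both bs4 and lxml are installed; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Same document as in the question, delimited by exactly three quotes.
xml = """<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<root>
<areax> foo </areax>
<area> bar </area>
</root>
"""

# "xml" selects lxml's XML parser, so <area> is treated as an ordinary
# element rather than as an HTML void element.
soup = BeautifulSoup(xml, "xml")

areas = soup.find_all("area")
for area in areas:
    # The element should now keep its text content.
    print(area)
```

With the XML parser, the text "bar" should stay inside the area element instead of being pushed out after an empty tag.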

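A second problem is that there are too many quotes around the xml string, so its content actually starts with "" (try printing it). Exactly three quotes (""") are enough.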
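
The quoting issue can be checked without any parser. The short sketch below uses a hypothetical one-element document to compare five quotes against three; the variable names are made up for illustration.

```python
# Five opening quotes: the first three open the literal and the other two
# become part of its content. Of the five closing quotes, the first three
# close the literal; the remaining "" is an empty string literal that
# Python concatenates onto the result.
too_many = """""<root/>"""""

# Exactly three quotes on each side: the content is just the markup.
exactly_three = """<root/>"""

print(repr(too_many))
print(repr(exactly_three))
```

Printing with repr() makes the two stray quote characters at the start of the first string visible.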
