<p>I think I should rewrite my answer.</p>
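<p>The built-in markup massage handles minor damage well (extra whitespace, missing closing slashes, and so on). I would certainly try to get by with it before doing anything more involved.</p>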
<p>You can <a href="http://www.crummy.com/software/BeautifulSoup/documentation.html#Sanitizing%20Bad%20Data%20with%20Regexps" rel="nofollow noreferrer">pass in your own massages</a>, and I suggest extending the default set rather than replacing it:</p>
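<pre><code>import copy, re
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x (Python 2)
# Sample malformed input (assumed, modeled on the documentation example)
badString = "Foo<!-This comment is malformed.-->Bar<br/>Baz"
# Extra rule: repair "<!-x" into a proper "<!--x" comment opener
myMassage = [(re.compile('<!-([^-])'), lambda match: '<!--' + match.group(1))]
# Copy the default list so the class attribute is not modified, then append our rule
myNewMassage = copy.copy(BeautifulSoup.MARKUP_MASSAGE)
myNewMassage.extend(myMassage)
BeautifulSoup(badString, markupMassage=myNewMassage)
# Foo<!--This comment is malformed.-->Bar<br />Baz
</code></pre>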
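<p>You are probably better off doing it this way, because everything goes through a single parsing step and benefits from BeautifulSoup's optimizations, although the runtime performance would probably be very similar.</p>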
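<p>To make the mechanism concrete: a massage list is a sequence of (compiled regex, replacement) pairs that are run over the markup in order before parsing. The sketch below reproduces that idea in plain Python 3, assuming BeautifulSoup 3 applies each pair with <code>pattern.sub</code> in sequence. <code>apply_massage</code> and <code>EXAMPLE_MASSAGE</code> are hypothetical names, and the two rules are illustrative, not the library's actual defaults.</p>

```python
import re

# Hypothetical massage list: (compiled pattern, replacement) pairs.
# These rules are illustrative assumptions, not BeautifulSoup's real defaults.
EXAMPLE_MASSAGE = [
    # "<br/>" -> "<br />": put a space before a self-closing slash
    (re.compile(r'(<[^<>]*)/>'), lambda m: m.group(1) + ' />'),
    # "<!-x" -> "<!--x": repair a comment opener that is missing a dash
    (re.compile(r'<!-([^-])'), lambda m: '<!--' + m.group(1)),
]


def apply_massage(markup, massage):
    """Apply each (pattern, replacement) pair in order as a pre-parse cleanup pass."""
    for pattern, replacement in massage:
        markup = pattern.sub(replacement, markup)
    return markup


fixed = apply_massage("Foo<!-This comment is malformed.-->Bar<br/>Baz", EXAMPLE_MASSAGE)
print(fixed)  # Foo<!--This comment is malformed.-->Bar<br />Baz
```

<p>Because the rules run in order, each one sees the output of the rules before it. Passing your own list replaces the default one, so appending to a copy of the defaults keeps the built-in fixes in effect.</p>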