Java TransformerFactory breaks <input> and <br> tags inside an <html> tag

Something strange happens when I parse and rewrite a simple XML document with simple code.

Input:

<html>
<input></input>
</html>

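gives this output (not well-formed):

<html>
<input>
</html>

The same happens with <input/> or <br/>.

It does not happen inside <html2>, or with other tags.

The code is the classic approach: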

// READ XML
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true);
DocumentBuilder builder = builderFactory.newDocumentBuilder();

// PARSE
Document document = builder.parse(new InputSource(new StringReader(_xml_source)));

// WRITE XML

TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
StringWriter buffer = new StringWriter();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(document), new StreamResult(buffer));
String output = buffer.toString();

Is this a known bug?


1 answer

  1. Answer #1

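    XSLT defines an output method, which can be xml, html, or text.

    The specification says that if the document element is <html> (in any letter case, with no namespace), the default output method is html; otherwise it is xml.

    With the xml method, you get <input/>.

    With the html method, you get <input>, because the HTML specification says that empty elements never have end tags.

    If needed, you can set the output method explicitly: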

    transformer.setOutputProperty(OutputKeys.METHOD, "xml");
    

    With the method set explicitly to xml, a document whose root element is <html> is serialized as XML, i.e. with <input/>.
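To see both behaviours side by side, here is a minimal sketch built from the question's code. The class name `OutputMethodDemo` and the helper `roundTrip` are made up for illustration, and the sketch assumes the JDK's built-in JAXP implementation is the one on the classpath; other implementations may format the output differently.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
final class OutputMethodDemo {

    /**
     * Parses the XML string and serializes it again.
     * Passing method == null leaves the output method at its default,
     * so the serializer picks html or xml based on the document element.
     */
    static String roundTrip(String xml, String method) {
        try {
            // READ XML (same setup as in the question)
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));

            // WRITE XML
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            if (method != null) {
                // Explicit method overrides the html/xml auto-selection.
                transformer.setOutputProperty(OutputKeys.METHOD, method);
            }
            StringWriter buffer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(buffer));
            return buffer.toString();
        } catch (Exception e) {
            // Wrapped so callers do not have to declare checked exceptions.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String source = "<html><input></input><br/></html>";
        System.out.println("default method: " + roundTrip(source, null));
        System.out.println("xml method:     " + roundTrip(source, "xml"));
    }
}
```

With the default method the empty elements should come out in HTML style (no closing tag, no self-closing slash), while the explicit xml method should keep the result well-formed.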

    Quotes

    XSLT output method

    The default for the method attribute is chosen as follows. If

    • the root node of the result tree has an element child,
    • the expanded-name of the first element child of the root node (i.e. the document element) of the result tree has local part html (in any combination of upper and lower case) and a null namespace URI, and
    • any text nodes preceding the first element child of the root node of the result tree contain only whitespace characters,

    then the default output method is html; otherwise, the default output method is xml. The default output method should be used if there are no xsl:output elements or if none of the xsl:output elements specifies a value for the method attribute.
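The selection rule quoted above can be sketched against a DOM `Document`. This is only an illustration of the rule as the specification words it, not the code any particular serializer runs; `DefaultMethodRule`, `defaultOutputMethod`, and `parse` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper illustrating the XSLT default-output-method rule.
final class DefaultMethodRule {

    /** Returns "html" or "xml" following the three conditions in the quote. */
    static String defaultOutputMethod(Document document) {
        // Condition 1: the root node must have an element child.
        Element root = document.getDocumentElement();
        if (root == null) {
            return "xml";
        }
        // Condition 2: local part "html" in any letter case, null namespace URI.
        // getLocalName() is null for documents parsed without namespace
        // awareness, so fall back to the tag name in that case.
        String localName = root.getLocalName() != null ? root.getLocalName() : root.getTagName();
        boolean noNamespace = root.getNamespaceURI() == null;
        // Condition 3 (only whitespace text before the first element) holds
        // trivially here: a DOM Document node cannot have text node children.
        return noNamespace && "html".equalsIgnoreCase(localName) ? "html" : "xml";
    }

    /** Namespace-aware parse, with checked exceptions wrapped for brevity. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultOutputMethod(parse("<html><input/></html>")));
        System.out.println(defaultOutputMethod(parse("<html2><input/></html2>")));
    }
}
```

This also explains why the question's <html2> case is unaffected: its local name is not html, so the xml method stays in effect.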

    HTML empty tags

    Some HTML element types have no content. For example, the line break element BR has no content; its only role is to terminate a line of text. Such empty elements never have end tags. The document type definition and the text of the specification indicate whether an element type is empty (has no content) or, if it can have content, what is considered legal content.