<p>As answered in a comment by <a href="https://stackoverflow.com/users/464744/blender">Blender</a>, quoting <a href="http://en.wikipedia.org/wiki/Character_encodings_in_HTML#Illegal_characters" rel="nofollow noreferrer">Wikipedia</a>:</p>
<blockquote>
<p>HTML forbids[8] the use of the characters with Universal Character
Set/Unicode code points</p>
<ul>
<li>0 to 31, except 9, 10, and 13 (C0 control characters)</li>
<li>127 (DEL character)</li>
<li>128 to 159 (x80 – x9F, C1 control characters)</li>
<li>55296 to 57343 (xD800 – xDFFF, the UTF-16 surrogate halves)</li>
</ul>
<p>The Unicode standard also forbids:</p>
<ul>
<li>65534 and 65535 (xFFFE – xFFFF), non-characters, related to xFEFF, the byte order mark.</li>
</ul>
<p>These characters are not even allowed by reference. That is, you
should not even write them as numeric character references. However,
references to characters 128–159 are commonly interpreted by lenient
web browsers as if they were references to the characters assigned to
bytes 128–159 (decimal) in the Windows-1252 character encoding. This
is in violation of HTML and SGML standards, and the characters are
already assigned to higher code points, so HTML document authors
should always use the higher code points. For example, for the
trademark sign (™), use &amp;#8482;, not &amp;#153;.</p>
<p>The characters 9 (tab), 10 (linefeed), and 13 (carriage return) are
allowed in HTML documents, but, along with 32 (space) are all
considered "whitespace".[9] The "form feed" control character, which
would be at 12, is not allowed in HTML documents, but is also
mentioned as being one of the "white space" characters – perhaps an
oversight in the specifications. In HTML, most consecutive occurrences
of white space characters, except in a &lt;pre&gt; block, are interpreted as
comprising a single "word separator" for rendering purposes. A word
separator is typically rendered as a single en-width space in European
languages, but not in all the others.</p>
</blockquote>
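<p>The ranges quoted above can be sketched as a small check. This is only an illustration, not part of the quoted text: the function names <code>is_forbidden_in_html</code> and <code>windows_1252_equivalent</code> are hypothetical, and the second function assumes Python's <code>cp1252</code> codec as a stand-in for the lenient browser behaviour described in the quote.</p>

```python
def is_forbidden_in_html(cp: int) -> bool:
    """Return True if code point `cp` is in one of the ranges quoted above."""
    if cp in (9, 10, 13):          # tab, line feed, carriage return are allowed
        return False
    return (
        0 <= cp <= 31              # C0 control characters (includes form feed, 12)
        or cp == 127               # DEL
        or 128 <= cp <= 159        # C1 control characters
        or 0xD800 <= cp <= 0xDFFF  # UTF-16 surrogate halves
        or cp in (0xFFFE, 0xFFFF)  # Unicode non-characters
    )


def windows_1252_equivalent(cp: int) -> int:
    """Map a reference in 128-159 to the code point of the same byte in Windows-1252.

    Hypothetical helper: it mimics the lenient interpretation described in the
    quote so that the proper higher code point can be used instead.
    """
    if not 128 <= cp <= 159:
        return cp
    try:
        return ord(bytes([cp]).decode("cp1252"))
    except UnicodeDecodeError:
        return cp                  # byte has no assignment in Python's cp1252 codec
```

<p>For example, <code>windows_1252_equivalent(153)</code> gives 8482, which matches the trademark-sign case in the quote, while <code>is_forbidden_in_html(153)</code> is true and <code>is_forbidden_in_html(8482)</code> is false.</p>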