Using an unrecognized MP4 tag name in UTF-8 Python code

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

tagname = 'a9777274'.decode('hex')  # This value comes from a library as a str, not a unicode
if u'\xa9wrt' == tagname:
    # ??: What test could I run that would get me here without resorting to writing my string in hex?
    print("You found the tag you're looking for!")
else:
    print("Keep looking!")

print(str("This will work: {}").format(tagname))
try:
    print("This will throw an exception: {}".format(tagname))
    # ??: Can I reach this line without resorting to converting my format string to a str?
except UnicodeDecodeError:
    print("Threw exception")

>>> u = u'\xa9wrt'
>>> s = u.encode('utf-8')
>>> s2 = '\xa9wrt'
>>> s3 = 'a9777274'.decode('hex')
>>> s2 == s
False
>>> s2 == s3
True
>>> match_tag(s)
We have a match! tagname == ©wrt
Look! We printed tagname and no exception was raised.
>>> match_tag(s2)
Traceback (most recent call last):
...
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: invalid start byte

3 answers

User

Answer 1 · edited 2024-04-19 02:24:41

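\xa9 is the copyright sign. For details, see the C1 Controls and Latin-1 Supplement chart in the Unicode Standard.

Perhaps the tag ©wrt means "copyright" rather than "composer"?

When you run '\xa9wrt'.encode('utf-8'), you get a UnicodeDecodeError because encode() expects a unicode object but you gave it a str. Python therefore converts the str to unicode first, assuming the str is encoded as 'ascii' (or whatever the default encoding is). That is why a decode error appears while you are encoding. Using a unicode object, u'\xa9wrt'.encode('utf-8'), should fix the problem.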
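The hidden decode step described above can be made visible. The sketch below is written for Python 3, where bytes corresponds to Python 2's str and str to unicode, so the conversion that Python 2 performs silently has to be spelled out. The helper name py2_style_encode is made up for illustration.

```python
def py2_style_encode(raw, target="utf-8", default="ascii"):
    """Mimic Python 2's str.encode(): decode with the default codec, then encode."""
    text = raw.decode(default)  # the implicit step that raises UnicodeDecodeError
    return text.encode(target)


tag = b"\xa9wrt"

try:
    py2_style_encode(tag)
    outcome = "encoded"
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte the codec rejected
    outcome = "UnicodeDecodeError at byte 0x%02x" % tag[exc.start]

print(outcome)

# Starting from text instead of bytes, no implicit decode is needed.
encoded = "\xa9wrt".encode("utf-8")
print(encoded)
```

The failure comes from the decode, not from the UTF-8 encode that was actually requested, which matches the error type reported in the question.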

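In the Python interpreter, type('') returns <type 'str'> by default. If you first enter from __future__ import unicode_literals, type('') returns <type 'unicode'> instead. You say that naively typing '\xa9wrt' gives you u'\xa9wrt', which is not the same thing. That statement is sometimes true and sometimes false: whether u'\xa9wrt' == '\xa9wrt' evaluates to True or False depends on whether unicode_literals has been imported.

Copy the following into a file (for example test.py), save it, and run python test.py from the command line.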

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

tag1 = u'\xa9wrt'
tag2 = '\xa9wrt'
print("tag1 = u'\\xa9wrt'")
print("tag2 = '\\xa9wrt'")
print("tag1: %s" % tag1)
print("tag2: %s" % tag2)
print("type(tag1): %s" % type(tag1))
print("type(tag2): %s" % type(tag2))
print("tag1 == tag2: %s" % (tag1 == tag2))
try:
    print("str(tag1): %s" % str(tag1))
except UnicodeEncodeError:
    print("str(tag1): raises UnicodeEncodeError")
print("tag1.encode('utf-8'): ".encode('utf-8') + tag1.encode('utf-8'))

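Running this script under Python 2.7 reports both tag1 and tag2 as <type 'unicode'>, shows tag1 == tag2 as True, and takes the except branch because str(tag1) raises UnicodeEncodeError.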

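Edit:

Your life will be much easier if your code uses unicode internally. That means converting to unicode as soon as you receive input, and converting to str only when you produce output (if that is needed at all). So when you receive a str tag name from somewhere, convert it to unicode first.

For example, here is test.py: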

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

def match_tag(tagname):
    if isinstance(tagname, str):
        # tagname comes in as str, so let's convert it
        tagname = tagname.decode('utf-8')  # enter the correct encoding here

    # Now that we have a unicode tag, we can deal with it easily:
    if tagname == '\xa9wrt':
        print("We have a match! tagname == %s" % tagname)
        print("Look! We printed tagname and no exception was raised.")

Then we run it:

>>> from test import match_tag
>>> u = u'\xa9wrt'
>>> s = u.encode('utf-8')
>>> type(u)
<type 'unicode'>
>>> type(s)
<type 'str'>
>>> match_tag(u)
We have a match! tagname == ©wrt
Look! We printed tagname and no exception was raised.
>>> match_tag(s)
We have a match! tagname == ©wrt
Look! We printed tagname and no exception was raised.

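So you need to find out which encoding your input string uses. Once you know that, you can convert the str to unicode and the rest of your code becomes much simpler.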
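One way to apply this advice is a small boundary helper that turns whatever arrives into text exactly once. This is only a sketch in Python 3 terms (bytes in, str out); to_text and its encoding parameter are hypothetical names, and the correct encoding still has to come from knowledge of the data source.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it arrived as bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


WANTED = "\xa9wrt"

# The same tag arriving as text, as UTF-8 bytes, and as Latin-1 bytes.
as_text = to_text("\xa9wrt")
from_utf8 = to_text(b"\xc2\xa9wrt", encoding="utf-8")
from_latin1 = to_text(b"\xa9wrt", encoding="latin-1")

print(as_text == WANTED, from_utf8 == WANTED, from_latin1 == WANTED)
```

After the boundary, every comparison is text against text, so the caller no longer needs to know how the tag was originally encoded.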

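Edit 2:

If you just want s2 = '\xa9wrt' to work, you need to decode it correctly first. s2 is a str, that is, raw bytes; whenever Python has to convert it implicitly, it uses the default encoding (check sys.getdefaultencoding(); it is probably ascii). \xa9 is not an ASCII character, so that implicit conversion fails, and decoding it as UTF-8 fails as well because the byte 0xa9 on its own is not valid UTF-8. That is the problem with s2. Try this before feeding it to match_tag():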

>>> s2 = '\xa9wrt'
>>> s2_decoded = s2.decode('unicode_escape')
>>> type(s2_decoded)  # This is unicode, just like we want.
<type 'unicode'>
>>> match_tag(s2_decoded)
We have a match! tagname == ©wrt
Look! We printed tagname and no exception was raised.
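The unicode_escape codec works here because it maps each plain byte to the code point of the same value, which is also what a latin-1 decode does. It additionally interprets backslash sequences, so the two codecs diverge once the data contains a backslash. A small comparison, written as a Python 3 sketch with bytes standing in for Python 2's str:

```python
raw = b"\xa9wrt"

# For this tag the two codecs agree.
via_escape = raw.decode("unicode_escape")
via_latin1 = raw.decode("latin-1")
print(via_escape == via_latin1 == "\xa9wrt")

# With a literal backslash in the data they diverge.
tricky = b"a\\nb"  # four bytes: a, backslash, n, b
kept = tricky.decode("latin-1")              # backslash kept as data
collapsed = tricky.decode("unicode_escape")  # backslash + n becomes a newline
print(len(kept), len(collapsed))
```

For arbitrary binary tag names, latin-1 is therefore the more predictable of the two choices.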

User

Answer 2 · edited 2024-04-19 02:24:41

The string is encoded in Latin-1, so if you want to store it in a UTF-8 file or compare it with a UTF-8 string, just do:

>>> '\xa9wrt'.decode('latin-1').encode('utf-8')
'\xc2\xa9wrt'

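Or, if you want to compare it with a Unicode string, decode it with latin-1 and compare the result directly, for example '\xa9wrt'.decode('latin-1') == u'\xa9wrt'.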
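The same conversions in Python 3 notation, as a sketch: the byte string is decoded once with latin-1, after which it can be re-encoded as UTF-8 or compared with text directly. The variable names are illustrative only.

```python
latin1_bytes = b"\xa9wrt"

text = latin1_bytes.decode("latin-1")  # bytes -> text
utf8_bytes = text.encode("utf-8")      # text -> UTF-8 bytes

print(utf8_bytes)
print(text == "\xa9wrt")

# U+00A9 occupies one byte in Latin-1 and two bytes in UTF-8.
print(len(latin1_bytes), len(utf8_bytes))
```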

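User

Answer 3 · edited 2024-04-19 02:24:41

I finally found a way to represent the problematic string in a utf-8 file under unicode_literals: I convert the string to hex and back. Specifically, in the console (evidently not in unicode_literals mode), I ran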

"".join(["{0:x}".format(ord(c)) for c in '\xa9wrt'])

Then I can create the string in my source file with

'a9777274'.decode('hex')

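But this can't be the right way, can it? For one thing, if my console were running in full unicode mode, I am not sure I could even type the string '\xa9wrt' in the first place to have Python tell me the hex sequence that represents the byte string.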
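For comparison, the hex round trip can be written with the standard-library binascii helpers, and a bytes literal expresses the same value without any hex detour. The sketch is Python 3; in Python 2.6 and later a b'...' literal likewise remains a byte string (str) even when unicode_literals is in effect.

```python
import binascii

raw = b"\xa9wrt"  # bytes literal, no hex detour needed

hex_form = binascii.hexlify(raw).decode("ascii")  # two zero-padded digits per byte
round_trip = binascii.unhexlify(hex_form)

print(hex_form)
print(round_trip == raw)

# An unpadded "{0:x}" join drops a digit for any byte below 0x10,
# so its output cannot always be converted back.
unpadded = "".join("{0:x}".format(b) for b in b"\x09wrt")
print(unpadded)
```

The tag in the question happens to contain no byte below 0x10, which is why the unpadded join worked there.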
