Testing for UnicodeDecodeError in Python 3

Asked 2025-04-16 22:23

I have a test, written for Python 2.x, that checks that a function which only accepts unicode text fails on anything else.

def testNonUnicodeInput(self):
        """ Test failure on non-unicode input. """
        input = "foo".encode('utf-16')
        self.assertRaises(UnicodeDecodeError, myfunction, input)

However, when I run this test under Python 3.x, it fails. I get:

AssertionError: UnicodeDecodeError not raised by myfunction

I am trying to work out how to set up a test that keeps working in Python 2.x and also works in Python 3.x after being run through 2to3.

I should probably mention that, inside my function, I do the following to force unicode:

def myfunction(input):
    """ myfunction only accepts unicode input. """
    ...
    try:
        source = unicode(source)
    except UnicodeDecodeError, e:
        # Customise error message while maintaining original traceback
        e.reason += '. -- Note: Myfunction only accepts unicode input!'
        raise
    ...

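Of course, this (along with the test) is run through 2to3 before being run on Python 3.x. I suppose what I actually want in Python 3 is to not accept byte strings, which I thought I was doing by encoding the string first. I did not use 'utf-8' as the encoding because I understand that to be the default.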
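One likely reason the converted test stops failing can be sketched as follows, assuming 2to3 rewrites `unicode(source)` to `str(source)`. In Python 3, calling `str()` on a bytes object without an encoding argument returns the repr of the bytes instead of decoding them, so nothing is raised. The variable names are illustrative only.

```python
# Python 3 only: str() on bytes does not decode, it returns the repr.
data = "foo".encode("utf-16")   # bytes, starting with a byte-order mark

converted = str(data)           # no exception is raised here
print(type(data).__name__)      # the encoded value is a bytes object
print(converted)                # text that looks like a bytes literal

# Decoding explicitly is what actually triggers the error.
try:
    data.decode("ascii")
    raised = False
except UnicodeDecodeError as exc:
    raised = True
    print("raised:", exc.reason)
```

Under this assumption the function silently receives the repr text, which is why `assertRaises` reports that no error was raised.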

Does anyone have any thoughts on how to keep this consistent?


2 Answers

Well, I have decided to simply not run the test under Python 3 for now.

if sys.version_info < (3, 0):
    input = "foo".encode('utf-16')
    self.assertRaises(UnicodeDecodeError, myfunction, input)

However, if anyone can suggest a test that passes under both Python 2 and 3, I am open to suggestions.

Answered 2025-04-16 by Python大师


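In Python 3 you should not need to do anything to strings, because they are all Unicode. Just test whether the value is a string with isinstance(s, str). If the problem is the other way around, you will want to use bytes.decode().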
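A minimal Python 3 sketch of both suggestions follows. The helper names `require_text` and `to_text` are hypothetical and do not come from the question, and raising `TypeError` for non-text input is an assumption about the desired behaviour.

```python
def require_text(value):
    """Hypothetical helper: reject anything that is not a text string."""
    if not isinstance(value, str):
        raise TypeError("expected str, got %s" % type(value).__name__)
    return value


def to_text(raw, encoding="utf-8"):
    """Hypothetical helper for the reverse case: bytes in, text out."""
    return raw.decode(encoding)


print(require_text("foo"))
print(to_text(b"foo"))
```

With this approach a test would assert `TypeError` rather than `UnicodeDecodeError`, since the bytes are rejected before any decoding happens.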

OK, here is a way to raise a UnicodeDecodeError in both Python 3 and Python 2:

Python 3:

>>> "foo".encode('utf-16').decode('utf-8')
Traceback (most recent call last):
  File "<pyshell#61>", line 1, in <module>
    "foo".encode('utf-16').decode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

Python 2:

>>> "foo".encode('utf-16').decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python26\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

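I am not sure whether the 2to3 tool automatically converts string literals to the b"foo" syntax. If it does, you will just have to remove the b manually, or find a way to make it ignore that.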
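Building on the explicit decode shown above, here is one possible sketch of a function and test that behave the same on Python 2.6+ and Python 3 without relying on 2to3. The body of `myfunction` is a stand-in, not the asker's real code, and decoding with ASCII is an assumption chosen to mirror what `unicode(source)` does implicitly on Python 2.

```python
import unittest


def myfunction(source):
    """Stand-in for the question's function; non-ASCII bytes are rejected."""
    if isinstance(source, bytes):
        # On Python 2, bytes is an alias for str, so this mirrors the
        # implicit ASCII decode that unicode(source) performs there.
        try:
            source = source.decode("ascii")
        except UnicodeDecodeError as e:
            # Customise the message while keeping the original traceback.
            e.reason += ". -- Note: Myfunction only accepts unicode input!"
            raise
    return source


class TestNonUnicodeInput(unittest.TestCase):
    def test_non_unicode_input(self):
        """Byte strings that are not plain ASCII should be rejected."""
        data = "foo".encode("utf-16")
        self.assertRaises(UnicodeDecodeError, myfunction, data)
```

Note that this sketch still accepts ASCII-only byte strings, as Python 2's `unicode()` does. If Python 3 should reject all bytes, an `isinstance` check that raises `TypeError` may be a better fit. The test can be run with `python -m unittest` on either version.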

Answered 2025-04-16 by Python大师

