Converting utf-16 to utf-8 in Python 3

数据工程师

Asked 2025-04-16 00:34

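I'm programming in Python 3 and have run into a small problem that I can't find any information about online.

As far as I understand, the default string encoding is utf-16, but I need to work with utf-8. I can't find a command that converts from the default encoding to utf-8. Any help would be much appreciated.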

Tags: utf-8, encoding conversion, string encoding, utf-16

1 Answer

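Python 3 has two data types that matter when you work with strings. The first is the string class (str), an object that represents Unicode characters. The key point is that such a string is not a run of bytes; it really is a sequence of characters. The second is the bytes class, which is just a sequence of bytes and often holds a string stored in some encoding, such as utf-8 or iso-8859-15.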
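To make the distinction concrete, here is a minimal sketch. The sample word and the variable names are chosen only for illustration; the point is that the same six characters occupy a different number of bytes depending on the encoding.

```python
# A str holds Unicode code points; a bytes object holds raw encoded data.
text = 'façade'                          # sample string, chosen for illustration

utf8_data = text.encode('utf-8')         # str -> bytes, encoded as UTF-8
utf16_data = text.encode('utf-16-le')    # str -> bytes, UTF-16 little endian, no BOM

char_count = len(text)                   # counts characters
utf8_size = len(utf8_data)               # counts bytes
utf16_size = len(utf16_data)             # counts bytes

print(type(text).__name__, char_count)
print(type(utf8_data).__name__, utf8_size)
print(type(utf16_data).__name__, utf16_size)
```

The character count stays the same whichever encoding is used; only the byte counts change.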

What does this mean for you? As I understand it, you want to read and write utf-8 files. Let's write a program that replaces every 'ć' character with 'ç'.

def main():
    # Open the output file with an explicit encoding, so that everything printed to it is encoded as utf-8.
    with open('output_file', 'w', encoding='utf-8') as out_file:
        # Give open() the encoding for the input too, so each line comes back as a Unicode string.
        with open('input_file', encoding='utf-8') as in_file:
            for line in in_file:
                # String literals in Python 3 are Unicode strings, so 'ć' and 'ç' need no special handling.
                # Each line keeps its trailing newline, hence end=''; file= sends the text to the output file.
                print(line.replace('ć', 'ç'), end='', file=out_file)

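So when should you use bytes? Not often. One example I can think of is reading data from a network socket. If you have a bytes object, you can turn it into a Unicode string by calling bytes.decode('encoding'), and go the other way with str.encode('encoding'). As noted above, though, you probably won't need to.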
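As a rough illustration of that round trip, the sketch below starts from a bytes literal standing in for data received from a socket. The payload is made up for the example. It also decodes the same bytes with a different codec to show why the encoding name matters.

```python
# Pretend these bytes arrived from a network socket (hypothetical payload).
received = b'caf\xc3\xa9'

decoded = received.decode('utf-8')         # bytes -> str
reencoded = decoded.encode('utf-8')        # str -> bytes

# Decoding with the wrong codec does not fail here, but yields different text.
misread = received.decode('iso-8859-15')

print(decoded)
print(reencoded == received)
print(misread == decoded)
```

The two bytes that UTF-8 reads as one character are read by iso-8859-15 as two separate characters, so the misread string is one character longer.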

Still, since the topic is an interesting one, here is a more roundabout way in which you do the encoding and decoding yourself:

def main():
    # Open the output file in binary mode, so we write bytes to it instead of strings.
    with open('output_file', 'wb') as out_file:
        # Open the input in binary mode as well, so each line is a bytes object.
        with open('input_file', 'rb') as in_file:
            for line_bytes in in_file:
                # Decode the bytes into a string.
                line_string = line_bytes.decode('utf-8')
                # Replace the characters we want.
                line_string = line_string.replace('ć', 'ç')
                # Encode the string back into bytes.
                out_bytes = line_string.encode('utf-8')
                # A binary file takes bytes through write(), not print().
                out_file.write(out_bytes)

A good resource on this topic (string encodings) is http://www.joelonsoftware.com/articles/Unicode.html. Highly recommended reading!

Source: http://docs.python.org/release/3.0.1/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit

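(P.S. As you can see, I haven't mentioned utf-16 anywhere in this post. I don't actually know whether Python uses that encoding internally, but it doesn't matter. When you work with strings, you work with characters (code points), not bytes.)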
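If the data on disk really is stored as UTF-16, the same text-mode approach should cover the conversion named in the title: open the input with encoding='utf-16' and the output with encoding='utf-8'. The sketch below assumes that situation. The helper name convert_utf16_to_utf8 and the file names are hypothetical, and the demo writes its own sample file to a temporary directory so that it can run on its own.

```python
import os
import tempfile


def convert_utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8 (hypothetical helper)."""
    # newline='' passes line endings through unchanged in both directions.
    with open(src_path, encoding='utf-16', newline='') as src, \
         open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        for line in src:
            dst.write(line)


# Demo: create a small UTF-16 file, convert it, and read the raw result back.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'input_utf16.txt')
dst_path = os.path.join(tmp_dir, 'output_utf8.txt')

sample = 'ćma\nçà\n'
with open(src_path, 'w', encoding='utf-16', newline='') as f:
    f.write(sample)

convert_utf16_to_utf8(src_path, dst_path)

with open(dst_path, 'rb') as f:
    result_bytes = f.read()

print(result_bytes.decode('utf-8') == sample)
```

For data already in memory as a bytes object, chaining data.decode('utf-16').encode('utf-8') does the same job without touching files.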

Answered 2025-04-16 by Python大师
