Concatenating a plain text string and binary data

Asked 2025-04-18 17:53

My goal is to build an HTTP request (headers and body) by hand. It should look like this:

Some-Header1: some value1
Some-Header2: some value2
Some-Header3: some value3

-------------MyBoundary
Content-Disposition: form-data; name="file_content_0"; filename="123.pdf"
Content-Length: 93
Content-Type: application/pdf
Content-Transfer-Encoding: binary

  ==== here is the binary data of 123.pdf ====
  ==== here is the binary data of 123.pdf ====
  ==== here is the binary data of 123.pdf ====
  ==== here is the binary data of 123.pdf ====

-------------MyBoundary--

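I found that this is the only way to send a file to the web service through its API: I captured the traffic of a script written in Ruby, and its requests had exactly the format shown above.

Headers such as "Some-Header1" are plain text headers. Note that "-------------MyBoundary--" also follows "==== here is the binary data of 123.pdf ====".

But "==== here is the binary data of 123.pdf ====" is binary data.

My question is: how do I combine plain text data with binary data?

P.S. I have tried the usual libraries, such as python-requests, without success, and I am not going to use them for now. I only want to know how to combine plain text with binary data.

Update:

How can I simply embed binary data in a string?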

import textwrap

body_headers = textwrap.dedent(
    """
    -------------MyBoundary
    Content-Disposition: form-data; name="file_content_0"; filename="a.c"
    Content-Length: 1234
    Content-Type: image/jpeg
    Content-Transfer-Encoding: binary

                    %b ??? -> to indicate that a binary data will be placed here

    -------------MyBoundary--


    """
) % binary_data" #???

Update 2:

text1 = textwrap.dedent(
    """
    -------------MyBoundary
    Content-Disposition: form-data; name="file_content_0"; filename="a.pdf"
    Content-Length: 1234
    Content-Type: image/jpeg
    Content-Transfer-Encoding: binary

    replace_me

    -------------MyBoundary--


    """
)

with open("test1.pdf", "rb") as file_handler:
    binary_data = file_handler.read()

print (isinstance(binary_data, str)) # True
print (isinstance("replace_me", str)) # True

print text1.replace("replace_me", binary_data) # --> [Decode error - output not utf-8]

print text1.replace("replace_me", binary_data).encode("utf-8") # exception

Error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 195: ordinal not in range(128)

This also raises an exception:

print unicode(text1.replace("replace_me", binary_data), "utf-8")
# UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 195: invalid continuation byte

1 Answer

To load binary data from a file, you can do this:

with open(file_name, 'rb') as the_file:
    binary_data = the_file.read()

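Now there are two cases, depending on the Python version you use:

Python 2 - `unicode` and `str`

binary_data will be a str, so concatenation should just work, unless the other string is a unicode object. In that case you probably want to encode it first (in Python 2, hardly any networking function needs unicode):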

normal_str = unicode_str.encode(encoding)

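Here encoding is usually something like "utf-8", "utf-16" or "latin-1", although more exotic encodings exist.

Python 3 - `str` and `bytes`

binary_data will be a bytes object, which you cannot simply concatenate with a default str. If whatever you send the data through expects bytes, encode your text the same way as in Python 2. If it expects str (which is unusual for network transfer), you have to decode the bytes with a given encoding (since this is almost impossible to guess, you should check which encoding your file uses) using: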

normal_str = byte_str.decode(encoding)

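Again, pass the encoding as the argument (hint: "latin-1" should be fine, because it preserves every byte, whereas other encodings such as "utf-8" may fail on actual binary data, which is not an encoded string). [thanks @SergeBallesta]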
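The difference between the two encodings can be checked with a small experiment. This is only a sketch for Python 3; the 256-byte sample stands in for real file content and is not taken from the question.

```python
# Every possible byte value, standing in for real binary file content.
binary_data = bytes(range(256))

# latin-1 maps each byte 0x00-0xFF to the code point with the same number,
# so decoding never fails and encoding restores the original bytes.
as_text = binary_data.decode("latin-1")
round_tripped = as_text.encode("latin-1")

# utf-8 rejects byte sequences that are not valid UTF-8.
try:
    binary_data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(len(as_text), round_tripped == binary_data, utf8_ok)
```

The round trip only holds if the same encoding, "latin-1", is used in both directions.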

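To avoid this trouble in Python 3, you can define your headers as bytes from the start, writing something = b"whatever" instead of something = "whatever" (note the leading b), and open any other input files in binary mode as well. Then simple concatenation with + no longer causes problems.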
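As a rough illustration of the bytes-only approach in Python 3, the sketch below assembles the body from the question. The helper name build_body, the sample payload and the CRLF line endings are assumptions made for this example; the service may expect a slightly different layout.

```python
BOUNDARY = b"-------------MyBoundary"  # boundary line copied from the question


def build_body(file_bytes, filename):
    """Join the text part headers and the raw file bytes into one bytes object."""
    part_headers = [
        BOUNDARY,
        b'Content-Disposition: form-data; name="file_content_0"; filename="'
        + filename.encode("utf-8") + b'"',
        b"Content-Length: " + str(len(file_bytes)).encode("ascii"),
        b"Content-Type: application/pdf",
        b"Content-Transfer-Encoding: binary",
        b"",  # empty line separating the part headers from the content
    ]
    head = b"\r\n".join(part_headers) + b"\r\n"  # CRLF endings are an assumption
    tail = b"\r\n" + BOUNDARY + b"--\r\n"
    return head + file_bytes + tail


# Stand-in for the_file.read(); it contains 0xc4, the byte from the question's error.
sample = b"%PDF-1.4\n\xc4\x00\xff binary payload"
body = build_body(sample, "123.pdf")
print(type(body).__name__, len(body))
```

Because every piece is bytes, no implicit decoding takes place, so the UnicodeDecodeError from the question cannot occur here.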

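Sending the HTTP request

To send this raw data to the server, you have several options:

If you want more control than urllib (or urllib2) and requests offer, you can do low-level networking with raw sockets and send whatever data you like, using socket (the examples in its documentation are a good illustration of how to do this).
You can pass the data (everything from the first ---(snip)--MyBoundary line to the last, inclusive) as the request data of a POST request (assuming your HTTP request is a POST; the question does not say) using urllib or requests.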
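The second option might look like the sketch below, which uses urllib.request from Python 3. The URL, the helper name make_post_request, the Content-Type header and the boundary value are all assumptions for illustration; the question does not show the real request headers. The request object is only built, not sent.

```python
import urllib.request


def make_post_request(url, body, boundary):
    """Wrap an already assembled bytes body in a POST request object."""
    headers = {
        # Assumed header; the question does not show the real Content-Type.
        "Content-Type": "multipart/form-data; boundary=" + boundary,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


body = b"placeholder for the assembled bytes body"
request = make_post_request("http://example.invalid/upload", body, "MyBoundary")

# urllib.request.urlopen(request) would send it; omitted because the URL is made up.
print(request.get_method(), len(request.data))
```

Since data is passed as bytes, urllib sends it unchanged, which is what the binary part of the body needs.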

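Efficiency

If you go with raw sockets and send very large files, you may want to read the file in chunks (using the_file.read(number_of_bytes)) and write each chunk straight to the socket (using the_socket.send(read_binary_data)). [thanks @Teudimundo]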
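A minimal sketch of chunked sending in Python 3 follows. The function name send_file_in_chunks, the chunk size and the local socket pair used for the demo are assumptions; a real client would use a socket connected to the server. The sketch calls sendall rather than send, because send may write only part of a chunk and leaves the retry to the caller.

```python
import os
import socket
import tempfile


def send_file_in_chunks(sock, path, chunk_size=64 * 1024):
    """Stream the file at `path` to a connected socket; return the bytes sent."""
    total = 0
    with open(path, "rb") as the_file:
        while True:
            chunk = the_file.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            sock.sendall(chunk)  # keeps writing until the whole chunk is sent
            total += len(chunk)
    return total


# Demo over a local socket pair, so no real server is needed.
sender, receiver = socket.socketpair()
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4000))  # small enough to fit in the socket buffer
sent = send_file_in_chunks(sender, tmp.name, chunk_size=1024)
sender.close()

received = b""
while True:
    data = receiver.recv(4096)
    if not data:  # the sender closed the connection
        break
    received += data
receiver.close()
os.remove(tmp.name)
print(sent, len(received))
```

Only one chunk is held in memory at a time, so the file size no longer determines memory use.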

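About the update

About the update (which really should have been a new question...): bytes has no format method, so the new "{}" syntax is unavailable, and the old "%s" style works on bytes only from Python 3.5 onward. On older versions, either decode the bytes object into a string and then use format strings properly, or encode the string into bytes and use plain concatenation. Also note that textwrap.dedent does not accept bytes, because the regular expressions it uses internally are str patterns, which cannot be applied to bytes.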
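On Python 3.5 or later, the template idea from the update can be made to work roughly as sketched below: dedent the template while it is still a str, encode it, then insert the payload with the bytes % operator. The sample payload is an assumption, and the template keeps the plain newlines of the question's layout.

```python
import textwrap

# Dedent while the template is still text, then convert it to bytes.
template = textwrap.dedent(
    """\
    -------------MyBoundary
    Content-Disposition: form-data; name="file_content_0"; filename="a.pdf"
    Content-Length: %d
    Content-Type: application/pdf
    Content-Transfer-Encoding: binary

    %b
    -------------MyBoundary--
    """
).encode("ascii")

binary_data = b"\xc4\x00\xff raw bytes"  # stand-in for the_file.read()

# %d and %b on a bytes template require Python 3.5+.
body = template % (len(binary_data), binary_data)
print(body)
```

The payload is inserted byte for byte, so no codec is involved and arbitrary binary content survives.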

Answered 2025-04-18 by Python大师
