How to read binary data after an ASCII header in Python

Asked 2025-04-16 11:16

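I have some image data stored in a file. The file starts with an ASCII text header that ends with a null character, followed by binary data. The header length varies from file to file. In Python, what is the best way to open the file, read the header up to the null character, and then load the binary data?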

Thanks for your help,
James

3 Answers

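Others have already answered your question, but I'd like to add something.

When working with binary data, I've found it useful to subclass file and add a few convenience methods for reading and writing packed binary data.

It's overkill for simple tasks, but if you need to parse many binary file formats, it saves you from repeating yourself and is worth the extra effort.

If nothing else, hopefully it serves as a useful example of how to use struct. Note that this comes from older code and is very much Python 2.x in style (the file builtin it subclasses no longer exists in Python 3.x). Python 3.x handles this quite differently, especially with respect to strings and bytes.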

import struct
import array

class BinaryFile(file):
    """
    Automatically packs or unpacks binary data according to a format
    when reading or writing.
    """
    def __init__(self, *args, **kwargs):
        """
        Initialization is the same as a normal file object
        %s""" % file.__doc__
        # super() already binds self, so it must not be passed again
        super(BinaryFile, self).__init__(*args, **kwargs)

    def read_binary(self,fmt):
        """
        Read and unpack a binary value from the file based
        on string fmt (see the struct module for details).
        This will strip any trailing null characters if a string format is
        specified. 
        """
        size = struct.calcsize(fmt)
        data = self.read(size)
        # Reading beyond the end of the file just returns ''
        if len(data) != size:
            raise EOFError('End of file reached')
        data = struct.unpack(fmt, data)

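        # Strip trailing null padding from string fields. Rebuild the tuple:
        # reassigning the loop variable would leave "data" unchanged.
        data = tuple(item.rstrip('\x00') if isinstance(item, str) else item
                     for item in data)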

        # Unpack the tuple if it only has one value
        if len(data) == 1: 
            data = data[0]

        return data

    def write_binary(self, fmt, dat):
        """Pack and write data to the file according to string fmt."""
        # Try expanding input arguments (struct.pack won't take a tuple)
        try: 
            dat = struct.pack(fmt, *dat) 
        except (TypeError, struct.error): 
            # If it's not a sequence (TypeError), or if it's a 
            # string (struct.error), don't expand.
            dat = struct.pack(fmt, dat) 
        self.write(dat)

    def read_header(self, header):
        """
        Reads a defined structure "header" consisting of a sequence of (name,
        format) strings from the file. Returns a dict with keys of the given
        names and values unpacked according to the given format for each item in
        "header".
        """
        header_values = {}
        for key, format in header:
            header_values[key] = self.read_binary(format)
        return header_values

    def read_nullstring(self):
        """
        Reads a null-terminated string from the file. This is not implemented
        in an efficient manner for long strings!
        """
        output_string = ''
        char = self.read(1)
        while char != '\x00':
            output_string += char
            char = self.read(1)
            if len(char) == 0:
                break
        return output_string

    def read_array(self, type, number):
        """
        Read data from the file and return an array.array of the given
        "type" with "number" elements
        """
        size = struct.calcsize(type)
        data = self.read(size * number)
        return array.array(type, data)
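
Since the class above depends on the Python 2 file builtin, here is a minimal Python 3 sketch of the same two ideas (reading a null-terminated header, then unpacking with struct) written as plain functions over any seekable binary file object. The header text, the "<2H" pixel format, and the in-memory demo data are assumptions made up for illustration, not part of the original question.

```python
import io
import struct


def read_nullstring(f, chunk_size=4096):
    """Return the bytes before the next null byte and leave f just after it.

    Assumes f is seekable. If no null byte is found, returns everything
    read up to the end of the file.
    """
    start = f.tell()
    parts = []
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return b"".join(parts)
        idx = chunk.find(b"\x00")
        if idx != -1:
            parts.append(chunk[:idx])
            # Reposition to the byte right after the terminator
            consumed = sum(len(p) for p in parts) + 1
            f.seek(start + consumed)
            return b"".join(parts)
        parts.append(chunk)


def read_binary(f, fmt):
    """Read and unpack one struct format from f, stripping null padding."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("End of file reached")
    values = struct.unpack(fmt, data)
    values = tuple(v.rstrip(b"\x00") if isinstance(v, bytes) else v
                   for v in values)
    # Unpack the tuple if it only has one value
    return values[0] if len(values) == 1 else values


# Hypothetical contents: ASCII header, null terminator, two uint16 values
demo = io.BytesIO(b"width=2 height=1\x00" + struct.pack("<2H", 513, 7))
header = read_nullstring(demo).decode("ascii")
pixels = read_binary(demo, "<2H")
```

With a real file, open it with open(path, 'rb') and pass the file object in place of the BytesIO demo. Reading in chunks avoids the one-byte-at-a-time loop that read_nullstring above warns about.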

Answered 2025-04-16 by Python大师

Would something like this work:

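with open('some_file', 'rb') as f:
    # Split on the first null byte only; the binary data may contain more.
    # The separator must be bytes (b'\0') because the file is opened in 'rb' mode.
    binary_data = f.read().split(b'\0', 1)[1]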
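
If the header text is needed as well, a variant of the same one-shot idea can keep both halves. The sketch below uses bytes.partition, which also makes a missing terminator easy to detect; the function name split_header and the sample bytes are hypothetical, and it assumes the whole file fits comfortably in memory.

```python
def split_header(raw):
    """Split raw bytes into (ASCII header text, binary payload).

    Splits on the first null byte only, so null bytes inside the
    payload are left untouched.
    """
    header, sep, payload = raw.partition(b"\x00")
    if not sep:
        raise ValueError("no null terminator found")
    return header.decode("ascii"), payload


# Hypothetical file contents held in memory for illustration
sample = b"P5 2 2\x00\x01\x00\x02"
header_text, payload = split_header(sample)
```

For a real file, the raw argument would come from open('some_file', 'rb').read().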

Answered 2025-04-16 by Python大师

You should probably start with something like this.

with open('some file', 'rb') as f:
    a_byte = f.read(1)
    # Skip the header: stop at the null terminator or at end of file
    while a_byte and ord(a_byte) != 0:
        a_byte = f.read(1)
    # At this point, what's left is the binary data.

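The Python version matters a great deal here, and the issue is the read function. In Python 2, read returns a str, so ord() is needed to turn the character into a number. In Python 3, a file opened in 'rb' mode returns bytes: read(1) still gives a length-1 bytes object that ord() accepts, but indexing or iterating over bytes yields integers directly.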
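
To make the Python 3 side of that difference concrete, here is a small sketch using an in-memory stream in place of a real file. The sample bytes are made up for illustration.

```python
import io

# Stand-in for a file opened in 'rb' mode
f = io.BytesIO(b"A\x00\xff")

one = f.read(1)
# read(1) returns a bytes object of length 1, not an integer
is_bytes = isinstance(one, bytes)

# ord() accepts a length-1 bytes object
via_ord = ord(one)

# Indexing or iterating bytes yields integers directly
via_index = one[0]
as_ints = list(b"A\x00\xff")
```

So a comparison such as ord(a_byte) != 0 works on the result of read(1) in Python 3, while a byte obtained by indexing can be compared with 0 directly.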

Answered 2025-04-16 by Python大师
