How to read binary data after an ASCII header in Python
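I have some image data stored in a file. The file starts with an ASCII text header terminated by a null character, followed by binary data. The header varies in length, and I'd like to know the best way to open the file, read the header, find the null character, and then load the binary data (in Python).
Thanks for your help,
James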
3 Answers
1
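Others have already answered your question, but I'd like to add something.
When working with binary data, I find it useful to subclass file and add a few convenience methods for reading and writing packed binary data.
It's overkill for simple things, but if you need to parse a lot of binary file formats, it saves you from repeating yourself and is worth the extra effort.
If nothing else, hopefully it serves as a useful example of how to use struct. Note that this comes from older code and is very much Python 2.x style. Python 3.x handles this quite differently (strings vs. bytes in particular), and the file built-in that the class inherits from no longer exists there.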
import struct
import array

class BinaryFile(file):
    """
    Automatically packs or unpacks binary data according to a format
    when reading or writing. (Python 2 only: relies on the built-in file type.)
    """
    def __init__(self, *args, **kwargs):
        # Initialization is the same as a normal file object.
        super(BinaryFile, self).__init__(*args, **kwargs)

    def read_binary(self, fmt):
        """
        Read and unpack a binary value from the file based
        on string fmt (see the struct module for details).
        This will strip any trailing null characters if a string format is
        specified.
        """
        size = struct.calcsize(fmt)
        data = self.read(size)
        # Reading beyond the end of the file just returns ''
        if len(data) != size:
            raise EOFError('End of file reached')
        data = struct.unpack(fmt, data)
        # Strip trailing nulls from string items; the tuple must be rebuilt,
        # because rebinding the loop variable would leave it unchanged.
        data = tuple(item.rstrip('\x00') if isinstance(item, str) else item
                     for item in data)
        # Unpack the tuple if it only has one value
        if len(data) == 1:
            data = data[0]
        return data

    def write_binary(self, fmt, dat):
        """Pack and write data to the file according to string fmt."""
        # Try expanding input arguments (struct.pack won't take a tuple)
        try:
            dat = struct.pack(fmt, *dat)
        except (TypeError, struct.error):
            # If it's not a sequence (TypeError), or if it's a
            # string (struct.error), don't expand.
            dat = struct.pack(fmt, dat)
        self.write(dat)

    def read_header(self, header):
        """
        Reads a defined structure "header" consisting of a sequence of (name,
        format) strings from the file. Returns a dict with keys of the given
        names and values unpacked according to the given format for each item
        in "header".
        """
        header_values = {}
        for key, fmt in header:
            header_values[key] = self.read_binary(fmt)
        return header_values

    def read_nullstring(self):
        """
        Reads a null-terminated string from the file. This is not implemented
        in an efficient manner for long strings!
        """
        output_string = ''
        char = self.read(1)
        # Stop at the null terminator or at end of file (read returns '').
        while char and char != '\x00':
            output_string += char
            char = self.read(1)
        return output_string

    def read_array(self, typecode, number):
        """
        Read data from the file and return an array.array of the given
        "typecode" with "number" elements
        """
        size = struct.calcsize(typecode)
        data = self.read(size * number)
        return array.array(typecode, data)
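Since the class above depends on Python 2's file type, here is a rough sketch of the same two ideas (read a null-terminated header, then unpack values with struct) as plain functions that should run on Python 3. The names read_nullstring and read_binary mirror the methods above; the chunk_size parameter and the demo contents are made up for illustration, and the stream is assumed to be seekable.

```python
import io
import struct


def read_nullstring(stream, chunk_size=64):
    """Return the bytes before the first null byte in a seekable stream.

    The stream is left positioned just past the null terminator.
    """
    start = stream.tell()
    header = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of file without a terminator: return what was read
        end = chunk.find(b"\x00")
        if end != -1:
            header += chunk[:end]
            # Rewind to the byte right after the terminator.
            stream.seek(start + len(header) + 1)
            break
        header += chunk
    return bytes(header)


def read_binary(stream, fmt):
    """Read struct.calcsize(fmt) bytes from the stream and unpack them."""
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("End of file reached")
    values = struct.unpack(fmt, data)
    # Return a bare value when the format describes a single item.
    return values[0] if len(values) == 1 else values


# Made-up contents: ASCII header, null byte, two little-endian uint16 values.
demo = io.BytesIO(b"width=2 height=1\x00" + struct.pack("<2H", 513, 7))
header = read_nullstring(demo).decode("ascii")
pixels = read_binary(demo, "<2H")
```

Reading in chunks and seeking back avoids the one-byte-at-a-time loop, at the cost of requiring a seekable stream such as a regular file opened with 'rb'.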
1
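Would something like this work:
with open('some_file', 'rb') as f:
    # Split on the first null byte only; the b prefix keeps this valid on Python 3 too.
    binary_data = f.read().split(b'\0', 1)[1]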
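A slightly more defensive variant of the same idea, as a sketch: bytes.partition splits on the first null byte only and also reports whether a terminator was found at all. The function name load_image_file and the demo contents are hypothetical, and the header is assumed to be pure ASCII as described in the question.

```python
import os
import tempfile


def load_image_file(path):
    """Return (header_text, binary_payload) for a null-terminated ASCII header."""
    with open(path, "rb") as f:
        raw = f.read()
    # partition splits at the first null only; later nulls stay in the payload.
    header, sep, payload = raw.partition(b"\x00")
    if not sep:
        raise ValueError("no null terminator found in %r" % path)
    return header.decode("ascii"), payload


# Demo with a throwaway file (contents are made up for illustration).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"P5 2 2\x00\x00\x01\x02\x03")
header, payload = load_image_file(tmp.name)
os.remove(tmp.name)
```

This reads the whole file into memory, which is fine for modest image sizes but may not suit very large files.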
2
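You should probably start with something like this.
with open('some file', 'rb') as infile:
    aByte = infile.read(1)
    # Skip bytes until the null terminator (or end of file, where read returns an empty value).
    while aByte and ord(aByte) != 0:
        aByte = infile.read(1)
    # At this point, what's left is the binary data.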
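The Python version matters a lot here, and the issue is read. In Python 2, read on a binary-mode file returns a str, so you need ord(aByte) to get the numeric value. In Python 3 it returns bytes, whose individual elements are already integers when indexed; ord(aByte) still works there on the length-1 result of read(1).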
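One way to sidestep the version difference, sketched here under the assumption that the file is opened in binary mode: compare the result of read(1) directly against b"\x00" instead of converting it with ord. The helper name skip_header and the demo bytes are made up for illustration.

```python
import io


def skip_header(stream):
    """Consume bytes up to and including the first null; return the header bytes."""
    header = bytearray()
    while True:
        one = stream.read(1)
        # An empty result means end of file; b"\x00" is the terminator.
        if not one or one == b"\x00":
            break
        header += one
    return bytes(header)


stream = io.BytesIO(b"HDR\x00\x01\x02")  # made-up contents
header = skip_header(stream)
payload = stream.read()  # whatever is left is the binary data
```

Because b"\x00" is a str on Python 2 and a bytes object on Python 3, the comparison matches whatever type read(1) returns on either version.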