Extracting data from a hex file

Asked 2025-04-18 01:38

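I have a file in which I want to search for a specific hex value (a header). Once I find it, I want to read from that position until I reach another specific hex value (a footer).

I have some initial code: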

import binascii

holdhd = ""
holdft = ""
header = "03AABBCC"
footer = "FF00FFAA"

with open ('hexfile', 'rb') as file:    
    bytes = file.read()
    a = binascii.hexlify(bytes)     
    while header in a:      
        holdhd = header     
        print holdhd

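This code successfully prints the header I'm looking for (the file contains multiple headers), but I'm not sure what to do next: how do I read the file from that position and print everything up to the footer?

Thanks in advance!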

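2 Answers

Depending on the size of the file, you may want to load everything into memory (treating the data as bytes) and then use a regular expression to extract the part between the header and the footer, for example: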

import binascii
import re

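header = binascii.unhexlify('000100a0')
# unhexlify needs an even number of hex digits (two per byte).
footer = binascii.unhexlify('0000000000')

with open('hexfile', 'rb') as fin:
    raw_data = fin.read()

# Build a bytes pattern; re.DOTALL lets '.' match newline bytes as well.
pattern = re.escape(header) + b'(.*?)' + re.escape(footer)
match = re.search(pattern, raw_data, re.DOTALL)
data = match.group(1) if match else None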
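
Since the question mentions that the file contains several headers, the same regular-expression idea can be extended to collect every block rather than only the first. The following is a minimal sketch under stated assumptions: `extract_bodies` is a hypothetical helper name, and the header, footer, and sample bytes are illustrative values, not taken from the asker's real file.

```python
import binascii
import re


def extract_bodies(raw_data, header, footer):
    """Return every byte string found between a header and the next footer."""
    # re.DOTALL makes '.' match newline bytes, which binary data may contain.
    pattern = re.compile(
        re.escape(header) + b"(.*?)" + re.escape(footer), re.DOTALL
    )
    return [match.group(1) for match in pattern.finditer(raw_data)]


# Illustrative values only (assumptions, not the asker's data).
header = binascii.unhexlify("000100a0")
footer = binascii.unhexlify("0000000000")
raw_data = binascii.unhexlify(
    "ABCD"
    "000100a0AAAAAA000000000000"
    "ABCDABCD"
    "000100a0BBBBBB000000000000"
    "ABCD"
)

bodies = extract_bodies(raw_data, header, footer)
print([binascii.hexlify(body).decode("ascii") for body in bodies])
```

The non-greedy `(.*?)` stops at the first footer after each header, so the blocks do not swallow one another.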

Answered 2025-04-18 by Python大师

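If the file is small enough to load into memory in one go, you can treat it as an ordinary string and use the find method (see the Python documentation for str.find) to locate the content.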
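
As a quick illustration of how `str.find` behaves, here is a small sketch; the data string is made up for the example. The method returns the index of the first match at or after the optional start position, and -1 when nothing is found, which is why search loops test for a negative result.

```python
data = "ABCD000100a0AAAAAA000000000000ABCD"
header = "000100a0"

first = data.find(header)             # index of the first occurrence
missing = data.find("FFFF")           # -1 because "FFFF" never appears
after = data.find(header, first + 1)  # search again, starting past the first hit

print(first, missing, after)
```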

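Now let's assume a worst case: you can't guarantee that the file starts with the header you're looking for, and there may be several bodies (that is, several <header><body><footer> blocks). I created a file named bindata.txt with the following contents: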

ABCD000100a0AAAAAA000000000000ABCDABCD000100a0BBBBBB000000000000ABCD

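So there are two bodies here, AAAAAA and BBBBBB, plus some junk at the beginning, in the middle, and at the end (ABCD before the first header, ABCDABCD before the second header, and ABCD after the second footer).

Using the find method of the str object together with slicing, I came up with the following: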

header = "000100a0"
footer = "00000000000"

with open('bindata.txt', 'r') as f:
    data = f.read()

print("Data: %s" % data)
offset = 0
while True:
    header_index = data.find(header, offset)
    if header_index < 0:
        break
    # Look for the footer only after the end of this header.
    body_start = header_index + len(header)
    footer_index = data.find(footer, body_start)
    if footer_index < 0:
        break
    print("Found header at %s and footer at %s"
          % (header_index, footer_index))
    print("body: %s" % data[body_start:footer_index])
    # Resume the search just past the footer that was matched.
    offset = footer_index + len(footer)

The output is:

Data: ABCD000100a0AAAAAA000000000000ABCDABCD000100a0BBBBBB000000000000ABCD
Found header at 4 and footer at 18
body: AAAAAA
Found header at 38 and footer at 52
body: BBBBBB

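If your file is too large to fit in memory, I think the best approach is to read it byte by byte and write a couple of functions that locate where the header ends and where the footer starts, using the file's seek and tell methods.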
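
One way to sketch the large-file case is shown below. It swaps in a different technique from the byte-by-byte reading suggested above: the file is read in fixed-size chunks and only the bytes that might still belong to a block are kept in a buffer. `iter_bodies` and `chunk_size` are hypothetical names, the header and footer are assumed to be non-empty, and each individual body is assumed to fit in memory.

```python
import binascii
import os
import tempfile


def iter_bodies(path, header, footer, chunk_size=64 * 1024):
    """Yield each body between header and footer, reading the file in chunks."""
    buf = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            while True:
                start = buf.find(header)
                if start < 0:
                    # Keep only a tail that could hold a header split across chunks.
                    keep = len(header) - 1
                    buf = buf[-keep:] if keep > 0 else b""
                    break
                end = buf.find(footer, start + len(header))
                if end < 0:
                    # Header seen but footer not read yet: drop bytes before the header.
                    buf = buf[start:]
                    break
                yield buf[start + len(header):end]
                buf = buf[end + len(footer):]


# Demo on a temporary file with made-up data; chunk_size=3 forces the
# header and footer to straddle chunk boundaries.
sample = binascii.unhexlify(
    "ABCD"
    "000100a0AAAAAA000000000000"
    "ABCDABCD"
    "000100a0BBBBBB000000000000"
    "ABCD"
)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
try:
    found = list(iter_bodies(tmp.name,
                             binascii.unhexlify("000100a0"),
                             binascii.unhexlify("0000000000"),
                             chunk_size=3))
finally:
    os.remove(tmp.name)

print([binascii.hexlify(body).decode("ascii") for body in found])
```

If a header appears with no footer after it, the buffer grows until the end of the file, so a real implementation would probably want an upper bound on the body size.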

Edit:

As requested by the asker, here is an approach that skips the hex encoding (it works on the raw binary data) and uses seek and tell:

import binascii
import mmap

header = binascii.unhexlify("000100a0")
footer = binascii.unhexlify("0000000000")
sample = binascii.unhexlify("ABCD"
                "000100a0AAAAAA000000000000"
                "ABCDABCD"
                "000100a0BBBBBB000000000000"
                "ABCD")

# Create the sample file:
with open("sample.data", "wb") as f:
    f.write(sample)

# Sample done. sample.data now holds real binary data.

with open('sample.data', 'rb') as f:
    print("Data: %s" % binascii.hexlify(f.read()).decode('ascii'))
    # ACCESS_READ requests a read-only mapping on both Unix and Windows.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    current_offset = 0
    while True:
        header_index = mm.find(header, current_offset)
        if header_index < 0:
            break
        # Search for the footer only after the end of this header.
        footer_index = mm.find(footer, header_index + len(header))
        if footer_index < 0:
            break
        print("Found header at %s and footer at %s"
              % (header_index, footer_index))
        # Jump to the start of the body and read up to the footer.
        mm.seek(header_index + len(header))
        body = mm.read(footer_index - mm.tell())
        print("body: %s" % binascii.hexlify(body).decode('ascii'))
        # Resume scanning just past the footer that was matched.
        current_offset = footer_index + len(footer)
    mm.close()

This produces the following output:

Data: abcd000100a0aaaaaa000000000000abcdabcd000100a0bbbbbb000000000000abcd
Found header at 2 and footer at 9
body: aaaaaa
Found header at 19 and footer at 26
body: bbbbbb

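Note that I used Python's mmap module to help move around in the file. See its documentation. Also, the first part of this example contains some data used to create an actual binary file, sample.data. Running this part: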

# Create the sample file:
with open("sample.data", "wb") as f:
    f.write(sample)

produces the following (very readable) file:

borrajax@borrajax:~/Documents/Tests$ cat ./sample.data 
�������ͫ�������
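
Because `cat` prints raw bytes, a hex view is usually easier to read. The sketch below assumes a hypothetical helper, `hex_dump`, and applies it to the same sample bytes directly, so it runs without the file on disk.

```python
import binascii


def hex_dump(data, width=16):
    """Return lines of 'offset  hex bytes' for a bytes object."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        # bytearray yields integers on both Python 2 and Python 3.
        hex_part = " ".join("%02x" % byte for byte in bytearray(row))
        lines.append("%08x  %s" % (offset, hex_part))
    return lines


sample = binascii.unhexlify(
    "ABCD"
    "000100a0AAAAAA000000000000"
    "ABCDABCD"
    "000100a0BBBBBB000000000000"
    "ABCD"
)
for line in hex_dump(sample):
    print(line)
```

To inspect the file itself, the same function could be given the bytes returned by reading sample.data in 'rb' mode.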

Answered 2025-04-18 by Python大师
