Python method for storing a list of bytes in network (big-endian) byte order to a file (little-endian)

Asked 2025-04-16 02:18 by 数据工程师

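I am analyzing tcpdump captures that contain P2P messages, and I am having trouble writing the data I extract to a file. I suspect that the byte order in which I write to the file is wrong.

I have a list of bytes holding a piece of P2P video data, read and processed with the python-pcapy package.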

bytes = [14, 254, 23, 35, 34, 67, ...]

I am looking for a way to write these bytes, currently held in a list in my Python application, to a file.

At the moment I write the pieces like this:

def writePiece(self, filename, pieceindex, bytes, ipsrc, ipdst, ts): 
    file = open(filename,"ab")
    # Iterate through bytes writing them to a file if don't have piece already 
    if not self.piecemap[ipdst].has_key(pieceindex):
        for byte in bytes: 
            file.write('%c' % byte)
        file.flush()
        self.procLog.info("Wrote (%d) bytes of piece (%d) to %s" % (len(bytes), pieceindex, filename))

    # Remember we have this piece now in case duplicates arrive 
    self.piecemap[ipdst][pieceindex] = True

    # TODO: Collect stats 
    file.close()

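As the for loop shows, I write the bytes to the file in the order in which they arrive from the network (that is, big-endian).

Suffice it to say, the video data does not play well in VLC :-D

I think I need to convert the bytes to little-endian order, but I am not sure how to do that in Python.

UPDATE

The solution that worked for me (writing P2P pieces while handling byte order correctly) was: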

def writePiece(self, filename, pieceindex, bytes, ipsrc, ipdst, ts): 
    file = open(filename,"r+b")
    if not self.piecemap[ipdst].has_key(pieceindex):
        little = struct.pack('<'+'B'*len(bytes), *bytes) 
        # Seek to offset based on piece index 
        file.seek(pieceindex * self.piecesize)
        file.write(little)
        file.flush()
        self.procLog.info("Wrote (%d) bytes of piece (%d) to %s" % (len(bytes), pieceindex, filename))

    # Remember we have this piece now in case duplicates arrive 
    self.piecemap[ipdst][pieceindex] = True

    file.close()

The key to this solution was, as I suspected, Python's struct module, and in particular:

    little = struct.pack('<'+'B'*len(bytes), *bytes)

Thanks to those who responded with helpful suggestions.
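
For readers on Python 3, the offset-based write from the update can be sketched roughly as follows. This is only an illustration, not the code used in the original application: `PIECE_SIZE`, `write_piece`, and the `seen` set are hypothetical stand-ins for `self.piecesize`, `writePiece`, and `self.piecemap`, and the piece size is kept tiny so the result is easy to inspect.

```python
import os
import tempfile

PIECE_SIZE = 4  # hypothetical piece size in bytes, tiny for illustration


def write_piece(filename, piece_index, data, seen):
    """Write one piece at its own offset unless it was already written."""
    if piece_index in seen:
        return False
    # "r+b" allows seeking without truncating; create the file on first use.
    mode = "r+b" if os.path.exists(filename) else "w+b"
    with open(filename, mode) as out:
        out.seek(piece_index * PIECE_SIZE)
        out.write(bytes(data))  # each int in 0..255 becomes exactly one byte
    seen.add(piece_index)
    return True


# Pieces arrive out of order, followed by one duplicate that is skipped.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "video.bin")
    seen = set()
    for index, data in [(1, [5, 6, 7, 8]), (0, [1, 2, 3, 4]), (1, [9, 9, 9, 9])]:
        write_piece(path, index, data, seen)
    with open(path, "rb") as result_file:
        result = result_file.read()
```

In this sketch the pieces end up in index order on disk regardless of arrival order, which is the behavioral difference between the appending version and the seeking version shown above.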

3 Answers

This may already have been answered in the linked question "Python file reading and byte-order conversion".

Answered 2025-04-16 by Python大师

To save yourself some work, you could use a bytearray (available in Python 2.6 and later):

b = [14, 254, 23, 35]
f = open("file", 'ab')
f.write(bytearray(b))

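This converts your values in the range 0 to 255 directly to bytes, with no need to loop over them yourself.

Without more information I can't see what your actual problem is. If the data really is handled byte by byte, then endianness is not an issue, as others have said.

By the way, bytes and file are poor choices for variable names, because they shadow Python built-ins of the same name.

Answered 2025-04-16 by Python大师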
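
A small check, assuming Python 3, illustrates the point about byte-wise data. The sample values are taken from the question; the variable names are made up for this example. The byte-order prefix has no effect on the one-byte 'B' format, and only matters for multi-byte formats such as 'H'.

```python
import struct

values = [14, 254, 23, 35]

# 'B' is a one-byte unsigned integer, so there is nothing to reorder.
little_b = struct.pack("<4B", *values)
big_b = struct.pack(">4B", *values)
same_for_bytes = little_b == big_b == bytes(values)

# 'H' is a two-byte unsigned integer; here the prefix does change the output.
little_h = struct.pack("<H", 0x0EFE)
big_h = struct.pack(">H", 0x0EFE)
differs_for_shorts = little_h != big_h
```

Byte order would only become relevant if the capture were decoded into multi-byte integers and then written back out in a different layout.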

You could also use an array.array:

from array import array
f.write(array('B', bytes))

instead of

f.write(struct.pack('<'+'B'*len(bytes), *bytes))

which, tidied up a little, is

f.write(struct.pack('B' * len(bytes), *bytes))
# the < is redundant; there is NO ENDIANNESS ISSUE

which, if len(bytes) is "large", might be better written as

f.write(struct.pack('%dB' % len(bytes), *bytes))
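
As a quick sanity check, assuming Python 3, the variants discussed in these answers can be compared side by side. The name `payload` is introduced here only to avoid shadowing the built-in `bytes`.

```python
import struct
from array import array

payload = [14, 254, 23, 35]

via_bytearray = bytes(bytearray(payload))
via_array = array("B", payload).tobytes()
via_struct_repeat = struct.pack("B" * len(payload), *payload)
via_struct_count = struct.pack("%dB" % len(payload), *payload)

# All four approaches produce the same byte string.
all_equal = via_bytearray == via_array == via_struct_repeat == via_struct_count
```

Since the output is identical, the choice between them comes down to readability and how long the list is.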

Answered 2025-04-16 by Python大师
