Reading a file in Python and converting its byte order
Someone recently asked on StackOverflow how to read a file in Python, and the accepted answer suggested something like:
with open('x.txt') as x: f = x.read()
How can I read the file and convert the byte order of the data?
For example, I have a 1 GB binary file containing a bunch of single-precision floats stored in big-endian format, and I want to convert them to little-endian and put them into a numpy array. Below is a function I wrote to do this, along with some code that actually calls it. I use struct.unpack for the endian conversion, and I tried to speed things up with mmap.
My question is: am I using mmap and struct.unpack correctly? Is there a simpler, faster way to do this? What I have now works, but I would really like to learn how to do it better.
Thanks in advance!
#!/usr/bin/python
from struct import unpack
import mmap
import numpy as np

def mmapChannel(arrayName, fileName, channelNo, line_count, sample_count):
    """
    We need to read in the asf internal file and convert it into a numpy array.
    It is stored as a single row, and is binary. The number of lines (rows), samples (columns),
    and channels all come from the .meta text file.
    Also, internal format files are packed big endian, but most systems use little endian, so we need
    to make that conversion as well.
    Memory mapping seemed to improve the ingestion speed a bit.
    """
    # memory-map the file; size 0 means the whole file
    # length = line_count * sample_count * arrayName.itemsize
    print "\tMemory Mapping..."
    with open(fileName, "rb") as f:
        map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        map.seek(channelNo * line_count * sample_count * arrayName.itemsize)

        for i in xrange(line_count * sample_count):
            arrayName[0, i] = unpack('>f', map.read(arrayName.itemsize))[0]

        # Same method as above, just more verbose for the maintenance programmer.
        # for i in xrange(line_count * sample_count):  # row
        #     be_float = map.read(arrayName.itemsize)  # arrayName.itemsize should be 4 for float32
        #     le_float = unpack('>f', be_float)[0]     # > for big endian, < for little endian
        #     arrayName[0, i] = le_float

        map.close()
    return arrayName
print "Initializing the Amp HH HV, and Phase HH HV arrays..."
HHamp = np.ones((1, line_count*sample_count), dtype='float32')
HHphase = np.ones((1, line_count*sample_count), dtype='float32')
HVamp = np.ones((1, line_count*sample_count), dtype='float32')
HVphase = np.ones((1, line_count*sample_count), dtype='float32')
print "Ingesting HH_Amp..."
HHamp = mmapChannel(HHamp, 'ALPSRP042301700-P1.1__A.img', 0, line_count, sample_count)
print "Ingesting HH_phase..."
HHphase = mmapChannel(HHphase, 'ALPSRP042301700-P1.1__A.img', 1, line_count, sample_count)
print "Ingesting HV_AMP..."
HVamp = mmapChannel(HVamp, 'ALPSRP042301700-P1.1__A.img', 2, line_count, sample_count)
print "Ingesting HV_phase..."
HVphase = mmapChannel(HVphase, 'ALPSRP042301700-P1.1__A.img', 3, line_count, sample_count)
print "Reshaping...."
HHamp_orig = HHamp.reshape(line_count, -1)
HHphase_orig = HHphase.reshape(line_count, -1)
HVamp_orig = HVamp.reshape(line_count, -1)
HVphase_orig = HVphase.reshape(line_count, -1)
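Taken together, the whole ingestion above can collapse to a single numpy.fromfile call plus a reshape. A minimal runnable sketch (the tiny dimensions and the synthetic demo file are made up for the example, standing in for the real .meta values and .img file), assuming the file is laid out channel-major as the seek offsets above imply:

```python
import numpy as np

# Made-up dimensions for the demo; the real values come from the .meta file.
line_count, sample_count, channel_count = 3, 4, 4

# Write a synthetic big-endian, channel-major file like the .img above.
np.arange(channel_count * line_count * sample_count, dtype='>f4').tofile('demo.img')

# One read for all channels, then reshape to (channels, lines, samples).
raw = np.fromfile('demo.img', dtype='>f4')
channels = raw.reshape(channel_count, line_count, sample_count)

HHamp, HHphase, HVamp, HVphase = channels  # one (line_count, sample_count) array each
```

Each unpacked channel already has the line_count x sample_count shape, so the separate reshape step at the end would become unnecessary.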
4 Answers
0
I would have expected something like this to be faster:
arrayName[0] = unpack('>'+'f'*line_count*sample_count, map.read(arrayName.itemsize*line_count*sample_count))
Also, please don't use map as a variable name; it shadows the built-in function.
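For illustration, here is a small self-contained sketch of that batched unpack (the three sample values are made up): a single unpack call decodes an entire big-endian buffer instead of calling unpack once per float.

```python
from struct import pack, unpack

# Pack three big-endian floats back to back (a stand-in for map.read(...)).
n = 3
payload = pack('>' + 'f' * n, 1.0, 2.5, -4.0)

# Decode the whole buffer with one format string, as the answer suggests.
values = unpack('>' + 'f' * n, payload)
# values == (1.0, 2.5, -4.0)
```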
7
A slight modification of @Alex Martelli's answer:
arr = numpy.fromfile(filename, numpy.dtype('>f4'))
# no byteswap is needed regardless of the endianness of the machine
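A quick self-contained check of that one-liner (the file name and sample values are made up): the '>f4' dtype describes the on-disk layout, so numpy returns correct values on any host without an explicit byteswap.

```python
import numpy as np

# Write a few big-endian float32 values to a scratch file.
np.array([1.0, 2.0, 3.5], dtype='>f4').tofile('be_demo.bin')

# '>f4' = big-endian 4-byte float; numpy handles the byte order on read.
arr = np.fromfile('be_demo.bin', dtype='>f4')
# arr.tolist() == [1.0, 2.0, 3.5]
```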
6
with open(fileName, "rb") as f:
    arrayName = numpy.fromfile(f, numpy.float32)
arrayName.byteswap(True)
This is hard to beat for speed and simplicity ;-). For byteswap see here (the True argument means "in place"); for fromfile see here.
This works as-is on little-endian machines (since the data is big-endian, the byteswap is needed). You can test whether the machine is little-endian and make the byteswap conditional, changing the last line from an unconditional byteswap call into, for example:
if struct.pack('=f', 2.3) == struct.pack('<f', 2.3):
    arrayName.byteswap(True)
That is, the byteswap is called only when the test shows the machine is little-endian.
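An equivalent endianness test can use sys.byteorder. A minimal runnable sketch of the conditional-byteswap idea (the file name and sample values are made up for the demo):

```python
import sys
import numpy as np

# Synthetic big-endian input, standing in for the real data file.
np.array([1.0, -2.0, 0.5], dtype='>f4').tofile('swap_demo.bin')

with open('swap_demo.bin', 'rb') as f:
    arrayName = np.fromfile(f, np.float32)  # read as native-order float32

# Swap only on little-endian hosts; big-endian hosts already match the file.
if sys.byteorder == 'little':
    arrayName.byteswap(True)
# arrayName.tolist() == [1.0, -2.0, 0.5] on any host
```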