How can I speed up reading and converting a binary file?

# channel_content is a dictionary; channel_content[channel]['nsamples'] is a string
for rec in xrange(number_of_intervals):
    for channel in channel_names:
        channel_content[channel]['recording'].extend(
            [struct.unpack("h", f.read(2))[0]
             for iteration in xrange(int(channel_content[channel]['nsamples']))])

# variant that reads all remaining samples at once with array.fromfile
fullsamples = array('h')
# remaining bytes after the current position, converted to a count of items
fullsamples.fromfile(f, (os.path.getsize(f.filename) - f.tell()) // fullsamples.itemsize)
position = 0
for rec in xrange(int(self.header['nrecs'])):
    for channel in self.channel_labels:
        samples = int(self.channel_content[channel]['nsamples'])
        self.channel_content[channel]['recording'].extend(
            fullsamples[position:position + samples])
        position += samples

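3 Answers

User

Answer 1 · edited 2024-06-07 11:32:33

If the file is only 20-30 MB, why not read the whole file, decode the numbers in a single call to unpack, and then distribute them among the channels by iterating over the resulting array: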

data = open('data.bin', 'rb').read()
values = struct.unpack('%dh' % (len(data) // 2), data)
del data
# iterate over channels, and assign from values using indices/slices

A quick test showed that, on a 20 MB file, this is about 10 times faster than calling struct.unpack('h', f.read(2)) for every sample.
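The "assign from values using indices/slices" step could be sketched roughly as follows. This assumes the layout described in the question (each record stores the samples of every channel in turn, as native-endian 16-bit integers). The function name `split_channels` and the `nsamples` mapping are hypothetical, and the code targets Python 3 (`range` instead of `xrange`).

```python
import struct


def split_channels(data, nsamples, nrecs):
    """Decode 16-bit samples from `data` and distribute them per channel.

    `nsamples` maps channel name -> samples per record (assumed layout:
    each record stores the channels one after another, in dict order).
    """
    # One unpack call for the whole buffer instead of one call per sample.
    values = struct.unpack('%dh' % (len(data) // 2), data)
    recordings = {channel: [] for channel in nsamples}
    position = 0
    for _ in range(nrecs):
        for channel, count in nsamples.items():
            recordings[channel].extend(values[position:position + count])
            position += count
    return recordings


# Two records; channel 'a' has 2 samples per record, channel 'b' has 1.
raw = struct.pack('6h', 1, 2, 3, 4, 5, 6)
result = split_channels(raw, {'a': 2, 'b': 1}, nrecs=2)
print(result)
```

In real use, `data` would be the bytes read from the file after the header, and `nsamples` would come from the parsed header.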

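User

Answer 2 · edited 2024-06-07 11:32:33

extend() accepts iterables, that is, you can write .extend(...) instead of .extend([...]). This may speed up the program, because extend() will then consume a generator rather than a fully built list.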
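A small illustration of passing a generator expression straight to extend(), written for Python 3. The buffer `raw` is made-up test data. Whether this is measurably faster than building the list first depends on the workload, so it is worth timing on real data; the certain benefit is that no intermediate list is created.

```python
import struct

raw = struct.pack('3h', 10, 20, 30)  # made-up buffer of three 16-bit samples
recording = []

# Generator expression: no intermediate list is built before extend() runs.
recording.extend(
    struct.unpack_from('h', raw, offset)[0] for offset in range(0, len(raw), 2)
)
print(recording)
```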

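There is an inconsistency in your code: you first define channel_content = {}, and then execute channel_content[channel]['recording'].extend(...), which requires that the key channel and the subkey 'recording' already exist, with a list as the value, so that there is something to extend.

What is the nature of self.channel_content[channel]['nsamples'], such that it can be passed to int()?

Where does number_of_intervals come from? What is the nature of the intervals?

In the for rec in xrange(number_of_intervals): loop, I don't see rec used anywhere. So it seems to me that you repeat the for channel in channel_names: loop as many times as the value of number_of_intervals. Are there number_of_intervals * int(self.channel_content[channel]['nsamples']) * 2 values to read in f?

I read in the documentation: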

class struct.Struct(format)
Return a new Struct object which writes and reads binary data according to the format string format. Creating a Struct object once and calling its methods is more efficient than calling the struct functions with the same format since the format string only needs to be compiled once.

This expresses the same idea as samplebias's answer.
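As a rough sketch of the precompiled-format idea quoted from the documentation, the snippet below compiles the 'h' format once and reuses it for every sample. The buffer `raw` is made-up test data, and `iter_unpack` requires Python 3.4 or later.

```python
import struct

# Compile the format once; 'h' is a native-endian signed 16-bit integer.
short = struct.Struct('h')

raw = struct.pack('4h', 7, -8, 9, -10)  # made-up buffer of four samples

# iter_unpack walks the buffer in steps of short.size bytes and yields
# one-element tuples, which are unpacked here into plain integers.
samples = [value for (value,) in short.iter_unpack(raw)]
print(samples)
```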

If your goal is to create a dictionary, you can also use dict() with a generator as its argument.
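One way this might look, assuming the per-channel structure used in the question (an 'nsamples' entry and a 'recording' list). The channel names and sample counts below are hypothetical placeholders for values that would normally come from the file header.

```python
channel_names = ['ch1', 'ch2']              # hypothetical channel labels
samples_per_record = {'ch1': 4, 'ch2': 2}   # hypothetical header values

# dict() consumes a generator of (key, value) pairs; each channel gets
# its own fresh 'recording' list, so extend() has something to extend.
channel_content = dict(
    (name, {'nsamples': samples_per_record[name], 'recording': []})
    for name in channel_names
)
print(channel_content)
```

Building the dictionary this way also guarantees that the 'recording' key exists before the reading loop starts.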


EDIT

I propose:

# channel_content must already hold, for every channel, its 'nsamples'
# value and an empty 'recording' list
for rec in xrange(number_of_intervals):
    for channel in channel_names:
        N = int(channel_content[channel]['nsamples'])
        # decode the N samples of this channel with a single unpack call
        channel_content[channel]['recording'].extend(
            struct.unpack(str(N) + "h", f.read(2 * N)))

I don't know how to take into account J.F. Sebastian's suggestion to use array.

User
Answer 3 · edited 2024-06-07 11:32:33

You could use array to read your data:

import array
import os

fn = 'data.bin'
a = array.array('h')
a.fromfile(open(fn, 'rb'), os.path.getsize(fn) // a.itemsize)

It is 40 times faster than the struct.unpack approach from @samplebias's answer.
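The speedups quoted in this thread depend on the machine and Python version, so it may be worth measuring them yourself. Below is a minimal timing sketch for Python 3 that compares the three approaches on a throwaway file. The sample count `N` and the helper function names are arbitrary choices made for illustration.

```python
import array
import os
import struct
import tempfile
import timeit

N = 100000  # number of 16-bit samples in the throwaway test file

# Write a temporary file of N samples to read back.
with tempfile.NamedTemporaryFile(suffix='.bin', delete=False) as tmp:
    tmp.write(array.array('h', [i % 1000 for i in range(N)]).tobytes())
    path = tmp.name


def per_sample():
    # One struct.unpack call per 2-byte sample, as in the question.
    with open(path, 'rb') as f:
        return [struct.unpack('h', f.read(2))[0] for _ in range(N)]


def single_unpack():
    # Read everything, then decode with a single struct.unpack call.
    with open(path, 'rb') as f:
        data = f.read()
    return struct.unpack('%dh' % (len(data) // 2), data)


def from_array():
    # Let array.fromfile read and decode the samples directly.
    a = array.array('h')
    with open(path, 'rb') as f:
        a.fromfile(f, os.path.getsize(path) // a.itemsize)
    return a


functions = (per_sample, single_unpack, from_array)
try:
    results = {fn.__name__: list(fn()) for fn in functions}
    timings = {fn.__name__: timeit.timeit(fn, number=3) for fn in functions}
finally:
    os.remove(path)

for name, seconds in timings.items():
    print('%-14s %.4f s' % (name, seconds))
```

All three functions should return the same sample values; only the time taken differs.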
