I took a very large Fortran binary file and rewrote it as a *.csv file, and then got a memory error. What options do I have?

recordSize = 24 #24 bytes in a record
fileObject = open("filepath", "rb")
csvRows = []
while True:
    fout = fileObject.read(recordSize)
    if len(fout) != recordSize:
        break
    else:
        csvRows.append([x for x in struct.unpack("ffffff", fout)])
csvFileObject.writerows(csvRows)

2 answers

User

Answer 1 · edited 2024-05-19 02:09:36

If you simply write each row as you go, instead of building up one huge list, you should be okay:

import csv, struct

recordSize = 24 #24 bytes in a record

# Python 3: csv files are opened in text mode with newline=""
with open("fortran.bin", "rb") as fileObject, open("out.csv", "w", newline="") as fp_out:
    writer = csv.writer(fp_out)
    while True:
        fout = fileObject.read(recordSize)
        if len(fout) != recordSize:
            break
        else:
            writer.writerow(struct.unpack("ffffff", fout))

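This gives me a CSV file with one row of six values per record, although you would probably want to write some column names as a header as well.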
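A header row can be written once, before the loop starts. This is a minimal sketch assuming Python 3; the column names c1 to c6, the function name convert and its path arguments are made up for illustration and do not come from the question.

```python
import csv
import struct

RECORD_SIZE = 24          # six 4-byte floats per record, as in the question
RECORD_FORMAT = "ffffff"
HEADER = ["c1", "c2", "c3", "c4", "c5", "c6"]  # hypothetical column names


def convert(bin_path, csv_path):
    """Stream records from bin_path to csv_path, one row at a time."""
    # newline="" lets the csv module control line endings (Python 3)
    with open(bin_path, "rb") as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(HEADER)  # the header is written once, before any data
        while True:
            record = src.read(RECORD_SIZE)
            if len(record) != RECORD_SIZE:
                break  # end of file, or a trailing partial record
            writer.writerow(struct.unpack(RECORD_FORMAT, record))
```

Only one record is held in memory at a time, so the size of the input file no longer matters.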

A few notes: (1) [x for x in something_or_other] is just list(something_or_other), but here struct.unpack already returns a tuple, which works just as well. (2) In Python we tend to write record_size rather than recordSize.

If you want to go further, note that a common pattern in Python for making something lazy is to yield the elements one at a time, something like:

import struct

def read_fortran(filename):
    record_size = 24
    record_format = "f"*6
    with open(filename, "rb") as fp:
        while True:
            row = fp.read(record_size)
            if len(row) < record_size:
                break
            unpacked = struct.unpack(record_format, row)
            yield unpacked

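yield is similar to return, but it does not end the function (the "generator"); it keeps its state until something calls next on it, and then carries on. (A for loop does this implicitly.) This lets you abstract away the iteration logic. Having done that, you can do something like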

>>> read_fortran("fortran.bin")
<generator object read_fortran at 0xb0b02144>
>>> rows = read_fortran("fortran.bin")
>>> for row in rows:
...     print(row)
...     
(0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
(0.0, 2.0, 4.0, 6.0, 8.0, 10.0)
(0.0, 3.0, 6.0, 9.0, 12.0, 15.0)
(0.0, 4.0, 8.0, 12.0, 16.0, 20.0)

where, instead of print, you could call writer.writerow on each row, and never store all the rows.
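Putting the generator and the CSV writer together might look like the sketch below. read_fortran is repeated so the block runs on its own; records_to_csv and its arguments are hypothetical names, and the "6f" record format is the one assumed throughout this thread.

```python
import csv
import struct


def read_fortran(filename, record_format="6f"):
    """Yield one unpacked tuple per fixed-size record in filename."""
    record_size = struct.calcsize(record_format)  # 24 bytes for "6f"
    with open(filename, "rb") as fp:
        while True:
            row = fp.read(record_size)
            if len(row) < record_size:
                break  # end of file, or a trailing partial record
            yield struct.unpack(record_format, row)


def records_to_csv(bin_path, csv_path):
    """Write every record to csv_path without collecting them in a list."""
    with open(csv_path, "w", newline="") as fp_out:
        # writerows accepts any iterable, so rows are pulled from the
        # generator one at a time as they are written
        csv.writer(fp_out).writerows(read_fortran(bin_path))
```

Because writerows is handed a generator rather than a list, this keeps the memory profile of the row-by-row loop while reading a little more compactly.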

User

Answer 2 · edited 2024-05-19 02:09:36

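Well, first of all, you haven't given us any code to help you with. I don't mean the 5000 lines you wrote, but a short, concise version we can start from.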

What other options might I have?

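If you have hit the memory limit of your Python process, you are most likely loading the whole binary file into memory. Note that you have not filled up your 16GB of RAM, only the amount of memory your system lets a single process use. But rather than raising the memory limit for your process, you should improve your algorithm.

So, instead of reading everything, converting that huge file and then writing it out, why not turn it into a stream? The basic idea is:

Create the CSV writer: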

import csv
with open('file.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)

Read the Fortran file one line at a time (usually the lines all have the same length, or there is a marker between each statement):

    with open('fortran.bin', 'rb') as f:
        for data in read_that_line(f):

Parse and process that line of data:

            data1, data2, data3, data4 = get_useful_info_from(data)

Write it to the CSV:

            writer.writerow([data1, data2, data3, data4])

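read_that_line() only needs to return the actual data from the file. Without knowing the format I can't help you write it, but either it gets data of a given length: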

def read_that_line(f):
    s = f.read(50)
    while s:  # f.read() returns an empty bytes object at end of file
        yield s
        s = f.read(50)

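Or it reads data up to a given delimiter. Python's open() can only split on newlines (its newline argument accepts None, '', '\n', '\r' or '\r\n', and only in text mode), so for any other delimiter, such as the byte 0x20, you have to split the data yourself. When the records are newline-terminated you can simply iterate over the file: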

def read_that_line(f):
    for l in f:
        yield l
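For a delimiter other than a newline, the splitting has to be done by hand. The following is a sketch under assumptions: read_delimited, the single-space default delimiter (byte 0x20) and the chunk size are illustrative choices, not something the question specifies.

```python
def read_delimited(f, delimiter=b" ", chunk_size=4096):
    """Yield byte strings from the binary file f, split on delimiter."""
    buf = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break  # end of file
        buf += chunk
        # everything before the last delimiter is complete; the remainder
        # stays in buf until the next chunk arrives
        *pieces, buf = buf.split(delimiter)
        yield from pieces
    if buf:
        yield buf  # trailing data with no closing delimiter
```

Memory use is bounded by the chunk size plus the longest single piece, and a delimiter that straddles two chunks is still found because the unfinished tail is carried over. Two delimiters in a row produce an empty piece.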

Or it may be more complicated: you have to read one byte at a time, and return the assembled statement once it is complete:

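def read_that_line(f):
    buf = b""
    while True:
        byte = f.read(1)
        if not byte:
            break  # end of file
        buf += byte
        if not its_not_a_statement_yet(buf):
            yield buf  # a full statement has been assembled
            buf = b""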

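That way, all you have in memory is one line's worth of data plus the temporary variables for each line of Fortran data. Even an Arduino-sized machine could handle that!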
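If reading one record per call turns out to be slow, a middle ground is to read a fixed number of records at a time. This is only a sketch: read_in_batches and records_per_batch are hypothetical names, the "6f" layout is the 24-byte record from the question, and it assumes a regular file, where a short read only happens at the end.

```python
import struct

RECORD = struct.Struct("6f")  # 24-byte record, as in the question


def read_in_batches(f, records_per_batch=4096):
    """Yield unpacked records, reading many records per call to f.read()."""
    while True:
        block = f.read(RECORD.size * records_per_batch)
        # drop a trailing partial record: iter_unpack needs a whole multiple
        usable = len(block) - len(block) % RECORD.size
        if usable == 0:
            break  # end of file
        yield from RECORD.iter_unpack(block[:usable])
```

Memory stays bounded by records_per_batch * 24 bytes no matter how large the file is.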

Here is the problem in your code:

# you create a list
csvRows = []
while iterate over the file:
    […]
    ### at each iteration over the file, you append 24 bytes in memory
    csvRows.append([ 24 bytes of data ]) 

### until you get the full size of the fortran binary in your memory, which fills your allowed memory space
### before you're even reaching this line!
csvFileObject.writerows(csvRows)

OK, here is your code updated to take my suggestions into account:

recordSize = 24 #24 bytes in a record

import csv
import struct
### here we open the target csv file that will receive the data
with open('file.csv', 'w', newline='') as csvfile:
    ### you may want to configure the csv writer to match your csv file preferences
    ### we create a writer object that will take a list as input, and write it down in the csv file
    writer = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    ### then we open the fortran binary file
    with open("fortran.bin", "rb") as f:
        ### we read the first record into the data variable
        data = f.read(recordSize)
        ### while we have a full record
        while len(data) == recordSize:
            ### we unpack it and write it to disk
            writer.writerow(struct.unpack("ffffff", data))
            ### and we read the next record, which replaces the previous one in memory
            data = f.read(recordSize)

HTH
