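<p>I use the following idiom over and over again: I read a large binary file (sometimes with as many as 1.2 million records!) and store the output in an SQLite database. Getting data into the SQLite database seems fairly fast.</p>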
<pre><code>import os
import struct
import time


def readerFunction(recordSize, recordFormat, connection, outputDirectory, outputFile, numberOfObjects):
    insertString = "insert into NODE_DISP_INFO(node, analysis, timeStep, H1_translation, H2_translation, V_translation, H1_rotation, H2_rotation, V_rotation) values (?, ?, ?, ?, ?, ?, ?, ?, ?)"
    # Assumes the last three characters of the output file name are the analysis number
    analysisNumber = int(outputFile[-3:])
    outputFileObject = open(os.path.join(outputDirectory, outputFile), "rb")
    # Helper defined elsewhere: returns the file object and the number of records it holds
    outputFileObject, numberOfRecordsInFileObject = determineNumberOfRecordsInFileObjectGivenRecordSize(recordSize, outputFileObject)
    # The file stores numberOfObjects records (one per node) for every time step
    numberOfRecordsPerObject = numberOfRecordsInFileObject // numberOfObjects
    loop1StartTime = time.time()
    # loop1: one iteration per time step
    for i in range(numberOfRecordsPerObject):
        processedRecords = []
        loop2StartTime = time.time()
        # loop2: read and unpack one record per node for time step i
        for j in range(numberOfObjects):
            fout = outputFileObject.read(recordSize)
            processedRecords.append((j + 1, analysisNumber, i) + struct.unpack(recordFormat, fout))
        loop2EndTime = time.time()
        print("Time taken to finish loop2: {}".format(loop2EndTime - loop2StartTime))
        dbInsertStartTime = time.time()
        # Insert this time step's rows before reading the next one, to bound memory use
        connection.executemany(insertString, processedRecords)
        dbInsertEndTime = time.time()
        print("Time taken to insert into the database: {}".format(dbInsertEndTime - dbInsertStartTime))
    loop1EndTime = time.time()
    print("Time taken to finish loop1: {}".format(loop1EndTime - loop1StartTime))
    outputFileObject.close()
    print("Finished reading output file for analysis {}...".format(analysisNumber))
</code></pre>
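<p>When I run the code, "loop 2" and the database insert appear to be where most of the execution time goes. On average, one pass of loop 2 takes <strong>0.003s</strong>, but some analyses run it as many as <strong>50000</strong> times (about 150 s for loop 2 alone). Each database insert takes roughly as long: <strong>0.004s</strong>. At the moment I insert into the database every time loop 2 finishes, so that I don't run out of memory.</p>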
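<p>For comparison, a minimal sketch of another memory-bounded way to feed the inserts, assuming Python 3 and the standard <code>sqlite3</code> module: <code>executemany</code> accepts any iterable of parameter tuples, so a generator can stream rows straight from the file without building a list per time step, and <code>with connection:</code> commits the whole load once at the end. The names <code>generateRows</code>, <code>loadAnalysis</code>, <code>RECORD_FORMAT</code>, the six-double record layout, the in-memory database and the synthetic bytes are all hypothetical stand-ins for the real file and schema.</p>

```python
import io
import sqlite3
import struct

# Hypothetical record layout: six little-endian doubles
# (H1/H2/V translation, H1/H2/V rotation). Replace with the real recordFormat.
RECORD_FORMAT = "<6d"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

INSERT_SQL = ("insert into NODE_DISP_INFO(node, analysis, timeStep, "
              "H1_translation, H2_translation, V_translation, "
              "H1_rotation, H2_rotation, V_rotation) "
              "values (?, ?, ?, ?, ?, ?, ?, ?, ?)")


def generateRows(fileObject, numberOfObjects, analysisNumber, numberOfTimeSteps):
    """Yield one parameter tuple per record instead of collecting them in a list."""
    for timeStep in range(numberOfTimeSteps):
        for node in range(1, numberOfObjects + 1):
            raw = fileObject.read(RECORD_SIZE)
            yield (node, analysisNumber, timeStep) + struct.unpack(RECORD_FORMAT, raw)


def loadAnalysis(connection, fileObject, numberOfObjects, analysisNumber, numberOfTimeSteps):
    # "with connection" commits on success (or rolls back on error),
    # so every row of the analysis is written in a single transaction.
    with connection:
        connection.executemany(
            INSERT_SQL,
            generateRows(fileObject, numberOfObjects, analysisNumber, numberOfTimeSteps))


# Usage with an in-memory database and synthetic data: record k holds the value k.
connection = sqlite3.connect(":memory:")
connection.execute("create table NODE_DISP_INFO(node, analysis, timeStep, "
                   "H1_translation, H2_translation, V_translation, "
                   "H1_rotation, H2_rotation, V_rotation)")
numberOfObjects, numberOfTimeSteps = 3, 4
data = b"".join(struct.pack(RECORD_FORMAT, *([float(k)] * 6))
                for k in range(numberOfObjects * numberOfTimeSteps))
loadAnalysis(connection, io.BytesIO(data), numberOfObjects, 1, numberOfTimeSteps)
print(connection.execute("select count(*) from NODE_DISP_INFO").fetchone()[0])  # 12
```

This only changes how rows reach SQLite, not the unpacking work itself; whether it is faster than one <code>executemany</code> per time step on the real files would have to be measured.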
<p>What can I do to speed up "loop 2"?</p>
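<p>One possible direction, sketched under the assumption of Python 3.4+ (for <code>struct.Struct.iter_unpack</code>): compile the record format once with <code>struct.Struct</code>, read all <code>numberOfObjects</code> records of a time step with a single <code>read()</code>, and let <code>iter_unpack</code> walk that buffer record by record. The function <code>readTimeStep</code>, the six-double little-endian layout and the synthetic data are hypothetical.</p>

```python
import io
import struct


def readTimeStep(fileObject, recordStruct, numberOfObjects, analysisNumber, timeStep):
    """Read every record of one time step with a single read() call."""
    block = fileObject.read(recordStruct.size * numberOfObjects)
    # iter_unpack yields one tuple per record; the buffer length must be an
    # exact multiple of recordStruct.size, otherwise struct.error is raised.
    return [(node, analysisNumber, timeStep) + values
            for node, values in enumerate(recordStruct.iter_unpack(block), start=1)]


# Hypothetical layout: six little-endian doubles per record.
recordStruct = struct.Struct("<6d")
numberOfObjects = 3
# Synthetic file with two time steps; record k holds the value k.
data = b"".join(recordStruct.pack(*([float(k)] * 6)) for k in range(2 * numberOfObjects))
fileObject = io.BytesIO(data)
step0 = readTimeStep(fileObject, recordStruct, numberOfObjects, 7, 0)
step1 = readTimeStep(fileObject, recordStruct, numberOfObjects, 7, 1)
print(step1[0])  # (1, 7, 1, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0)
```

This replaces <code>numberOfObjects</code> separate <code>read()</code> calls per time step with one and moves the per-record unpacking into <code>iter_unpack</code>; how much it saves on the real files would need to be checked with the same loop 2 timer.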