From a large text file to a sparse matrix in Python

import fileinput
import shlex

i=1
cpt=0      # number of lines read so far (used later as the skiprows value)
skip=0
finnum=0   # set to 1 once the first section of the file has ended
indice=1
vec=[]
mat=[]
for line in fileinput.input("MY_TEXT_FILE.TXT"):
    if i==1:  # skipping the first line
        skip = 1
    if (finnum == 0)and(skip==0):  # special reading operation for the first 10% (approximately)
        tline=shlex.split(line)
        ind_loc=0
        while ind_loc<len(tline):
            if (int(tline[ind_loc])!=0):
                vec.append(int(tline[ind_loc]))
            ind_loc=ind_loc+1
    if (finnum == 1)and(skip==0):
        print('finnum = 1')
        h=input()
        break
    if (' 0' in line):
        finnum = 1
    if skip == 0:
        i=i+1
    else:
        skip=0
        i=i+1
    cpt=cpt+1

2 Answers

User

Answer #1 · edited 2024-06-16 09:28:43

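When you work with large amounts of numerical data, you should really use NumPy rather than pure Python. It is usually more than 10 times faster, and it gives you access to Matlab-style complex computations. I don't have time to convert the code right now (a sample file would make that easiest), but numpy.loadtxt can certainly read the second part of the file quickly and efficiently. The second part of the code, which skips the first part and converts to float, could probably be done like this: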

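import numpy as np

# cpt is the number of leading lines to skip, as counted by the loop in the question
A, B, C = np.loadtxt('MY_TEXT_FILE.TXT', skiprows = cpt, unpack = True)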

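You may want to play with the data format (by adding something like dtype = (int, int, float); I'm not sure exactly how to do that), because I guess the first two columns are integers.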
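One way to get per-column types is a structured dtype. The following is a minimal sketch under stated assumptions, not the answerer's code: the in-memory sample stands in for the triplet part of the file, and the field names `row`, `col`, and `val` are made up for illustration.

```python
import io
import numpy as np

# Hypothetical stand-in for the triplet part of MY_TEXT_FILE.TXT:
# one line to skip, then "row col value" on each line.
sample = io.StringIO("header line\n1 2 0.5\n3 1 2.25\n4 4 -1.0\n")

# A structured dtype gives each column its own type.
# The field names are assumptions made for this sketch.
triplet_dtype = np.dtype([("row", np.int64), ("col", np.int64), ("val", np.float64)])

data = np.loadtxt(sample, dtype=triplet_dtype, skiprows=1)

A = data["row"]  # integer row indices
B = data["col"]  # integer column indices
C = data["val"]  # float values
```

With a real file, the path would replace `sample` and `skiprows` would be the line count of the first section.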

Also note that a sparse matrix data type is available through SciPy (the scipy.sparse module), which builds on NumPy.
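As a rough illustration of that last step, the three columns can be turned into a sparse matrix with `scipy.sparse.coo_matrix`. This sketch assumes SciPy is installed, uses made-up triplets, and assumes the file stores 1-based indices; none of that is confirmed by the question.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Made-up triplets standing in for A (rows), B (columns), C (values).
A = np.array([1, 3, 4])
B = np.array([2, 1, 4])
C = np.array([0.5, 2.25, -1.0])

# Assumption: indices in the file are 1-based, so shift them to 0-based.
rows = A - 1
cols = B - 1
n_rows = int(rows.max()) + 1
n_cols = int(cols.max()) + 1

# COO format is built directly from (value, (row, col)) triplets.
sparse = coo_matrix((C, (rows, cols)), shape=(n_rows, n_cols))

# CSR is usually more convenient for arithmetic and row slicing.
csr = sparse.tocsr()
```

If the indices are already 0-based, the subtraction should be dropped.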

User

Answer #2 · edited 2024-06-16 09:28:43

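Look, I came up with a hybrid solution that seems to run faster. I generated 1 million rows of random sample data like the data you describe above and timed your code. It took 77 seconds on my Mac, which, by the way, is a very fast machine. Using numpy instead of shlex to split the strings brings the processing down to just 5 seconds.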

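import numpy as np

# matrix is assumed to be the list of text lines holding the triplets
A=[0]*len(matrix)
B=[0]*len(matrix)
C=[0]*len(matrix)
for i in range(len(matrix)):
    # parse one whitespace-separated line into an array of floats
    full_array = np.fromstring(matrix[i], dtype=float, sep=" ")
    A[i]=full_array[0]
    B[i]=full_array[1]
    C[i]=full_array[2]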

I ran several tests and it seems to work well, about 14 times faster. I hope this helps.
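The per-line loop above could also be collapsed into a single parse. This is only a sketch, assuming every line holds exactly three whitespace-separated fields; the sample lines are hypothetical, and it has not been timed against the loop.

```python
import numpy as np

# Hypothetical stand-in for the list of text lines called matrix above.
matrix = ["1 2 0.5", "3 1 2.25", "4 4 -1.0"]

# Join all lines, split on whitespace once, and convert in one call.
flat = np.array(" ".join(matrix).split(), dtype=float)

# Reshape to one row per line; this fails if a line has a field count other than 3.
table = flat.reshape(-1, 3)

# Transposing lets the three columns be unpacked directly.
A, B, C = table.T
```

Here A, B and C are NumPy arrays rather than Python lists, which suits later array operations.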
