Splitting a large NumPy array into 10 separate sub-arrays in a stratified way

>>> files
array(['GAN_0.npy', 'GAN_1.npy', 'GAN_10.npy', ..., 'GAN_822.npy',
       'GAN_8220.npy', 'GAN_8221.npy'], dtype='<U13')
>>> files.shape
(32000,)
>>> labels
array([1, 1, 1, ..., 1, 1, 1])
>>> np.unique(labels)
array([0, 1])
>>> labels.shape
(32000,)

3 answers

User

Answer 1 · edited 2024-04-23 21:09:18

Here is one possible solution:

arrays = [np.concatenate((g, r))
          for g, r in zip(np.array_split(files[labels==1], 10), 
                          np.array_split(files[labels==0], 10))]

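This solution preserves the relative order of the "GAN*" files and of the "RAW*" files. Within each array, the "GAN*" files come first and the "RAW*" files fill the remaining positions. If that ordering does not suit you, you can shuffle each array after creating it.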
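The shuffling step mentioned above could look roughly like the sketch below. The toy `files` and `labels` arrays, the fixed seed, and the name `shuffled` are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np

# Toy stand-ins for the question's `files` and `labels` (hypothetical data).
files = np.array([f"GAN_{i}.npy" for i in range(20)] +
                 [f"RAW_{i}.npy" for i in range(20)])
labels = np.array([1] * 20 + [0] * 20)

# Same construction as in the answer: pair the k-th GAN chunk with the k-th RAW chunk.
arrays = [np.concatenate((g, r))
          for g, r in zip(np.array_split(files[labels == 1], 10),
                          np.array_split(files[labels == 0], 10))]

# Shuffle each subset independently. The seed is fixed only for reproducibility.
rng = np.random.default_rng(seed=0)
shuffled = [rng.permutation(a) for a in arrays]
```

Because each subset is permuted on its own, the number of "GAN*" and "RAW*" files per subset stays the same; only their order inside the subset changes.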

User

Answer 2 · edited 2024-04-23 21:09:18

Here is a solution without loops (suggested by @Crazy Coder in a comment on another answer):

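# Assumes both classes have the same number of files (np.vstack needs equal lengths)
# and that the total is divisible by 10 (np.split needs an even division).
labels = np.array(labels, dtype=bool)
np.split(np.vstack((files[labels],files[~labels])).T.reshape(-1,1), 10)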
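To see why this one-liner yields stratified pieces, it may help to trace it on a tiny example. The eight file names below are made up, and the data is split into 2 parts instead of 10 so the result stays readable; the intermediate names `stacked`, `interleaved`, and `parts` are introduced only for this sketch.

```python
import numpy as np

# Hypothetical miniature data set: 4 GAN files and 4 RAW files.
files = np.array(["GAN_0", "GAN_1", "GAN_2", "GAN_3",
                  "RAW_0", "RAW_1", "RAW_2", "RAW_3"])
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)

# Row 0 holds the GAN files, row 1 the RAW files: shape (2, 4).
stacked = np.vstack((files[labels], files[~labels]))

# Transposing and flattening interleaves the classes: GAN, RAW, GAN, RAW, ...
interleaved = stacked.T.reshape(-1, 1)   # shape (8, 1)

# Equal-sized consecutive chunks therefore contain equal numbers of each class.
parts = np.split(interleaved, 2)         # 2 parts here instead of 10
```

Each element of `parts` is a column vector of shape `(4, 1)`; call `.ravel()` on it if a flat array is more convenient.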

User

Answer 3 · edited 2024-04-23 21:09:18

My own solution (based on @CrazyCoder's suggestion):

import numpy as np

#Read file names
file_names=np.genfromtxt('test_files.txt',dtype='str')
   
raw_vector=[]
gan_vector=[]

for i in range(0,file_names.shape[0]):

    image_name=file_names[i]

    #Separate RAW and GAN files
    if("RAW_" in image_name):
        raw_vector.append(image_name)

    if("GAN_" in image_name):
        gan_vector.append(image_name)
    
raw_vector=np.array(raw_vector)
gan_vector=np.array(gan_vector)

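#Split into 10 subsets each (np.split raises an error unless each length is
#divisible by 10; np.array_split accepts a remainder)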
raw_vector_divided=np.split(raw_vector,10)
gan_vector_divided=np.split(gan_vector,10)

for j in range(0,10):
    x=raw_vector_divided[j]
    y=gan_vector_divided[j]
    
    x=x.reshape(x.shape[0],1)
    y=y.reshape(y.shape[0],1)
    
    #merge
    experiment_data=np.vstack((x,y))
    
    #Save subset as file
    np.savetxt( 'experiment-' + str(j+1) + '-data.txt',  experiment_data, fmt='%s')

print("finished")
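As a possible variation on the script above, the name-based separation can be done without an explicit Python loop, and the resulting subsets can be checked for balance before anything is written to disk. This is only a sketch: the file names are invented stand-ins for the contents of `test_files.txt`, `np.array_split` is swapped in for `np.split`, and `subsets` and `counts` are names chosen for this example.

```python
import numpy as np

# Hypothetical file names standing in for the contents of test_files.txt.
file_names = np.array([f"RAW_{i}.npy" for i in range(20)] +
                      [f"GAN_{i}.npy" for i in range(20)])

# Substring test applied to every element at once, mirroring `"RAW_" in image_name`.
raw_vector = file_names[np.char.find(file_names, "RAW_") >= 0]
gan_vector = file_names[np.char.find(file_names, "GAN_") >= 0]

# Pair the k-th RAW chunk with the k-th GAN chunk.
subsets = [np.concatenate((r, g))
           for r, g in zip(np.array_split(raw_vector, 10),
                           np.array_split(gan_vector, 10))]

# (RAW count, GAN count) for each subset, to confirm the split is stratified.
counts = [(int(np.sum(np.char.find(s, "RAW_") >= 0)),
           int(np.sum(np.char.find(s, "GAN_") >= 0)))
          for s in subsets]
print(counts)
```

If every pair in `counts` is the same, each subset carries the same class proportions as the full list.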
