Python: building a matrix from arrays computed in a loop

from numpy import genfromtxt, linalg, array, append, hstack, vstack

#Euclidean distance function
def euclidean(v1, v2):
    dist = linalg.norm(v1 - v2)
    return dist

#get the .csv files and eliminate heading and unused columns from test
BMUs = genfromtxt('BMU3.csv', delimiter=',')
data = genfromtxt('test.csv', delimiter=',')
data = data[1:, :-2]

i = 0
for obj in data:
    D = 0
    for BMU in BMUs:
        Dist = append(euclidean(obj, BMU[: -2]), BMU[-2:])
        D = hstack(Dist)
    Map = vstack(D)
    #iteration counter
    i += 1
    if not i % 1000:
        print (i, ' of ', len(data))
print (Map)
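In the loop above, `D = hstack(Dist)` and `Map = vstack(D)` reassign the variables on every pass instead of accumulating, so only the last result survives. Below is a minimal sketch of one way to keep one row per (data row, BMU) pair. It assumes, as the slicing `BMU[:-2]` suggests, that each BMU row holds its features followed by two extra columns; small random arrays stand in for `BMU3.csv` and `test.csv`, and the helper name `build_map` is hypothetical.

```python
import numpy as np

def euclidean(v1, v2):
    # Euclidean distance between two 1-D vectors
    return np.linalg.norm(v1 - v2)

def build_map(data, BMUs):
    # Hypothetical helper: for every data row, one row per BMU holding
    # [distance, BMU[-2], BMU[-1]]; result has shape (len(data), len(BMUs), 3).
    rows = []
    for obj in data:
        block = []
        for BMU in BMUs:
            dist = euclidean(obj, BMU[:-2])
            block.append(np.concatenate(([dist], BMU[-2:])))
        rows.append(block)
    # convert the nested lists to an array once, at the end
    return np.array(rows)

rng = np.random.default_rng(0)
BMUs = rng.random((5, 6))   # stand-in for BMU3.csv: 4 features + 2 extra columns
data = rng.random((3, 4))   # stand-in for test.csv after slicing: 4 features
Map = build_map(data, BMUs)
print(Map.shape)  # (3, 5, 3)
```

Collecting into Python lists and converting once avoids repeatedly copying a growing array.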

2 answers

User

Answer 1 · edited 2024-04-25 14:42:54

Thanks for your help. I managed to implement the pseudocode; here is the final program:

import numpy as np


def euclidean(v1, v2):
    dist = np.linalg.norm(v1 - v2)
    return dist


def makeKNN(dataSet, BMUSet, k, fileOut, test=False):
    # take input files
    BMUs = np.genfromtxt(BMUSet, delimiter=',')
    data = np.genfromtxt(dataSet, delimiter=',')

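    # drop the header row; `final` keeps every column for the output file
    final = data[1:, :]
    if not test:
        data = data[1:, :]
    else:
        # test files: also drop the two unused trailing columns
        data = data[1:, :-2]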

    # Calculate all the distances between data and BMUs, then reorder the BMUs by distance (closest first)

    dist = np.array([[euclidean(d, b[:-2]) for b in BMUs] for d in data])
    BMU_K = np.array([BMUs[np.argsort(d)] for d in dist])

    # mean of column 5 over the k closest BMUs
    Z = np.array([[np.sum(b[:k].T[5]) / k] for b in BMU_K])

    # error propagation: combine the column-5 values of the k closest BMUs in quadrature
    Z_err = np.array([[np.sqrt(np.sum(np.power(b[:k].T[5], 2)))] for b in BMU_K])

    # Adding z estimates and errors to the data
    final = np.concatenate((final, Z, Z_err), axis=1)

    # write the output file
    np.savetxt(fileOut, final, delimiter=',')
    print('So long, and thanks for all the fish')

Thanks a lot, I hope this code helps others in the future :)
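The two list comprehensions in `makeKNN` that build `dist` and `BMU_K` can also be written with broadcasting and `np.argsort(..., axis=1)`. A minimal sketch, assuming the same column layout as `makeKNN` (distance features in all but the last two BMU columns, the estimated quantity in column 5); the name `knn_estimate` and the random stand-in arrays are hypothetical. It computes the same `Z` and `Z_err` formulas as the loop version.

```python
import numpy as np

def knn_estimate(data, BMUs, k, col=5):
    # Hypothetical vectorized counterpart of the core of makeKNN.
    # Pairwise distances, shape (n_data, n_BMU), via broadcasting.
    dist = np.linalg.norm(data[:, None, :] - BMUs[None, :, :-2], axis=-1)
    # Indices of the k closest BMUs for every data row.
    nearest = np.argsort(dist, axis=1)[:, :k]
    # Column `col` of those BMUs, shape (n_data, k).
    vals = BMUs[nearest, col]
    Z = vals.mean(axis=1, keepdims=True)                       # same as sum / k
    Z_err = np.sqrt(np.sum(vals ** 2, axis=1, keepdims=True))  # same formula as the loop
    return Z, Z_err

rng = np.random.default_rng(1)
BMUs = rng.random((20, 7))   # 5 feature columns + 2 extra columns (column 5 is the first extra)
data = rng.random((4, 5))
Z, Z_err = knn_estimate(data, BMUs, k=3)
print(Z.shape, Z_err.shape)  # (4, 1) (4, 1)
```

The results can be appended to the output exactly as in `makeKNN`, with `np.concatenate((final, Z, Z_err), axis=1)`.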

User

Answer 2 · edited 2024-04-25 14:42:54

First, a usage note.

Instead of:

from numpy import genfromtxt, linalg, array, append, hstack, vstack

use

import numpy as np
....
data = np.genfromtxt(....)
....
np.hstack...

Second, stay away from np.append. It is too easy to misuse. Use np.concatenate instead, so that you know exactly what it is doing.
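A short illustration of the pitfall, using made-up arrays: without an axis argument, np.append flattens its inputs, whereas np.concatenate makes both the axis and the required shapes explicit.

```python
import numpy as np

a = np.zeros((2, 2))
row = np.array([1.0, 2.0])

# Without an axis, np.append flattens both inputs and returns a 1-D array.
flat = np.append(a, row)
print(flat.shape)  # (6,)

# np.concatenate makes the axis and the shapes explicit: the new row
# must be 2-D, with shape (1, 2), to be stacked along axis 0.
stacked = np.concatenate((a, row[None, :]), axis=0)
print(stacked.shape)  # (3, 2)
```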

For incremental work, list append is better:

alist = []
for ....
    alist.append(....)
arr = np.array(alist)
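A concrete version of that pattern, with made-up row contents: collect plain Python lists inside the loop and convert once at the end. Calling np.concatenate (or np.append) inside the loop would copy the whole array on every iteration, while list.append does not.

```python
import numpy as np

rows = []
for i in range(4):
    # one row per iteration; list.append is amortized O(1)
    rows.append([i, i ** 2])

# a single conversion to a (4, 2) array at the end
arr = np.array(rows)
print(arr)
```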

=====================

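Without sample arrays (or at least their shapes) I have to guess, but (n, 2) arrays sound reasonable. Computing the distance between every pair of "points", I can collect the values in a nested list: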

In [121]: data = np.arange(6).reshape(3,2)
In [122]: [[euclidean(d,b) for b in data] for d in data]
Out[122]: 
[[0.0, 2.8284271247461903, 5.6568542494923806],
 [2.8284271247461903, 0.0, 2.8284271247461903],
 [5.6568542494923806, 2.8284271247461903, 0.0]]

Turn it into an array:

In [123]: np.array([[euclidean(d,b) for b in data] for d in data])
Out[123]: 
array([[ 0.        ,  2.82842712,  5.65685425],
       [ 2.82842712,  0.        ,  2.82842712],
       [ 5.65685425,  2.82842712,  0.        ]])

The equivalent nested loop:

alist = []
for d in data:
    sublist=[]
    for b in data:
        sublist.append(euclidean(d,b))
    alist.append(sublist)
arr = np.array(alist)

There are many ways to do this without loops, but let's first make sure the basic Python loop approach works.
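As one example of a loop-free approach, the squared distance can be expanded as ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b, which needs only a matrix product and never builds an (n, m, f) array of differences. This is a sketch of that identity, not the method used in this answer; the function name `pairwise_dist` is hypothetical, and np.maximum(..., 0) guards against tiny negative values caused by rounding.

```python
import numpy as np

def pairwise_dist(A, B):
    # Hypothetical helper: Euclidean distances between every row of A (n, f)
    # and every row of B (m, f), returned as an (n, m) array.
    sq_A = np.sum(A ** 2, axis=1)[:, None]   # (n, 1)
    sq_B = np.sum(B ** 2, axis=1)[None, :]   # (1, m)
    sq = sq_A + sq_B - 2.0 * A @ B.T         # (n, m) squared distances
    # rounding can leave tiny negative values where the true distance is 0
    return np.sqrt(np.maximum(sq, 0.0))

data = np.arange(6).reshape(3, 2)
print(pairwise_dist(data, data))
```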

================

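If I want the difference (along the last axis) between each element (row) of data and each element of BMUs (here just data again), I can use array broadcasting. The result is a (3, 3, 2) array: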

In [130]: data[None,:,:]-data[:,None,:]
Out[130]: 
array([[[ 0,  0],
        [ 2,  2],
        [ 4,  4]],

       [[-2, -2],
        [ 0,  0],
        [ 2,  2]],

       [[-4, -4],
        [-2, -2],
        [ 0,  0]]])

norm handles higher-dimensional arrays and takes an axis parameter:

In [132]: np.linalg.norm(data[None,:,:]-data[:,None,:],axis=-1)
Out[132]: 
array([[ 0.        ,  2.82842712,  5.65685425],
       [ 2.82842712,  0.        ,  2.82842712],
       [ 5.65685425,  2.82842712,  0.        ]])
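The same broadcasting works when the two point sets differ, as with data and the BMUs in the question. Indexing as data[:, None, :] (shape (n, 1, f)) and BMUs[None, :, :] (shape (1, m, f)) gives an (n, m, f) difference array, and norm(..., axis=-1) reduces it to an (n, m) distance matrix whose rows follow data. The BMU values below are made up for the demo.

```python
import numpy as np

data = np.arange(6).reshape(3, 2)                  # n = 3 points
BMUs = np.array([[0, 0], [1, 1], [4, 4], [5, 5]])  # m = 4 points

diff = data[:, None, :] - BMUs[None, :, :]  # shape (3, 4, 2)
dist = np.linalg.norm(diff, axis=-1)        # shape (3, 4)
print(diff.shape, dist.shape)  # (3, 4, 2) (3, 4)

# same numbers as the nested-loop version
loop = np.array([[np.linalg.norm(d - b) for b in BMUs] for d in data])
print(np.allclose(dist, loop))  # True
```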
