numpy: convert a vector to a binary matrix

Asked 2025-04-18 04:16

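I am looking for a simple way to convert a vector of integers into a two-dimensional binary array, with a 1 in the column that corresponds to each vector value and 0 everywhere else.

That is, given:

    import numpy as np

    v = np.array([1, 5, 3])
    C = np.zeros((v.shape[0], v.max()))

I want to turn C into this:

    array([[ 1.,  0.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  1.],
           [ 0.,  0.,  1.,  0.,  0.]])

I came up with this:

    C[np.arange(v.shape[0]), v.T-1] = 1

But I wonder whether there is a more concise, more elegant way?

Thanks!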

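Update

Thanks for the comments, everyone! I found a bug in my code: if v contains a 0, the 1 is placed in the wrong position (the last column). So I need to extend the categorical data to include 0.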
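
The misplacement comes from NumPy's negative indexing: with `v - 1`, a value of 0 becomes index -1, which addresses the last column. A minimal sketch of the problem and of the zero-based fix follows; the input vector here is made up for illustration and is not from the original post.

```python
import numpy as np

# Hypothetical input that contains a 0 (illustration only).
v = np.array([0, 2, 1])

# Original 1-based approach: 0 - 1 == -1 wraps around to the last column.
buggy = np.zeros((v.shape[0], v.max()))
buggy[np.arange(v.shape[0]), v - 1] = 1

# Zero-based approach: one extra column, so value k maps to column k.
fixed = np.zeros((v.shape[0], v.max() + 1))
fixed[np.arange(v.shape[0]), v] = 1
```

In `buggy`, the first row has its 1 in the last column even though its value is 0; in `fixed`, it sits in column 0.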

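jrennie's answer is a big win for large vectors when working with sparse matrices. In my case, however, I need to return an array for compatibility, and that conversion cancels the advantage entirely. Compare the two solutions: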

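    import numpy as np
    from scipy import sparse

    def permute_array(vector):
        # Dense version: one row per element, one column per value 0..max.
        permut = np.zeros((vector.shape[0], vector.max()+1))
        permut[np.arange(vector.shape[0]), vector] = 1
        return permut

    def permute_matrix(vector):
        # Sparse CSR version: exactly one stored 1 per row.
        indptr = np.arange(vector.shape[0]+1)
        ones = np.ones(vector.shape[0])
        permut = sparse.csr_matrix((ones, vector, indptr))
        return permut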

    In [193]: vec = np.random.randint(1000, size=1000)
    In [194]: np.all(permute_matrix(vec) == permute_array(vec))
    Out[194]: True

    In [195]: %timeit permute_array(vec)
    100 loops, best of 3: 3.49 ms per loop

    In [196]: %timeit permute_matrix(vec)
    1000 loops, best of 3: 422 µs per loop

Now, adding the conversion:

    def permute_matrix(vector):
        indptr = np.arange(vector.shape[0]+1)
        ones = np.ones(vector.shape[0])
        permut = sparse.csr_matrix((ones, vector, indptr))
        # Converting back to a dense ndarray is the costly step.
        return permut.toarray()

    In [198]: %timeit permute_matrix(vec)
    100 loops, best of 3: 4.1 ms per loop
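
Since a dense array is needed here, another compact way to build it is a broadcast comparison against the column indices. This is only a sketch: `one_hot_dense` is a hypothetical helper name, it was not part of the timings above, and no claim is made about its speed.

```python
import numpy as np

def one_hot_dense(vector):
    """Hypothetical helper: dense one-hot rows via a broadcast comparison."""
    columns = np.arange(vector.max() + 1)
    # Compare each value (as a column vector) against every column index.
    return (vector[:, None] == columns).astype(float)

vec = np.array([1, 5, 3])
result = one_hot_dense(vec)
```

Like `permute_array`, this version is zero-based, so the result for `[1, 5, 3]` has six columns.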


1 Answer

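One drawback of your solution is that it is inefficient for large data. If you want a more efficient representation, you can use a scipy sparse matrix, for example: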

    import scipy.sparse
    import numpy

    indices = [1, 5, 3]                 # column of the 1 in each row
    indptr = range(len(indices)+1)      # one stored entry per row
    data = numpy.ones(len(indices))     # the stored values
    matrix = scipy.sparse.csr_matrix((data, indices, indptr))

Take a look at the Yale format and scipy's csr_matrix to get a better understanding of these objects (indices, indptr, data) and how they are used.
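
As a rough illustration of how the three arrays fit together: in CSR, the entries of row i are stored at positions `indptr[i]` through `indptr[i+1] - 1` of `indices` and `data`. The sketch below reads the column indices back out of the raw arrays; the name `columns_per_row` is introduced here for illustration only.

```python
import numpy as np
from scipy import sparse

indices = np.array([1, 5, 3])          # column of the single 1 in each row
indptr = np.arange(len(indices) + 1)   # row i owns entries indptr[i]:indptr[i+1]
data = np.ones(len(indices))           # the stored values (all ones)

matrix = sparse.csr_matrix((data, indices, indptr))

# Recover the column indices stored for each row from the raw CSR arrays.
columns_per_row = [
    matrix.indices[matrix.indptr[i]:matrix.indptr[i + 1]].tolist()
    for i in range(matrix.shape[0])
]
dense = matrix.toarray()
```

Because no `shape` is passed, the number of columns is inferred from the largest index, giving a 3 x 6 matrix here.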

Note that I did not subtract 1 from the indices in the code above. If you want that, use indices = numpy.array([1, 5, 3])-1.
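
A short sketch of that 1-based variant, assuming the categories start at 1 as in the question. Passing `shape` explicitly is an illustrative choice made here, not something the answer requires; it fixes the column count instead of leaving it to inference.

```python
import numpy as np
from scipy import sparse

values = np.array([1, 5, 3])           # 1-based categories from the question
indices = values - 1                   # shift to 0-based column indices
indptr = np.arange(len(indices) + 1)
data = np.ones(len(indices))

# Explicit shape: one row per value, one column per category 1..max.
matrix = sparse.csr_matrix((data, indices, indptr),
                           shape=(len(values), int(values.max())))
dense = matrix.toarray()
```

Here `dense` matches the 3 x 5 array shown in the question.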

Answered 2025-04-18 by Python大师
