How to remove 'None' values from an appended multidimensional array using numpy

1 vote

4 answers

2422 views

Asked 2025-04-16 02:26

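I need to import data from a csv file into a multidimensional array in Python, but I'm not sure how to get rid of the 'None' values in the array after appending my data to an empty array.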

I first created the structure like this:

storecoeffs = numpy.empty((5,11), dtype='object')

This returns a 5-row, 11-column array filled with 'None'.

Next, I opened my csv file and converted it into an array:

coeffsarray = list(csv.reader(open("file.csv")))

coeffsarray = numpy.array(coeffsarray, dtype='object')

Then I combined the two arrays:

newmatrix = numpy.append(storecoeffs, coeffsarray, axis=1)

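The result is an array where each row starts with a run of 'None' values, followed by the data I actually want (the first two rows give you the general idea):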

array([[None, None, None, None, None, None, None, None, None, None, None,
        workers, constant, hhsize, inc1, inc2, inc3, inc4, age1, age2,
        age3, age4],
       [None, None, None, None, None, None, None, None, None, None, None,
        w0, 7.334, -1.406, 2.823, 2.025, 0.5145, 0, -4.936, -5.054, -2.8, 0],
       ...], dtype=object)

How can I remove the 'None' objects from each row so that I end up with a 5 x 11 multidimensional array containing just my data?

numpy data-cleaning data-import multidimensional-array array-manipulation csv-file array-merging none-handling

4 Answers

Why not start with an empty array?

storecoeffs = numpy.empty((5,0), dtype='object')
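A minimal sketch of this suggestion. It assumes a hypothetical io.StringIO stand-in for file.csv that holds only the header and the one data row shown in the question (so 2 rows instead of 5), and it takes the row count from the loaded data instead of hard-coding 5:

```python
import csv
import io

import numpy

# Hypothetical stand-in for file.csv: the header and the one data row
# shown in the question (the real file has 5 rows).
csv_text = (
    "workers,constant,hhsize,inc1,inc2,inc3,inc4,age1,age2,age3,age4\n"
    "w0,7.334,-1.406,2.823,2.025,0.5145,0,-4.936,-5.054,-2.8,0\n"
)

rows = list(csv.reader(io.StringIO(csv_text)))
coeffsarray = numpy.array(rows, dtype='object')

# Same number of rows, but zero columns: appending along axis=1
# adds no None padding at all.
storecoeffs = numpy.empty((coeffsarray.shape[0], 0), dtype='object')
newmatrix = numpy.append(storecoeffs, coeffsarray, axis=1)

print(newmatrix.shape)  # (2, 11)
```

Because the zero-column array contributes nothing, newmatrix ends up identical to coeffsarray.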

Answered 2025-04-16 by Python大师


Why not just use numpy.loadtxt():

newmatrix = numpy.loadtxt("file.csv", delimiter=',', dtype='object')

If I've understood your question correctly, this should do what you need.
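A self-contained sketch of the loadtxt approach, assuming an io.StringIO stand-in for file.csv with the question's header and one data row. loadtxt splits on whitespace by default, so delimiter=',' is needed for comma-separated data:

```python
import io

import numpy

# Hypothetical stand-in for "file.csv" (header plus one data row from the question).
csv_text = io.StringIO(
    "workers,constant,hhsize,inc1,inc2,inc3,inc4,age1,age2,age3,age4\n"
    "w0,7.334,-1.406,2.823,2.025,0.5145,0,-4.936,-5.054,-2.8,0\n"
)

# No empty placeholder array is needed: loadtxt builds the array directly.
newmatrix = numpy.loadtxt(csv_text, delimiter=',', dtype='object')

print(newmatrix.shape)   # (2, 11)
print(newmatrix[1, :3])  # first three fields of the data row, still strings
```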

Answered 2025-04-16 by Python大师


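@Gnibbler's answer is technically correct, but there's really no need to create the storecoeffs array in the first place. You can just load your data and build an array directly from it. As @Mermoz mentioned, your use case looks simple enough to handle with numpy.loadtxt().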

Also, why are you using an object array? That's probably not what you want... Right now your numeric values are stored as strings, not floats!
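To make the strings-vs-floats point concrete, here is a small sketch using three values from the question's data row, held as strings the way csv.reader returns them. With dtype='object', arithmetic falls back to Python's str operations:

```python
import numpy as np

# Three values from the question's data row, as csv.reader returns them: strings.
row = ['7.334', '-1.406', '2.823']

as_object = np.array(row, dtype='object')  # holds Python str objects
as_float = np.array(row, dtype=float)      # parses each string into a float64

print(type(as_object[0]))     # <class 'str'>
print(as_object + as_object)  # elementwise string concatenation, not addition
print(as_float + as_float)    # real numeric addition
```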

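In numpy you basically have two ways to handle data like this. If you want convenient access to named columns, use a structured array (or a record array). If you want a "normal" multidimensional array, just use an array of floats, ints, etc. Object arrays have their specific uses, but they're probably not a good fit here.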

For example, if you just want to load the data as an ordinary 2D numpy array (assuming all of your data can be represented simply as floats):

import numpy as np
# Note that this ignores your column names, and attempts to 
# convert all values to a float...
data = np.loadtxt('input_filename.txt', delimiter=',', skiprows=1)

# Access the first column 
workers = data[:,0]
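Since the "workers" column in the question holds non-numeric values such as w0, a plain float load would fail on it. A sketch of one way around that, assuming an io.StringIO stand-in with the question's header and single data row: loadtxt's usecols parameter skips column 0, and ndmin=2 keeps the one-row result 2D:

```python
import io

import numpy as np

# Hypothetical sample in the question's layout: a header row, then data rows.
sample = io.StringIO(
    "workers,constant,hhsize,inc1,inc2,inc3,inc4,age1,age2,age3,age4\n"
    "w0,7.334,-1.406,2.823,2.025,0.5145,0,-4.936,-5.054,-2.8,0\n"
)

# skiprows=1 drops the header; usecols keeps columns 1..10 so the
# non-numeric "workers" column (column 0) is never parsed as a float.
data = np.loadtxt(sample, delimiter=',', skiprows=1,
                  usecols=list(range(1, 11)), ndmin=2)

# The first kept column is now "constant".
constant = data[:, 0]
print(data.shape)  # (1, 10)
```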

If you want to load the data as a structured array, you can do this:

import numpy as np
infile = open('input_filename.txt')

# Read in the names of the columns from the first (comma-separated) row...
names = [name.strip() for name in next(infile).split(',')]

# Make a dtype from these names...
dtype = {'names':names, 'formats':len(names)*[float]}

# Read the data in...
data = np.loadtxt(infile, dtype=dtype, delimiter=',')

# Note that data is now effectively 1-dimensional. To access a column,
# index it by name
workers = data['workers']

# Note that this is now one-dimensional... You can't treat it like a 2D array
data[1:10, 3:5] # <-- Raises an IndexError!

data[1:10][['inc1', 'inc2']] # <-- Effectively the same thing, but works.

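If your data contains non-numeric values and you want to treat them as strings, you need a structured array: specify which fields are strings and set the maximum length of each string field.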
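The maximum length matters because NumPy silently truncates longer values to the declared width. A tiny sketch with a deliberately too-short string field (the field names are just borrowed from the question's header):

```python
import numpy as np

# A 3-character string field (deliberately too short) and a float field.
dtype = [('workers', 'U3'), ('constant', float)]
records = np.array([('workers', 7.334)], dtype=dtype)

print(records['workers'][0])  # prints wor: the value was cut to 3 characters
```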

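Judging from your sample data, the first column, "workers", is non-numeric and you'll probably want to store it as a string, while everything else looks like floats. In that case you could do this: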

import numpy as np
infile = open('input_filename.txt')
names = [name.strip() for name in next(infile).split(',')]

# Create the dtype... 'U10' indicates a (unicode) string field with a max length of 10
dtype = {'names':names, 'formats':['U10'] + (len(names) - 1)*[float]}
data = np.loadtxt(infile, dtype=dtype, delimiter=',')

# The "workers" field is now a string array
print(data['workers'])

# Compare this to the other fields
print(data['constant'])
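A runnable version of the mixed string/float structured array, assuming an io.StringIO stand-in for input_filename.txt with the question's header and single data row. ndmin=1 is added so the one-row result stays 1D, and the last part shows the named-field indexing versus 2D-style indexing difference described earlier:

```python
import io

import numpy as np

# Hypothetical stand-in for input_filename.txt: the header and the one data
# row shown in the question, comma-delimited.
infile = io.StringIO(
    "workers,constant,hhsize,inc1,inc2,inc3,inc4,age1,age2,age3,age4\n"
    "w0,7.334,-1.406,2.823,2.025,0.5145,0,-4.936,-5.054,-2.8,0\n"
)
names = [name.strip() for name in next(infile).split(',')]

# "workers" is a string field of at most 10 characters; the rest are floats.
dtype = {'names': names, 'formats': ['U10'] + (len(names) - 1) * [float]}

# ndmin=1 keeps a single-row result 1-D instead of squeezing it to 0-D.
data = np.loadtxt(infile, dtype=dtype, delimiter=',', ndmin=1)

print(data['workers'])   # string field
print(data['constant'])  # float field

# Selecting several named fields from a slice of records works...
subset = data[0:1][['inc1', 'inc2']]
print(subset)

# ...but 2-D style indexing does not, because data is 1-D.
try:
    data[0:1, 3:5]
except IndexError as err:
    print('IndexError:', err)
```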

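If you really do need the flexibility of the csv module in some cases (for example, text fields that contain commas), you can use it to read the data and then convert the result into an appropriate structured array.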
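A sketch of that csv-then-convert route. The three-column sample and its quoted "w0, part-time" value are hypothetical, chosen so that the text field contains a comma, which np.loadtxt with delimiter=',' would split but csv.reader keeps intact:

```python
import csv
import io

import numpy as np

# Hypothetical CSV whose text field contains a quoted comma.
raw = io.StringIO(
    'workers,constant,hhsize\n'
    '"w0, part-time",7.334,-1.406\n'
)

reader = csv.reader(raw)
names = next(reader)

# Convert each row to a tuple: the first field stays a string, the rest become floats.
rows = [tuple([row[0]] + [float(x) for x in row[1:]]) for row in reader]

# Structured arrays are built from a list of tuples, one tuple per record.
dtype = {'names': names, 'formats': ['U20'] + (len(names) - 1) * [float]}
data = np.array(rows, dtype=dtype)

print(data['workers'])
print(data['hhsize'])
```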

Hopefully that makes things a bit clearer...

Answered 2025-04-16 by Python大师
