Changing code to work with a NumPy array instead of a DataFrame

Posted 2024-04-19 00:24:46


I need to change my code so that it uses a 2D NumPy array instead of a pandas DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[np.nan, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=["col1", "col2", "col3"])

list_of_NA_features = ["col1"]

for feature in list_of_NA_features:
    for index, row in df.iterrows():
        if pd.isnull(row[feature]):
            missing_value = 5  # for simplicity, let's put 5 instead of a function
            # df.loc replaces df.ix, which was removed in pandas 1.0
            df.loc[index, feature] = missing_value

What is the correct way to do the for index, row in df.iterrows() loop, the pd.isnull(row[feature]) check, and the df.ix[index, feature] assignment with a NumPy array?

This is what I have so far:

np_arr = df.to_numpy()  # was df.as_matrix(), which was removed in pandas 1.0

for feature in list_of_NA_features:
    for feature in range(np_arr.shape[1]):  # xrange in Python 2
        # ???

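How do I get the row index so that I can use np_arr[irow, feature]? Also, what is the correct way to assign a value to a specific row and column of a NumPy array, i.e. the equivalent of the df.ix[index, feature] assignment?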

Update

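I simplified the code by removing the function fill_missing_values and replacing it with the value 5. In my real case, however, I need to estimate the missing values.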


1 answer
User
#1 · Posted 2024-04-19 00:24:46

Setup

import numpy as np

# Set up a NumPy array with the same contents as your DataFrame
a = np.array([[np.nan, 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

# list_of_NA_features now contains column indices into the NumPy array
list_of_NA_features = [0]
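
If the data starts out in a DataFrame, as in the question, the array and the column indices do not have to be typed by hand. The following is a minimal sketch, assuming a pandas version that provides DataFrame.to_numpy (0.24 or later) and unique column names. The name names_of_NA_features is introduced here only for illustration and is not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.array([[np.nan, 2, 3], [4, 5, 6], [7, 8, 9]]),
    columns=["col1", "col2", "col3"],
)

# to_numpy() returns the values as a 2D array; the column labels are dropped.
# copy=True requests an independent array that is safe to modify.
a = df.to_numpy(copy=True)

# Hypothetical helper name: the column labels we want to fill.
names_of_NA_features = ["col1"]

# Translate each label into its integer position in the array.
list_of_NA_features = [df.columns.get_loc(name) for name in names_of_NA_features]
```

With unique column labels, get_loc returns a plain integer, so list_of_NA_features can be used directly as the second index of the array.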

Solution:

# The same operations carried out on a NumPy array. This only shows that the
# DataFrame loop translates directly; it is not necessarily the best way to
# do what you are trying to do.
for feature in list_of_NA_features:
    for index, row in enumerate(a):
        if np.isnan(row[feature]):
            missing_value = 5
            a[index, feature] = missing_value

Out[167]: 
array([[ 5.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])
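
The row loop is not strictly needed. As a possible alternative, a boolean mask can select all missing rows of a column at once, and np.flatnonzero gives their row indices if they are wanted explicitly. The sketch below is one way this might look. Filling with the column mean via np.nanmean is only an assumed stand-in for the asker's fill_missing_values function, whose actual logic is not shown in the question.

```python
import numpy as np

a = np.array([[np.nan, 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
list_of_NA_features = [0]

for feature in list_of_NA_features:
    column = a[:, feature]           # all rows of this column
    mask = np.isnan(column)          # True where the value is missing
    rows = np.flatnonzero(mask)      # row indices of the missing values

    # Assumption: estimate the missing value as the mean of the
    # non-missing entries in the same column.
    missing_value = np.nanmean(column)

    # Assign to every masked row of this column in one step.
    a[mask, feature] = missing_value
```

Here the single NaN in column 0 is replaced by 5.5, the mean of 4 and 7. If the estimate has to be computed per row, iterating over rows and assigning a[irow, feature] individually works just as well.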
