I don't understand why a column in my dataset is all NaN

Posted 2024-04-26 23:06:52


I am trying to create a fare band (1/2/3) with this loop, but it does not seem to work:

traindf['FareBand'] = np.nan

for index, row in traindf.iterrows():
    if row['Fare'] <= 13.675550:
        row['FareBand'] = 1
    elif row['Fare'] <= 20.662183 and row['Fare'] > 13.675550:
        row['FareBand'] = 2
    else:
        row['FareBand'] = 3

Running .head() shows that every row in the FareBand column is NaN:

traindf['FareBand'].head(20)

Output:
       0    NaN
       1    NaN
       2    NaN
       3    NaN
       ...
       12   NaN
       13   NaN
       14   NaN
       15   NaN
       16   NaN
       17   NaN
       18   NaN
       19   NaN
       Name: FareBand, dtype: float64

What is the reason?


3 answers

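The loop has no effect because iterrows() yields each row as a separate Series, which (depending on the dtypes) is a copy rather than a view, so assigning to row never reaches traindf. If you want to keep the approach you describe and apply the changes inside the loop, you just need to write the row back to the DataFrame at that index: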

for index, row in traindf.iterrows():
    if row['Fare'] <= 13.675550:
        row['FareBand'] = 1
    elif row['Fare'] <= 20.662183 and row['Fare'] > 13.675550:
        row['FareBand'] = 2
    else:
        row['FareBand'] = 3
    traindf.loc[index] = row
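
As a minimal sketch of why the original loop left the column as NaN: for a frame with mixed dtypes, the row that iterrows() hands back is a copy, so assigning to it does not reach the frame. The small `demo` frame and its `Name` column are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; the string column makes the dtypes mixed.
demo = pd.DataFrame({'Name': ['a', 'b', 'c'], 'Fare': [10.0, 15.0, 30.0]})
demo['FareBand'] = np.nan

for index, row in demo.iterrows():
    row['FareBand'] = 1  # changes only the temporary copy of the row

# The frame itself was never written to, so the column is still all NaN.
untouched = demo['FareBand'].isna().all()

for index, row in demo.iterrows():
    demo.loc[index, 'FareBand'] = 1  # writes into the frame itself

print(untouched)
print(demo['FareBand'].tolist())
```

Writing through `demo.loc[index, ...]` targets the frame directly, which is what the corrected loop above does with `traindf.loc[index] = row`.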

I suggest using numpy.select:

import numpy as np
import pandas as pd

traindf = pd.DataFrame({'Fare':[10,15,3,30]})

m1 = traindf['Fare'] <= 13.675550
m2 = (traindf['Fare'] <= 20.662183) & (traindf['Fare'] > 13.675550)

traindf['FareBand'] = np.select([m1, m2], [1,2], 3)
print (traindf)
   Fare  FareBand
0    10         1
1    15         2
2     3         1
3    30         3
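
Because np.select takes the first condition that is true for each element, the lower bound in `m2` is not strictly needed. A short sketch of that simplification, reusing the sample fares from above (the `conditions` name is only for illustration):

```python
import numpy as np
import pandas as pd

traindf = pd.DataFrame({'Fare': [10, 15, 3, 30]})

# Conditions are checked in order; a fare that already matched the first
# condition never falls through to the second one.
conditions = [traindf['Fare'] <= 13.675550, traindf['Fare'] <= 20.662183]
traindf['FareBand'] = np.select(conditions, [1, 2], default=3)

print(traindf)
```

This should give the same bands as the `m1`/`m2` version; keeping the explicit lower bound is harmless and arguably easier to read.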

Your solution can work if you change it to set the values selected by index, but do not use it, because it is slow:

for index, row in traindf.iterrows():
    if traindf.loc[index, 'Fare'] <= 13.675550:
        traindf.loc[index, 'FareBand'] = 1
    elif traindf.loc[index, 'Fare'] <= 20.662183 and traindf.loc[index, 'Fare'] > 13.675550:
        traindf.loc[index, 'FareBand'] = 2
    else:
        traindf.loc[index, 'FareBand'] = 3

print (traindf)
   Fare  FareBand
0    10       1.0
1    15       2.0
2     3       1.0
3    30       3.0
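
To get a rough feel for the speed difference, one could time both approaches on synthetic data. This is only a sketch: the random fares, the frame size, and the helper names `with_loop` and `with_select` are assumptions for illustration, and the timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Synthetic fares, made up purely for the comparison.
rng = np.random.default_rng(0)
fares = pd.DataFrame({'Fare': rng.uniform(0, 50, size=2000)})

def with_loop(df):
    """Row-by-row assignment through .loc, as in the loop above."""
    out = df.copy()
    out['FareBand'] = np.nan
    for index, row in out.iterrows():
        if row['Fare'] <= 13.675550:
            out.loc[index, 'FareBand'] = 1
        elif row['Fare'] <= 20.662183:
            out.loc[index, 'FareBand'] = 2
        else:
            out.loc[index, 'FareBand'] = 3
    return out

def with_select(df):
    """Vectorized assignment with np.select."""
    out = df.copy()
    fare = out['Fare']
    out['FareBand'] = np.select([fare <= 13.675550, fare <= 20.662183], [1, 2], 3)
    return out

start = time.perf_counter()
looped = with_loop(fares)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
selected = with_select(fares)
select_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, np.select: {select_seconds:.4f}s")
```

Both helpers should produce the same bands; only the elapsed time differs.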

You can do this without a loop, in three steps:

traindf['FareBand'] = 3
traindf.loc[traindf['Fare'].between(13.675550, 20.662183), 'FareBand'] = 2
traindf.loc[traindf['Fare'].le(13.675550), 'FareBand'] = 1
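
The same banding can probably also be written as a single call to pd.cut. That is a different technique from the ones in the answers, shown here only as a sketch on the sample fares used earlier; the `bins` list is an assumption built from the two thresholds in the question.

```python
import numpy as np
import pandas as pd

traindf = pd.DataFrame({'Fare': [10, 15, 3, 30]})

# Right-closed bins: (-inf, 13.675550], (13.675550, 20.662183], (20.662183, inf)
bins = [-np.inf, 13.675550, 20.662183, np.inf]
traindf['FareBand'] = pd.cut(traindf['Fare'], bins=bins, labels=[1, 2, 3]).astype(int)

print(traindf)
```

Note that the `.astype(int)` step assumes there are no missing fares; with NaN in the Fare column the result of pd.cut would need to stay categorical or be filled first.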
