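I want to fill missing values according to the probability distribution of the known values, conditioned on another attribute. Specifically: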
Weather_Conditions | Road_Surface_Conditions | Date_Month
----------
Fine without high winds | NaN | 9
Fine without high winds | NaN | 1
Raining without high winds | Wet/Damp | 6
Fine without high winds | Wet/Damp | 1
Fine without high winds | NaN | 2
Fine without high winds | NaN | 1
Raining without high winds | Wet/Damp | 7
Raining without high winds | Wet/Damp | 1
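If the month is January, all missing Road_Surface_Conditions values should be filled with Frost/Ice and Wet/Damp in a 1:3 ratio.
So far, I have managed to create an array of values to fill in: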
road_values_jan = np.random.choice(["Frost/Ice", "Wet/Damp"], random_data["Road_Surface_Conditions"][random_data['Date_Month'].isin(["01"])].isnull().sum(), p=[0.25, 0.75])
# which outputs:
array(['Wet/Damp', 'Frost/Ice'], dtype='<U9')
The problem comes when I try to assign these values back to the original DataFrame. I tried:
null_road = random_data["Road_Surface_Conditions"][random_data['Date_Month'].isin(["01"])].isnull()
random_data.loc['null_road'] = np.random.choice(road_values_jan, road_values_jan.size)
based on this thread, but it raises: ValueError: cannot set a row with mismatched columns
I also tried:
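Why the .loc attempt fails: 'null_road' in quotes is a row label, not the null_road variable, so .loc tries to append a new row named null_road and needs one value per column (three here), not two. Separately, the null_road mask in the question is built from the January rows only, so it does not cover the DataFrame's full index. A minimal sketch on the sample data, assuming Date_Month is read as an integer (January is 1, not "01") and reusing the two-element array from the question as fixed fill values:

```python
import io

import numpy as np
import pandas as pd

CSV = """\
Weather_Conditions,Road_Surface_Conditions,Date_Month
Fine without high winds,NaN,9
Fine without high winds,NaN,1
Raining without high winds,Wet/Damp,6
Fine without high winds,Wet/Damp,1
Fine without high winds,NaN,2
Fine without high winds,NaN,1
Raining without high winds,Wet/Damp,7
Raining without high winds,Wet/Damp,1
"""
df = pd.read_csv(io.StringIO(CSV))

# The two values produced by np.random.choice in the question.
road_values_jan = np.array(["Wet/Damp", "Frost/Ice"])

# A quoted string is a new row label: pandas needs 3 values (one per column)
# for that row, receives 2, and raises ValueError without modifying df.
label_error = None
try:
    df.loc["null_road"] = road_values_jan
except ValueError as exc:
    label_error = exc
    print(exc)  # cannot set a row with mismatched columns

# Build the mask over the whole frame, then pass the variable (no quotes)
# together with the target column; values are assigned in row order.
null_road = (df["Date_Month"] == 1) & df["Road_Surface_Conditions"].isna()
df.loc[null_road, "Road_Surface_Conditions"] = road_values_jan
print(df)
```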
random_data["Road_Surface_Conditions"][random_data['Date_Month'].isin(["01"])] = random_data["Road_Surface_Conditions"][random_data['Date_Month'].isin(["01"])].fillna(pandas.Series(road_values_jan, index=random_data.index))
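but this gives me: ValueError: Length of passed values is 2, index implies 8
How can I assign this two-valued array to the NaN values, subject to the month condition?
Here is the data in CSV format: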
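The fillna attempt fails because pd.Series(road_values_jan, index=random_data.index) pairs 2 values with all 8 row labels. Giving the Series only the labels of the rows to be filled lets fillna align the values by label. Assigning the result back to the whole column also avoids chained indexing (random_data[col][mask] = ...), which may write to a temporary copy instead of the DataFrame. A minimal sketch on the sample data, assuming integer months and using a seeded numpy.random.default_rng (the seed 42 is arbitrary) in place of np.random.choice:

```python
import io

import numpy as np
import pandas as pd

CSV = """\
Weather_Conditions,Road_Surface_Conditions,Date_Month
Fine without high winds,NaN,9
Fine without high winds,NaN,1
Raining without high winds,Wet/Damp,6
Fine without high winds,Wet/Damp,1
Fine without high winds,NaN,2
Fine without high winds,NaN,1
Raining without high winds,Wet/Damp,7
Raining without high winds,Wet/Damp,1
"""
df = pd.read_csv(io.StringIO(CSV))

# Rows that are in January and still missing a road surface value.
jan_nan = (df["Date_Month"] == 1) & df["Road_Surface_Conditions"].isna()

rng = np.random.default_rng(42)
road_values_jan = rng.choice(
    ["Frost/Ice", "Wet/Damp"], size=int(jan_nan.sum()), p=[0.25, 0.75]
)

# One label per value: the index of the January NaN rows only.
fill = pd.Series(road_values_jan, index=df.index[jan_nan.to_numpy()])

# fillna aligns on the index, so only the January NaN rows are filled;
# NaNs in other months keep their missing value.
df["Road_Surface_Conditions"] = df["Road_Surface_Conditions"].fillna(fill)
print(df)
```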
Weather_Conditions,Road_Surface_Conditions,Date_Month
Fine without high winds,NaN,9
Fine without high winds,NaN,1
Raining without high winds,Wet/Damp,6
Fine without high winds,Wet/Damp,1
Fine without high winds,NaN,2
Fine without high winds,NaN,1
Raining without high winds,Wet/Damp,7
Raining without high winds,Wet/Damp,1
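Read with pandas.read_csv, this sample yields an integer Date_Month column, so the isin(["01"]) filter from the question matches no rows here; it only works if the real data stores months as zero-padded strings. A small sketch that loads the sample and counts the January rows with a missing road surface, which is the number the size argument of np.random.choice should equal:

```python
import io

import pandas as pd

CSV = """\
Weather_Conditions,Road_Surface_Conditions,Date_Month
Fine without high winds,NaN,9
Fine without high winds,NaN,1
Raining without high winds,Wet/Damp,6
Fine without high winds,Wet/Damp,1
Fine without high winds,NaN,2
Fine without high winds,NaN,1
Raining without high winds,Wet/Damp,7
Raining without high winds,Wet/Damp,1
"""
# The literal "NaN" cells are parsed as missing values by default.
df = pd.read_csv(io.StringIO(CSV))

# The string "01" never equals the integer 1, so nothing matches.
string_matches = int(df["Date_Month"].isin(["01"]).sum())
print(string_matches)  # 0

# Compare against the integer month instead.
jan_missing = (df["Date_Month"] == 1) & df["Road_Surface_Conditions"].isna()
print(df.index[jan_missing.to_numpy()].tolist())  # [1, 5]
print(int(jan_missing.sum()))  # 2
```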
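If I understand correctly, you can first create an array with a 25:75 distribution whose size equals the number of NaN values to be filled, then select the NaN rows of the Road_Surface_Conditions column and fill them with that array. Note that my DataFrame is called df, not random_data.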
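A minimal sketch of that approach, assuming the sample data is loaded as df with integer months: draw one value per January NaN with probabilities 0.25 and 0.75, then write the draws into exactly those rows with .loc and a boolean mask that spans the full index.

```python
import io

import numpy as np
import pandas as pd

CSV = """\
Weather_Conditions,Road_Surface_Conditions,Date_Month
Fine without high winds,NaN,9
Fine without high winds,NaN,1
Raining without high winds,Wet/Damp,6
Fine without high winds,Wet/Damp,1
Fine without high winds,NaN,2
Fine without high winds,NaN,1
Raining without high winds,Wet/Damp,7
Raining without high winds,Wet/Damp,1
"""
df = pd.read_csv(io.StringIO(CSV))
col = "Road_Surface_Conditions"

# Boolean mask over every row: January rows whose road surface is NaN.
to_fill = df[col].isna() & (df["Date_Month"] == 1)

# One draw per row to fill; Frost/Ice : Wet/Damp is 1:3 in expectation.
rng = np.random.default_rng()
fill_values = rng.choice(
    ["Frost/Ice", "Wet/Damp"], size=int(to_fill.sum()), p=[0.25, 0.75]
)

# Assign the draws, in row order, to the selected rows only.
df.loc[to_fill, col] = fill_values
print(df)
```

With only two rows to fill, the 1:3 ratio holds only on average across many draws; pass a seed such as default_rng(0) for reproducible output. If the real months are stored as strings such as "01", swap == 1 for the isin(["01"]) filter from the question.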