A featurizer that eliminates features

Posted 2024-05-14 15:55:22


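I am trying to set up a featurizer that drops every database column except the first 10. The database has 76 columns in total. The idea is to apply PolynomialFeatures(1) to the 10 columns I want to keep, but I cannot find a neat way to drop the remaining 66. I was thinking of something like PolynomialFeatures(0), the idea being to multiply them by the constant 0, but it does not seem to work. The question is basically twofold: 1) how do I tell DataFrameMapper to apply the same featurizer to a range of columns (i.e. A_11 to A_76); 2) how do I tell DataFrameMapper to apply a featurizer that eliminates such columns?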

The (incomplete) code I have tried so far is shown below. In the code, question 1 (the range) is marked as A_11-A_76 and question 2 is marked with a ?:

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from dml_iv.utilities import SubsetWrapper, ConstantModel
from econml.sklearn_extensions.linear_model import StatsModelsLinearRegression

col = ["A_"+str(k) for k in range(XW.shape[1])]
XW_db = pd.DataFrame(XW, columns=col)

from sklearn_pandas import DataFrameMapper

subset_names = set(['A_0','A_1','A_2','A_3','A_4','A_5','A_6','A_7','A_8','A_9','A_10'])
# list of indices of features X to use in the final model

mapper = DataFrameMapper([
('A_0', PolynomialFeatures(1)),
('A_1', PolynomialFeatures(1)),
('A_2', PolynomialFeatures(1)),
('A_3', PolynomialFeatures(1)),
('A_4', PolynomialFeatures(1)),
('A_5', PolynomialFeatures(1)),
('A_11 - A_66', ?)]) ## PROBLEMATIC PART

1 answer
Anonymous user
Answer #1 · Posted 2024-05-14 15:55:22

Why not drop the unwanted columns from the DataFrame and map what is left?

cols_map = [...] # list of columns to map
cols_drop = [...] # list of columns to drop
XW_db = XW_db.drop(cols_drop, axis=1) # you're left with only what to map
mapper = DataFrameMapper(cols_map)
...
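The drop step above can be sketched end to end with pandas alone. This is a minimal illustration, not the asker's actual data: `XW` is a made-up 5 x 76 array, and the column names follow the zero-based `A_0` .. `A_75` scheme produced by the question's own list comprehension, so "the first 10 columns" is assumed to mean `A_0` .. `A_9`. The name `XW_kept` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's XW: 5 rows, 76 columns.
XW = np.arange(5 * 76, dtype=float).reshape(5, 76)
col = ["A_" + str(k) for k in range(XW.shape[1])]
XW_db = pd.DataFrame(XW, columns=col)

# Build a range of column names programmatically instead of typing each one.
cols_keep = ["A_" + str(k) for k in range(10)]       # A_0 .. A_9 (assumed "first 10")
cols_drop = ["A_" + str(k) for k in range(10, 76)]   # A_10 .. A_75

# drop() returns a new DataFrame; XW_db itself is left unchanged.
XW_kept = XW_db.drop(cols_drop, axis=1)

print(XW_kept.shape)
print(list(XW_kept.columns))
```

Generating the names with a comprehension addresses the "range of columns" part of the question, and the reduced frame can then be handed to the mapper with entries only for the columns that remain.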

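If the reason you do not want to drop the columns is that you will use them later, you can simply assign the result of the drop to another variable, creating several subset DataFrames that are easier to work with: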

df2 = df1.drop(cols_drop2,axis=1) # df2 is a subset of df1
df3 = df1.drop(cols_drop3,axis=1) # df3 is a subset of df1
# Alternative is to decide what to keep instead of what to drop
df4 = df1[cols_keep] # df4 is a subset of df1
# df1 remains the full dataframe    
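As a small runnable illustration of the subsetting pattern above, here is a sketch with a made-up four-column frame. The column names and the contents of `cols_drop2`, `cols_drop3` and `cols_keep` are assumptions chosen only for the example.

```python
import pandas as pd

# Hypothetical data: four columns, two rows.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

cols_drop2 = ["c", "d"]   # columns to leave out of df2
cols_drop3 = ["a"]        # columns to leave out of df3
cols_keep = ["a", "d"]    # columns to keep in df4

df2 = df1.drop(cols_drop2, axis=1)   # subset defined by what to drop
df3 = df1.drop(cols_drop3, axis=1)
df4 = df1[cols_keep]                 # subset defined by what to keep

# df1 still has all four columns because drop() is not in-place by default.
print(list(df1.columns), list(df2.columns), list(df3.columns), list(df4.columns))
```

Choosing between a drop list and a keep list is mostly a matter of which one is shorter; with 10 columns to keep out of 76, the keep list is likely the easier one to maintain.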
