Keras functional API: passing a variable-length list to an Embedding layer

Posted 2024-06-12 09:55:55


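I'm currently using the Keras functional API to build a neural network that combines numeric and categorical features. The quirk is that each training sample can contain several instances of the same categorical variable.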

So an example of the dataframe might look like this:

        sessions_sum      sessions_duration      cat_var_list     score
0          -0.554354                    100            [0, 1]       1.0
1          -0.553925                    200         [0, 2, 4]       1.0
2          -0.548787                    100            [3, 4]       0.0
3          -0.554354                    100               [5]       0.0
4          -0.553069                    100            [2, 5]       1.0

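The `cat_var_list` column holds the list of label-encoded categorical values for that training sample. I want an embedding layer that takes the list of category indices, embeds each index separately, and averages the resulting embeddings before concatenating them with the dense layer.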
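To make the intended operation concrete, here is a minimal NumPy sketch of "embed each index, then average", assuming 6 categories (labels 0 to 5, as in the sample frame) and a 10-dimensional embedding. The table `embedding_table` is random and hypothetical; in Keras it would correspond to the trained weights of the `Embedding` layer, and `average_embedding` is an illustrative helper, not a Keras function.

```python
import numpy as np

rng = np.random.default_rng(0)
num_categories = 6   # labels 0..5 appear in the sample cat_var_list
embed_dim = 10
# Hypothetical embedding table: one row (vector) per category label.
embedding_table = rng.normal(size=(num_categories, embed_dim))

def average_embedding(indices):
    """Look up each category index and return the mean of its vectors."""
    vectors = embedding_table[np.asarray(indices, dtype=np.int64)]  # (len(indices), embed_dim)
    return vectors.mean(axis=0)                                     # (embed_dim,)

cat_var_list = [[0, 1], [0, 2, 4], [3, 4], [5], [2, 5]]
# Each sample becomes one fixed-size vector, whatever its list length.
pooled = np.stack([average_embedding(row) for row in cat_var_list])
print(pooled.shape)  # (5, 10)
```

Because every sample collapses to a vector of the same size, the pooled result can be concatenated with the dense branch regardless of how many categories a sample has.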

Below is my work-in-progress code, which converts the data to NumPy arrays and feeds them to the model:

import keras

# Prep data
x_train_numerics = modelDf[['sessions_sum', 'sessions_duration']].values
x_train_cats = modelDf['cat_var_list'].values  # 1-D object array of Python lists
y_train = modelDf['score'].values

# Begin model construction
input_size = x_train_numerics.shape[1]  # 2 numeric features
numerics = keras.layers.Input(shape=(input_size,))
layer_1 = keras.layers.Dense(64, activation='relu', name='layer1')(numerics)

cat_list = keras.layers.Input(shape=(None,), name="subjectgroup_indices", dtype='int32')
# input_dim must exceed the largest label; the sample data uses labels 0..5
embeddings = keras.layers.Embedding(input_dim=6, output_dim=10, input_length=None)(cat_list)
# Average the embedded vectors over the sequence axis
embeddings_avg = keras.layers.Lambda(lambda x: keras.backend.mean(x, axis=1))(embeddings)

hybrid_layer = keras.layers.Concatenate()([layer_1, embeddings_avg])
output_layer = keras.layers.Dense(1, kernel_initializer='lecun_uniform',
                                  name='output_layer')(hybrid_layer)
model = keras.models.Model(inputs=[numerics, cat_list], outputs=output_layer)
model.compile('adam', 'mean_absolute_error')
model.fit([x_train_numerics, x_train_cats], y_train, epochs=6, batch_size=200, validation_split=0.2)
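Two properties of `cat_var_list` matter before calling `fit`, and both can be checked with plain Python. This sketch uses the five sample rows from the frame above; the names `required_input_dim` and `lengths` are illustrative only. `Embedding(input_dim=N)` accepts indices `0..N-1`, so `N` has to be larger than the biggest label, and the row lengths show whether the lists can be stacked into one rectangular integer array.

```python
cat_var_list = [[0, 1], [0, 2, 4], [3, 4], [5], [2, 5]]

# Embedding(input_dim=N) accepts indices 0..N-1, so N must exceed the largest label.
required_input_dim = max(max(row) for row in cat_var_list) + 1
print(required_input_dim)  # 6

# More than one distinct length means the lists are ragged and cannot be
# stacked into a single int32 array without padding.
lengths = sorted({len(row) for row in cat_var_list})
print(lengths)  # [1, 2, 3]
```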

Running the `fit` method produces the following error:

Traceback (most recent call last):
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-63-377ace5b4cf7>", line 1, in <module>
    model.fit([x_train_numerics, x_train_sgs], y_train, epochs=6, batch_size=200, validation_split=0.2)
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/keras/engine/training.py", line 1239, in fit
    validation_freq=validation_freq)
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
    outs = fit_function(ins_batch)
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3277, in __call__
    dtype=tensor_type.as_numpy_dtype))
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/numpy/core/numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
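The last frames of the traceback show the batch being converted with `np.asarray(..., dtype=...)`. A minimal reproduction, assuming the `cat_var_list` column comes out of pandas `.values` as a 1-D array of dtype `object` (one Python list per cell), looks like this; the array is built element by element so it behaves the same across NumPy versions.

```python
import numpy as np

# Mimic modelDf['cat_var_list'].values: a 1-D object array holding Python lists.
x_train_cats = np.empty(5, dtype=object)
for i, row in enumerate([[0, 1], [0, 2, 4], [3, 4], [5], [2, 5]]):
    x_train_cats[i] = row

print(x_train_cats.shape, x_train_cats.dtype)  # (5,) object

conversion_error = None
try:
    # Roughly the conversion in the traceback: each cell is a list, not an int.
    np.asarray(x_train_cats, dtype="int32")
except ValueError as err:
    conversion_error = err
    print("ValueError:", err)
```

NumPy cannot place a list into a single `int32` slot, so the conversion fails before the model ever sees the data.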

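I've already tried setting the input shape of the categorical list to `None`, as suggested in the first answer to this question, but it made no difference. Any help would be much appreciated. Thanks!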
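`Input(shape=(None,))` only lets the sequence length differ between batches; within a single batch every row must still have the same length, because Keras feeds each batch as one rectangular tensor. A common workaround is to pad the lists and exclude the padding when averaging. The sketch below does this in NumPy under stated assumptions: the helpers `pad_shifted` and `masked_mean` are hypothetical, indices are shifted by +1 so that 0 can mean "padding", and the embedding table is random with 4 dimensions to keep the output small.

```python
import numpy as np

def pad_shifted(rows, pad_value=0):
    """Right-pad ragged index lists into a rectangular int32 array.

    Indices are shifted by +1 so that 0 is free to mean "padding".
    """
    max_len = max(len(r) for r in rows)
    out = np.full((len(rows), max_len), pad_value, dtype=np.int32)
    for i, r in enumerate(rows):
        out[i, :len(r)] = np.asarray(r, dtype=np.int32) + 1
    return out

def masked_mean(embedded, padded):
    """Average embeddings over real positions only (padding excluded)."""
    mask = (padded != 0)[..., None].astype(embedded.dtype)   # (batch, len, 1)
    return (embedded * mask).sum(axis=1) / mask.sum(axis=1)  # (batch, dim)

cat_var_list = [[0, 1], [0, 2, 4], [3, 4], [5], [2, 5]]
padded = pad_shifted(cat_var_list)
print(padded)
# [[1 2 0]
#  [1 3 5]
#  [4 5 0]
#  [6 0 0]
#  [3 6 0]]

rng = np.random.default_rng(0)
table = rng.normal(size=(7, 4))  # row 0 = padding, rows 1..6 = labels 0..5
pooled = masked_mean(table[padded], padded)
print(pooled.shape)  # (5, 4)
```

In Keras terms, the rough equivalent would be to add 1 to every index, pad with `pad_sequences(..., padding='post')`, use `Embedding(input_dim=7, output_dim=10, mask_zero=True)`, and replace the `Lambda` mean with `GlobalAveragePooling1D()`, which in recent tf.keras versions honours the mask. A plain `keras.backend.mean` inside a `Lambda` ignores the mask and would count padded positions in the average. Mask support differs between Keras versions, so it is worth confirming for the version in use.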

