Using Python's LIME text explainer to explain my deep learning model for Twitter sentiment analysis

Published 2024-05-29 05:24:59


I have a Twitter dataset with sentiment labels. I have preprocessed the data and done part-of-speech tagging (both with NLTK in Python). After preprocessing, the data looks like this:

[Image: Pre-processed tweets]
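
Roughly, the NLTK part of the preprocessing looks like this (a simplified sketch; the helper name and example tweet below are just placeholders, not the actual pipeline):

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

def preprocess(tweet):
    # lower-case, tokenise and POS-tag a single tweet
    tokens = nltk.word_tokenize(tweet.lower())
    return nltk.pos_tag(tokens)

print(preprocess("I love this new phone!"))   # -> list of (token, POS tag) tuples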

After preprocessing, the training data is prepared with the following code:

import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Fit the tokenizer on train + test text and turn tweets into padded index sequences
full_text = list(train['content'].values) + list(test['content'].values)
tokenizer = Tokenizer(num_words=20000, lower=True, filters='')
tokenizer.fit_on_texts(full_text)
train_tokenized = tokenizer.texts_to_sequences(train['content'])
test_tokenized = tokenizer.texts_to_sequences(test['content'])

max_len = 50
X_train = pad_sequences(train_tokenized, maxlen=max_len)
X_test = pad_sequences(test_tokenized, maxlen=max_len)

embed_size = 300
max_features = 20000

def get_coefs(word, *arr):
    return word, np.asarray(arr, dtype='float32')

def get_embed_mat(embedding_path):
    # Build the embedding matrix for the tokenizer's vocabulary from a pre-trained vector file
    embedding_index = dict(get_coefs(*o.strip().split(" "))
                           for o in open(embedding_path, encoding="utf8"))
    word_index = tokenizer.word_index
    nb_words = min(max_features, len(word_index))
    print(nb_words)
    embedding_matrix = np.zeros((nb_words + 1, embed_size))
    for word, i in word_index.items():
        if i >= max_features:
            continue
        embedding_vector = embedding_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector
    return embedding_matrix
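
get_embed_mat is then called with the path to a pre-trained 300-dimensional word-vector file; the file name below is only a placeholder for whatever embedding file is actually used:

# The actual embedding file/path is not shown above; this is an assumed example.
embedding_matrix = get_embed_mat('glove.840B.300d.txt')   # any "word v1 v2 ... v300" text file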

I then built a deep learning model that uses the word embeddings as a layer. The model specification is shown below:

from keras.layers import (Input, Embedding, SpatialDropout1D, Bidirectional, LSTM, Conv1D,
                          GlobalAveragePooling1D, GlobalMaxPooling1D, concatenate,
                          Dense, Dropout, BatchNormalization)
from keras.models import Model, load_model
from keras.optimizers import Adam

def build_model1(lr=0.0, lr_d=0.0, units=0, dr=0.0):
    # Frozen pre-trained embeddings feeding two BiLSTM + Conv1D blocks with global pooling
    inp = Input(shape=(max_len,))
    x = Embedding(20001, embed_size, weights=[embedding_matrix], trainable=False)(inp)
    x1 = SpatialDropout1D(dr)(x)

    x_lstm = Bidirectional(LSTM(units, return_sequences=True))(x1)
    x1 = Conv1D(32, kernel_size=2, padding='valid', kernel_initializer='he_uniform')(x_lstm)
    avg_pool1_lstm1 = GlobalAveragePooling1D()(x1)
    max_pool1_lstm1 = GlobalMaxPooling1D()(x1)

    x_lstm = Bidirectional(LSTM(units, return_sequences=True))(x1)
    x1 = Conv1D(32, kernel_size=2, padding='valid', kernel_initializer='he_uniform')(x_lstm)
    avg_pool1_lstm = GlobalAveragePooling1D()(x1)
    max_pool1_lstm = GlobalMaxPooling1D()(x1)

    x = concatenate([avg_pool1_lstm1, max_pool1_lstm1, avg_pool1_lstm, max_pool1_lstm])
    # x = BatchNormalization()(x)
    x = Dropout(0.1)(Dense(128, activation='relu')(x))
    x = BatchNormalization()(x)
    x = Dropout(0.1)(Dense(64, activation='relu')(x))
    x = Dense(8, activation="sigmoid")(x)

    model = Model(inputs=inp, outputs=x)
    model.compile(loss="binary_crossentropy", optimizer=Adam(lr=lr, decay=lr_d),
                  metrics=["accuracy"])
    history = model.fit(X_train, y_one_hot, batch_size=128, epochs=20,
                        validation_split=0.1, verbose=1,
                        callbacks=[check_point, early_stop])
    model = load_model(file_path)   # reload the best checkpoint saved by check_point
    return model
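
build_model1 refers to y_one_hot, check_point, early_stop and file_path, which are not shown here; a plausible reconstruction of that setup (the checkpoint path, label column and hyperparameter values are just placeholders) would be:

from keras.callbacks import ModelCheckpoint, EarlyStopping
from keras.utils import to_categorical

file_path = "best_model.hdf5"                                        # assumed checkpoint path
check_point = ModelCheckpoint(file_path, monitor="val_loss", save_best_only=True, mode="min")
early_stop = EarlyStopping(monitor="val_loss", patience=3, mode="min")
y_one_hot = to_categorical(train['sentiment'], num_classes=8)        # 'sentiment' column name is a guess

model = build_model1(lr=1e-3, lr_d=0, units=64, dr=0.2)              # example hyperparameters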

I want to use LIME to explain this model's predictions (as shown in the image below), but it does not work.

[Image: LIME text explanation of the model]
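
One likely reason the LIME call fails is that LimeTextExplainer perturbs raw strings, while the model above expects padded integer sequences, so the classifier function passed to LIME has to do that conversion itself. A minimal sketch, assuming the tokenizer, max_len and trained model from above (the class names and sample tweet are placeholders):

from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    # LIME hands over a list of perturbed raw tweets; convert them exactly
    # like the training data before calling the Keras model.
    seqs = tokenizer.texts_to_sequences(texts)
    padded = pad_sequences(seqs, maxlen=max_len)
    return model.predict(padded)              # shape: (n_samples, 8) class scores

class_names = ['class_0', 'class_1', 'class_2', 'class_3',
               'class_4', 'class_5', 'class_6', 'class_7']   # replace with the real sentiment labels
explainer = LimeTextExplainer(class_names=class_names)

sample_tweet = train['content'].iloc[0]                       # any raw tweet string
exp = explainer.explain_instance(sample_tweet, predict_proba,
                                 num_features=10, top_labels=2)
exp.show_in_notebook()    # or exp.as_list(label=exp.available_labels()[0])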

