High validation loss and mean squared error

0 votes
1 answer
41 views
Asked 2025-04-12 18:17

I have a very large wave energy dataset that I'm using to practice with neural networks, but my mean squared error (MSE) and validation loss (val_loss) are both extremely high. I tried using a correlation matrix, did a two-step split, and used three hidden layers with regularization. The MSE and validation loss still come out around 12 trillion. Here is the link to my data

# Imports required by the code below
import pandas as pd
from sklearn.preprocessing import scale, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from tensorflow import keras
from tensorflow.keras import layers

# Load the data
perth_49 = pd.read_csv(r'WEC_Perth_49.csv')
sydney_49 = pd.read_csv(r'WEC_sydney_49.csv')
perth_100 = pd.read_csv(r'WEC_perth_100.csv')
sydney_100 = pd.read_csv(r'WEC_sydney_100.csv')

# Stack the four dataframes into a single frame
merged_data = pd.concat([perth_49, sydney_49, perth_100, sydney_100])

# Define the target variable "total power output"
target_variable = 'Total_Power'

# Define the potential features
features = [f'X{i}' for i in range(1, 101)] + [f'Y{i}' for i in range(1, 101)] + [f'Power{i}' for i in range(1, 101)] + ['qW']

# Compute the correlation matrix
correlation_matrix = merged_data[features + [target_variable]].corr()

# Sort the correlations with the target variable in descending order
correlation_with_target = correlation_matrix[target_variable].sort_values(ascending=False)

print("Correlation with target variable:")
print(correlation_with_target)

# Choose the top N features with the highest correlation
top_features = correlation_with_target.head(5).index.tolist()
top_features = top_features[1:5]  # Exclude the target variable itself

# Select the relevant features from the dataset
selected_data = merged_data[top_features + [target_variable]]

# Replace NaN values with 0 in selected_data
# selected_data.fillna(0, inplace=True)

# Standardize the features (zero mean, unit variance)
selected_data[top_features] = scale(selected_data[top_features])

# Build the feature matrix and target vector
X = selected_data[top_features].values
y = selected_data[target_variable].values.reshape(-1, 1)

# Split the data into training, validation, and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# Print the sizes of each set
print("Training set size:", len(X_train))
print("Validation set size:", len(X_val))
print("Testing set size:", len(X_test))

# Scaling the features using MinMaxScaler
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

# Creating a neural network with Keras
#model = keras.Sequential([
#    layers.Dense(128, activation='relu', input_shape=(len(top_features),)),
#    layers.Dense(64, activation='relu'),
#    layers.Dense(32, activation='relu'),
#    layers.Dense(16, activation='relu'),
#    layers.Dense(1)  # a single output (total power)
#])
##########
from tensorflow.keras import regularizers

model = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(len(top_features),), kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(32, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(16, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(1)  # a single output (total power)
])

# Compiling the model
optimizer = keras.optimizers.Adam(learning_rate=0.00001)
model.compile(optimizer=optimizer, loss='mean_squared_error')

# Early stopping
#early_stopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights= True)

# Training the model
model.fit(X_train_scaled, y_train, epochs=4000, batch_size=32, validation_data=(X_val_scaled, y_val))

# Training the model with early stopping
#model.fit(X_train_scaled, y_train, epochs=4000, batch_size=32, validation_data=(X_val_scaled, y_val), callbacks=[early_stopping])


# Evaluating the model on the test set
loss = model.evaluate(X_test_scaled, y_test)
print("Test Loss:", loss)

# Making predictions
predictions = model.predict(X_test_scaled)

# Calculating mean squared error
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)

1 Answer

1

My first thought is that your 4000-epoch training run is the result of combining a very low learning rate with a fairly deep network. I'm running this on an ordinary desktop, so at my current speed I couldn't get through all 4000 epochs within 7 days. That said, at epoch 60 my loss was around 777 billion with a learning rate of 0.001 and only the first hidden layer. Yes, adding hidden layers lets the model learn more complex functions, and lowering the learning rate makes training smoother, but both require more epochs to reach a given level of performance. If your machine is faster than mine, try a smaller network with a higher learning rate and see what you get.
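For concreteness, here is a minimal sketch of that suggestion, reusing the variables from the question (top_features, X_train_scaled, y_train, X_val_scaled, y_val). The layer sizes, the learning rate of 0.1, and the epoch count are illustrative starting points rather than tuned values, and the early-stopping callback is the one commented out in the question:

from tensorflow import keras
from tensorflow.keras import layers

# A smaller network: two hidden layers instead of four
small_model = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(len(top_features),)),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)  # a single output (total power)
])

# A higher learning rate than 1e-5 so the loss moves within a reasonable number of epochs
small_model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.1),
                    loss='mean_squared_error')

# Early stopping so you don't have to guess the right epoch count
early_stopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10,
                                               restore_best_weights=True)

small_model.fit(X_train_scaled, y_train, epochs=100, batch_size=32,
                validation_data=(X_val_scaled, y_val),
                callbacks=[early_stopping])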

Edit: I just ran a few small tests, for your reference:

100 epochs with the setup described above:
Test Loss: 490669342720.0
Mean Squared Error: 490669524959.7527


100 epochs with first 2 hidden layers and learning rate=0.1:
Test Loss: 388628578304.0
Mean Squared Error: 388628638043.2276


100 epochs with first 2 hidden layers, learning rate=1:
Test Loss: 449328513024.0
Mean Squared Error: 449328481916.0099
