Activation function on the hidden layers for regression models in neural networks
I am trying to predict a single output value, y, using two input features. I read that regression models usually don't use any activation function, and that even when one is applied, it is mostly applied to the hidden layers. However, whether I use no activation function at all or use one only on the hidden layers, my predicted values are nowhere near the actual values.
This is my MATLAB function for calculating the cost function along with the backpropagation algorithm.
function [J, grad] = nnCostFunction1(nn_params, ...
                                     input_layer_size, ...
                                     hidden_layer_size, ...
                                     num_labels, ...
                                     X, y, lambda)
% Reshape nn_params back into the parameters Theta1 and Theta2,
% the weight matrices for the 2-layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Initialising the variables
m = size(X, 1);
X = [ones(m, 1) X];                      % add bias column
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% Feed forward
z_1 = X * Theta1';
A_1 = tanh(z_1);
A_1 = [ones(m, 1) z_1];
z_2 = A_1 * Theta2';

% Unregularised cost (mean squared error)
J = J + sum(((z_2 - y).^2), 1);
J = J / (2 * m);

% Regularising the cost function
J = J + (lambda / (2 * m)) * (sum(sum(Theta1(:, 2:end).^2)) + ...
                              sum(sum(Theta2(:, 2:end).^2)));

% Backpropagation
delta_3 = z_2 - y;
delta_2 = (delta_3 * Theta2(:, 2:end)) .* tanhGradient(z_1);
Delta_1 = delta_2' * X;
Delta_2 = delta_3' * A_1;
Theta1_grad = Delta_1 / m;
Theta2_grad = Delta_2 / m;

% Regularise the gradients (skip the bias column)
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + (lambda / m) * Theta1(:, 2:end);
Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + (lambda / m) * Theta2(:, 2:end);

grad = [Theta1_grad(:); Theta2_grad(:)];
end
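As a sanity check on the backpropagation above, the analytic gradient can be compared against a central finite-difference approximation on a tiny random problem. This is only an illustrative sketch: the small layer sizes, the random test data and the epsilon value are arbitrary choices, and only nnCostFunction1 comes from the code above.

% Gradient check: compare the analytic gradient from nnCostFunction1 with a
% numerical (central difference) approximation on a small random problem.
input_layer_size  = 2;
hidden_layer_size = 4;
num_labels        = 1;
m      = 5;
Xc     = randn(m, input_layer_size);
yc     = randn(m, 1);
lambda = 1;

% Random parameter vector of the right length
n_params = hidden_layer_size * (input_layer_size + 1) + ...
           num_labels * (hidden_layer_size + 1);
theta = 0.1 * randn(n_params, 1);

costFun = @(p) nnCostFunction1(p, input_layer_size, hidden_layer_size, ...
                               num_labels, Xc, yc, lambda);
[~, grad] = costFun(theta);

numgrad = zeros(size(theta));
epsilon = 1e-4;
for i = 1:numel(theta)
    e = zeros(size(theta));
    e(i) = epsilon;
    numgrad(i) = (costFun(theta + e) - costFun(theta - e)) / (2 * epsilon);
end

% The relative difference should be very small (e.g. < 1e-7) if backprop is correct
disp(norm(numgrad - grad) / norm(numgrad + grad));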
This is my code for the tanhGradient function:
function g = tanhGradient(z)
% Derivative of tanh evaluated element-wise at z
g = 1 - tanh(z).^2;
end
This is how I'm implementing my learning algorithm.
clear
data = load("data.txt");
X = data(:, 1:2);
y = data(:, 3);

input_layer_size  = 2;
hidden_layer_size = 16;
num_labels        = 1;

% Initialising the weights
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
initial_nn_params = [initial_Theta1(:); initial_Theta2(:)];

% Learning the weights using fmincg
options = optimset('MaxIter', 100);
lambda = 1;

% Create a "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction1(p, input_layer_size, hidden_layer_size, ...
                                    num_labels, X, y, lambda);
[nn_params, ~] = fmincg(costFunction, initial_nn_params, options);
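randInitializeWeights is not listed above; for completeness, a typical implementation draws small uniform random values to break symmetry between hidden units. This is just a sketch of the usual approach, and the exact epsilon_init value is an arbitrary choice:

function W = randInitializeWeights(L_in, L_out)
% Sketch of a typical random initialisation: one row per unit of the next
% layer, one column per incoming connection plus the bias term.
epsilon_init = 0.12;                                   % arbitrary small range
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
end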
My code for making the predictions:
function p = predict(Theta1, Theta2, X)
m = size(X, 1);
h1 = tanh([ones(m, 1) X] * Theta1');   % hidden layer with tanh activation
p  = [ones(m, 1) h1] * Theta2';        % linear output layer
end
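Before calling predict, the flattened parameter vector returned by fmincg has to be reshaped back into Theta1 and Theta2, using the same layout as inside nnCostFunction1. A minimal sketch of that step and of printing the comparison:

% Reshape the learned parameters back into the two weight matrices
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

pred = predict(Theta1, Theta2, X);

% Predicted values next to the actual targets
disp([pred y]);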
This is the output I'm getting; the first column contains the predicted values and the second column the actual values.
My dataset looks like this: the first two columns contain my input features and the last column is my output. The dataset contains 850 examples.
Solution 1:
So a linear regression model doesn't use activation functions. It has the form y = w1*x1 + w2*x2 + ... + wn*xn + b, with one weight for every input feature. That is not a deep learning model.
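For example, with the two input features from the question, an ordinary least-squares fit needs no activation function at all. A minimal sketch, assuming X (m x 2) and y (m x 1) are loaded as in the question:

% Plain linear regression: no activation function anywhere
Xb    = [ones(size(X, 1), 1) X];   % add a bias column
w     = Xb \ y;                    % least-squares solution via backslash
y_hat = Xb * w;                    % fitted values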
But if you are talking about hidden layers, then you are talking about a neural network (deep learning) model. If you are building a dense neural network for a regression problem, you almost certainly want activation functions on the hidden layers, while the output layer is usually left linear so it can produce any real value; see the sketch below.
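Here is a minimal sketch of that idea: one hidden layer with a tanh nonlinearity followed by a linear output, trained with plain batch gradient descent. The layer size, learning rate and iteration count are arbitrary illustrative choices, not tuned values, and X and y are assumed to be loaded as in the question.

% One hidden layer with tanh, linear output, trained by batch gradient descent
m  = size(X, 1);
h  = 16;                              % hidden units (arbitrary)
W1 = 0.1 * randn(h, size(X, 2) + 1);  % hidden-layer weights (incl. bias)
W2 = 0.1 * randn(1, h + 1);           % output-layer weights (incl. bias)
alpha = 0.01;                         % learning rate (arbitrary)

Xb = [ones(m, 1) X];
for it = 1:5000
    Z1    = Xb * W1';                 % hidden pre-activation
    A1    = [ones(m, 1) tanh(Z1)];    % tanh activation plus bias
    y_hat = A1 * W2';                 % linear output layer

    d_out = (y_hat - y) / m;          % derivative of 1/(2m)*sum((y_hat - y).^2)
    d_hid = (d_out * W2(:, 2:end)) .* (1 - tanh(Z1).^2);

    W2 = W2 - alpha * (d_out' * A1);  % gradient descent updates
    W1 = W1 - alpha * (d_hid' * Xb);
end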
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution   | Source   |
|------------|----------|
| Solution 1 | StBlaize |