Neural network for square (x^2) approximation

I made a simple model that should learn the relationship between input and output numbers, in this case x and x squared. Here is the code in Python:

import numpy as np
import tensorflow as tf

# Make TensorFlow log only error messages.
tf.logging.set_verbosity(tf.logging.ERROR)

features = np.array([-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8,
                     9, 10], dtype=float)
labels = np.array([100, 81, 64, 49, 36, 25, 16, 9, 4, 1, 0, 1, 4, 9, 16, 25, 36, 49, 64,
                   81, 100], dtype=float)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1])
])

model.compile(loss="mean_squared_error", optimizer=tf.keras.optimizers.Adam(0.0001))
model.fit(features, labels, epochs=50000, verbose=False)
print(model.predict([4, 11, 20]))

I tried different numbers of units, added more layers, and even used the relu activation function, but the results were always wrong. It works for other relationships, like x and 2x. What is the problem here?



Solution 1: [1]

The problem is that x*x is a very different beast than a*x.

Note what a usual "neural network" does: it stacks y = f(W*x + b) a few times, never multiplying x by itself. Therefore, you will never get a perfect reconstruction of x*x, unless you set f(x) = x*x or similar.
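To see why the question's single Dense layer struggles: without an activation it can only realize a straight line w*x + b, and on data that is symmetric around zero the best such line is flat. A minimal sketch with NumPy (not part of the original answer) that computes this best-fitting line:

import numpy as np

x = np.arange(-10, 11, dtype=float)
y = x ** 2

# Least-squares line w*x + b: the best a single linear Dense(1) unit can do.
w, b = np.polyfit(x, y, deg=1)
print(w, b)  # w is ~0 and b is ~36.7, so every input gets roughly the same prediction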

What you can get is an approximation within the range of values presented during training (and perhaps a tiny bit of extrapolation). In any case, I'd recommend working with a smaller range of values; it will make the problem easier to optimize.
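For instance, one common way to work with a smaller range of values is to rescale inputs and targets before training and undo the scaling afterwards. A rough sketch (the scaling factors below are just illustrative, not from the original answer):

import numpy as np

features = np.arange(-10, 11, dtype=float)
labels = features ** 2

# Rescale so the network sees values of order 1, which is easier to optimize.
x_scale = np.abs(features).max()   # 10.0
y_scale = np.abs(labels).max()     # 100.0
features_scaled = features / x_scale
labels_scaled = labels / y_scale

# Train on (features_scaled, labels_scaled); at prediction time,
# feed x / x_scale into the model and multiply its output by y_scale.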

And on a philosophical note: in machine learning, I find it more useful to think in terms of good/bad rather than correct/wrong. Especially with regression, you cannot get the result exactly "right" unless you have the exact model, in which case there is nothing left to learn.


There actually are some NN architectures that multiply f(x) with g(x), most notably LSTMs and Highway networks. But even these have one or both of f(x), g(x) bounded (by a logistic sigmoid or tanh), and thus are unable to model x*x fully.


Since some misunderstanding has been expressed in the comments, let me emphasize a few points:

  1. You can approximate your data.
  2. To do well in any sense, you do need a hidden layer.
  3. But more data is not necessary, though if you cover the input space more densely, the model will fit more closely; see desernaut's answer.

As an example, here is a result from a model with a single hidden layer of 10 units with tanh activation, trained by SGD with learning rate 1e-3 for 15k iterations to minimize the MSE of your data. Best of five runs:

[Figure: performance of a simple NN trained on OP's data]

Here is the full code to reproduce the result. Unfortunately, I cannot install Keras/TF in my current environment, but I hope that the PyTorch code is accessible :-)

#!/usr/bin/env python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Training data: the integers -10..10 and their squares.
X = torch.tensor([range(-10, 11)]).float().view(-1, 1)
Y = X * X

# A single hidden layer of 10 tanh units.
model = nn.Sequential(
    nn.Linear(1, 10),
    nn.Tanh(),
    nn.Linear(10, 1)
)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_func = nn.MSELoss()
for _ in range(15000):
    optimizer.zero_grad()
    pred = model(X)
    loss = loss_func(pred, Y)
    loss.backward()
    optimizer.step()

# Evaluate on a denser grid that extends slightly beyond the training range.
x = torch.linspace(-12, 12, steps=200).view(-1, 1)
y = model(x)
f = x * x

# Red dots: model predictions; blue line: the true function x*x.
plt.plot(x.detach().view(-1).numpy(), y.detach().view(-1).numpy(), 'r.', linestyle='None')
plt.plot(x.detach().view(-1).numpy(), f.detach().view(-1).numpy(), 'b')
plt.show()

Solution 2: [2]

My answer is a bit different. For the trivial case x*x, you can just write your own activation function that takes in x and outputs x*x. This answers the question above, "how to build a NN that calculates x*x?", but it may violate the "spirit" of the question.
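As a rough sketch of that idea (not part of the original answer), in Keras you could define a squaring function and apply it with a Lambda layer; no training is needed for this trivial case:

import numpy as np
import tensorflow as tf

# Custom element-wise "activation" that returns its input squared.
def square(z):
    return z * z

# A Lambda layer applies the function directly, with no trainable weights.
model = tf.keras.Sequential([
    tf.keras.layers.Lambda(square, input_shape=[1])
])

print(model.predict(np.array([[4.0], [11.0], [20.0]])))  # -> [[16.], [121.], [400.]]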

I mention this because sometimes you want to perform a non-trivial operation like x -> exp(A * x*x) * sinh(1 / sqrt(log(k * x))). You could write a single activation function for this, but the backpropagation would be hellish and impenetrable to another developer.

And suppose you also want the function x -> exp(A * x*x) * cosh(1 / sqrt(log(k * x))). Writing another stand-alone activation function would just be wasteful.

For this reason, you might want to build a library of activation functions for atomic operations such as z*z, exp(z), sinh(z), cosh(z), sqrt(z), and log(z). These activation functions would be applied one at a time, with the help of auxiliary network layers consisting of passthrough (i.e. no-op) nodes. A rough sketch of what such a library could look like is given below.
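For illustration only (this is not part of the original answer), here is a minimal sketch of such a library in Keras; the names ATOMIC_OPS and chain are made up for the example:

import tensorflow as tf

# Hypothetical library of atomic element-wise operations.
ATOMIC_OPS = {
    "square": lambda z: z * z,
    "exp": tf.math.exp,
    "sinh": tf.math.sinh,
    "cosh": tf.math.cosh,
    "sqrt": tf.math.sqrt,
    "log": tf.math.log,
}

def chain(op_names):
    # Stack Lambda layers that apply the named atomic ops one after another;
    # autodiff handles the backpropagation through each simple piece.
    return tf.keras.Sequential(
        [tf.keras.layers.Lambda(ATOMIC_OPS[name]) for name in op_names]
    )

# Example: x -> sqrt(log(x)), composed from atomic pieces.
f = chain(["log", "sqrt"])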

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: [1]
Solution 2: [2]