How do I include the bias term with other weights when performing gradient descent in TensorFlow?
I'm a beginner with ML and have been following the Coursera intro syllabus. I am trying to implement the exercises using TensorFlow rather than Octave.
I have three versions - the first two work fine and the third doesn't. I would like to know why.
Note - I am using TensorFlow.NET in F#, but the binding is a 1:1 mapping of the API, so it should look pretty familiar to Python devs.
1.) Completely manual, works fine
let gradientDescent (x : Tensor) (y : Tensor) (alpha : Tensor) iters =
    // one batch gradient-descent step: theta := theta - (alpha / m) * X^T (X * theta - y)
    let update (theta : Tensor) =
        let delta =
            let h = tf.matmul(x, theta)
            let errors = h - y
            let s = tf.matmul((tf.transpose x), errors)
            s / m   // m = number of training examples, defined earlier in the script
        theta - alpha * delta
    // apply the update `iters` times, starting from theta = [0; 0]
    let rec search (theta : Tensor) i =
        if i = 0 then
            theta
        else
            search (update theta) (i - 1)
    let initTheta = tf.constant([| 0.; 0. |], shape = Shape [| 2; 1 |])
    search initTheta iters

// prepend a column of ones so the first element of theta acts as the bias
let ones = tf.ones(Shape [| m; 1 |], dtype = TF_DataType.TF_DOUBLE)
let X = tf.concat([ ones; x ], axis = 1)
let theta = gradientDescent X y alpha iterations
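For context, if I have transcribed the course formula correctly, the update this computes is the vectorised batch gradient-descent step

    \theta \leftarrow \theta - \frac{\alpha}{m} X^{\top} (X\theta - y)

with the bias handled by the column of ones prepended to X above.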
2.) Using Gradient Tape for auto differentiation with a separate bias term - also works fine
let gradientDescent (x : Tensor) (y : Tensor) (alpha : float32) iters =
    // scalar weight plus a separate bias variable
    let W = tf.Variable(0.f, name = "weight")
    let b = tf.Variable(0.f, name = "bias")
    let optimizer = keras.optimizers.SGD alpha
    for _ in 0 .. iters do
        use g = tf.GradientTape()
        let h = W * x + b
        let loss = tf.reduce_sum(tf.pow(h - y, 2)) / (2 * m)
        let gradients = g.gradient(loss, struct (b, W))
        optimizer.apply_gradients(zip(gradients, struct (b, W)))
    // pack the trained bias and weight into a 2x1 theta to match version 1
    tf
        .constant(value = [| b.value(); W.value() |], shape = Shape [| 2; 1 |])
        .numpy()
        .astype(TF_DataType.TF_DOUBLE)

let theta = gradientDescent x y alpha iterations
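For comparison with version 1: here the bias is kept separate, so (unless I have mangled the notation) the hypothesis and cost being minimised are

    h = W x + b, \qquad J(W, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( W x^{(i)} + b - y^{(i)} \right)^2

i.e. b plays the role of theta_0 and W the role of theta_1.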
3.) Using Gradient Tape as before, this time including the bias term as part of the weights - this throws a stack overflow exception when calling apply_gradients.
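As in version 1, the column of ones is what folds the bias into the weights - with X = [ ones x ] the hypothesis becomes

    h = X\theta, \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix} = \begin{bmatrix} b \\ W \end{bmatrix}

so the first component of theta is the bias.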
let gradientDescent (x : Tensor) (y : Tensor) (alpha : float32) iters =
    // theta as a single 2x1 variable: [ bias; weight ]
    let W = tf.Variable(tf.constant([| 0.; 0. |], shape = Shape [| 2; 1 |]))
    let optimizer = keras.optimizers.SGD alpha
    for _ in 0 .. iters do
        use g = tf.GradientTape()
        let h = tf.matmul(x, W)
        let loss = tf.reduce_sum(tf.pow(h - y, 2)) / (2 * m)
        let gradients = g.gradient(loss, W) // correct gradient tensor returned here
        optimizer.apply_gradients(struct(gradients, W)) // boom!
    tf
        .constant(value = W.value().ToArray<double>(), shape = Shape [| 2; 1 |])
        .numpy()

let ones = tf.ones(Shape [| m; 1 |], dtype = TF_DataType.TF_DOUBLE)
let X = tf.concat([ ones; x ], axis = 1)
let theta = gradientDescent X y alpha iterations
Solution 1:
I worked it out - optimizer.apply_gradients requires an iterable.
All I had to do was change
    optimizer.apply_gradients(struct(gradients, W))
to
    optimizer.apply_gradients(zip([|gradients|], [|W|]))
plus a bit of float32/float64 casting.
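Roughly, the fixed loop from version 3 then looks like this - a sketch only, since the float32/float64 casting I mentioned depends on the dtypes of x, y and the variable, so I have just flagged it in a comment rather than spelling out the exact casts:

let gradientDescent (x : Tensor) (y : Tensor) (alpha : float32) iters =
    // NB: x, y and this 2x1 variable all need to share the same float precision
    let W = tf.Variable(tf.constant([| 0.; 0. |], shape = Shape [| 2; 1 |]))
    let optimizer = keras.optimizers.SGD alpha
    for _ in 0 .. iters do
        use g = tf.GradientTape()
        let h = tf.matmul(x, W)
        let loss = tf.reduce_sum(tf.pow(h - y, 2)) / (2 * m)
        let gradients = g.gradient(loss, W)
        // wrap the single gradient/variable pair in arrays so apply_gradients
        // gets the iterable of (gradient, variable) pairs it expects
        optimizer.apply_gradients(zip([|gradients|], [|W|]))
    tf
        .constant(value = W.value().ToArray<double>(), shape = Shape [| 2; 1 |])
        .numpy()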
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Ryan |