Which TensorFlow method decides which particular batch of examples the model learns from?
I'm trying to understand the implementation of SGD in tensorflow.
I began with gradient_descent.py because of the file name.
Per the Keras docs, an optimizer needs to implement the _resource_apply_dense
method, which corresponds to the (partial) code shown below:
def _resource_apply_dense(self, grad, var, apply_state=None):
  var_device, var_dtype = var.device, var.dtype.base_dtype
  coefficients = ((apply_state or {}).get((var_device, var_dtype))
                  or self._fallback_apply_state(var_device, var_dtype))

  if self._momentum:
    momentum_var = self.get_slot(var, "momentum")
    return gen_training_ops.ResourceApplyKerasMomentum(
        ...
I'd like to know: who passes the var variable to the _resource_apply_dense method? In other words, which method decides which particular batch of examples the model learns from?
Solution 1:[1]
Checking optimizer_v2 of TensorFlow Keras, we find the only use of this function in the entire TensorFlow codebase:
#...
def apply_grad_to_update_var(var, grad):
  #...
  if "apply_state" in self._dense_apply_args:
    apply_kwargs["apply_state"] = apply_state
  update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
  if var.constraint is not None:
    with ops.control_dependencies([update_op]):
      return var.assign(var.constraint(var))
We later see in that same file that the var variable comes from an argument to the _distributed_apply function:
#...
def _distributed_apply(self, distribution, grads_and_vars, name, apply_state):
  #...
  with name_scope_only_in_function_or_graph(name or self._name):
    for grad, var in grads_and_vars:
      #...
Finally, the grads_and_vars argument is documented as a "List of (gradient, variable) pairs" in the apply_gradients function:
#...
def apply_gradients(self,
                    grads_and_vars,
                    #...
  """...
  Args:
    grads_and_vars: List of (gradient, variable) pairs.
  """
If you check the occurrences of apply_gradients (this search), you will see that it is a common way to update the weights of the network, and is thus controlled by the "update" step of the optimizer.
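For context, here is a minimal sketch of how apply_gradients is typically reached from a custom training loop. The model, loss, and data below are hypothetical placeholders; tf.GradientTape and optimizer.apply_gradients are standard TF 2.x APIs, and the chain down to _resource_apply_dense assumes the optimizer_v2 implementation discussed above:

```python
import tensorflow as tf

# Minimal sketch of a custom training loop; the model, loss, and data are
# placeholders. apply_gradients is the standard TF 2.x entry point that
# feeds (gradient, variable) pairs to the optimizer.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

x = tf.random.normal((32, 4))   # one batch of 32 examples
y = tf.random.normal((32, 1))

with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x, training=True))

grads = tape.gradient(loss, model.trainable_variables)
# Each (grad, var) pair below reaches _resource_apply_dense through
# apply_gradients -> _distributed_apply -> apply_grad_to_update_var
# (in the optimizer_v2 implementation discussed above).
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```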
Solution 2:[2]
These are two different questions:

- The caller: "who passes the var variable to the _resource_apply_dense method?"
- Particular examples: "which method decides which particular batch of examples the model learns from?"
1. The caller
The main function that updates weights in any TensorFlow optimizer is apply_gradients, and it receives a zip of trainable weights and their gradients. var comes from the trainable weights unzipped in this line. From my understanding, here is the call sequence (a toy sketch follows the list):
1. apply_gradients calls _distributed_apply.
2. _distributed_apply calls an inner apply_grad_to_update_var.
3. apply_grad_to_update_var calls the inherited or custom _resource_apply_dense or _resource_apply_sparse.
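Here is a rough plain-Python sketch of that chain. It is a simplified stand-in, not TensorFlow's actual code; distribution strategies, apply_state, and slot variables are all omitted:

```python
# Toy sketch of the optimizer_v2 call chain described above; the class and
# method bodies are simplified stand-ins, not TensorFlow's real implementation.
class ToyOptimizer:
    def apply_gradients(self, grads_and_vars):
        # grads_and_vars: list of (gradient, variable) pairs.
        return self._distributed_apply(grads_and_vars)

    def _distributed_apply(self, grads_and_vars):
        def apply_grad_to_update_var(var, grad):
            # This is the point where var is handed to _resource_apply_dense.
            return self._resource_apply_dense(grad, var)

        return [apply_grad_to_update_var(var, grad) for grad, var in grads_and_vars]

    def _resource_apply_dense(self, grad, var):
        # Subclasses such as SGD override this with the actual update rule.
        raise NotImplementedError


class ToySGD(ToyOptimizer):
    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate

    def _resource_apply_dense(self, grad, var):
        # Plain SGD step: new_var = var - lr * grad (Python floats here).
        return var - self.learning_rate * grad


# Usage: two (gradient, variable) pairs; apply_gradients drives both updates.
print(ToySGD(0.1).apply_gradients([(0.5, 1.0), (2.0, 3.0)]))  # [0.95, 2.8]
```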
2. Particular examples
The decision about which examples are picked for the model to learn from has nothing to do with the optimizer. Optimizers decide the amount by which the weights will be changed (it can be just the gradients, or it can be something more) and then apply that change.
A batch is a subset of the data. Thus, you can specify the batches yourself or let other classes decide for you, such as the Dataset class (please check the shuffle function).
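For instance, a typical tf.data pipeline makes the batching decision before the optimizer ever runs (the tensors and sizes below are placeholders):

```python
import tensorflow as tf

# Which examples end up in each batch is decided by the input pipeline,
# not by the optimizer. Here tf.data shuffles and batches placeholder data.
features = tf.random.normal((1000, 4))
labels = tf.random.normal((1000, 1))

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=1000)   # randomizes which examples go together
    .batch(32)                   # groups 32 examples into one batch
)

for x_batch, y_batch in dataset.take(1):
    print(x_batch.shape, y_batch.shape)  # (32, 4) (32, 1)
```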
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ibarrond |
| Solution 2 | |