tflite: get_tensor on non-output tensors gives random values

I'm trying to debug my tflite model, which uses custom ops. I've found the correspondence between op names (in *.pb) and op ids (in *.tflite), and I'm doing a layer-by-layer comparison to make sure the output differences always stay within 1e-4 (since they blow up at the end, I want to find the exact place where my custom layer fails), as follows:


Method 1: I use get_tensor to get the output as follows:

from tensorflow.contrib.lite.python import interpreter

# load the model
model = interpreter.Interpreter(model_path='model.tflite')
model.allocate_tensors()

# read back the values of the tensor ids mapped from the *.pb op names
tensor_output = {}
for i in tensor_ids:
    tensor_output[i] = model.get_tensor(i)

It shows totally inadequate random values (compared to the outputs of the TensorFlow model).


Method 2: convert the *.pb only up to a certain layer, then repeat, basically:

  1. Create a *.pb so that it contains the network only from input up to layer_1.

  2. Convert to tflite (so the output is now layer_1) and check the outputs of TF-Lite with TensorFlow.

  3. Repeat steps 1-2 for layer_2, layer_3, ... outputs.

This method requires much more work and many more runs, but it correctly shows that for built-in operations the outputs of the tflite and pb models are identical, and they only start to differ at my custom ops (whereas in Method 1 the outputs diverge right away, from the first layers). The per-layer check itself is just a max-absolute-difference comparison, roughly like the sketch below.
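(A minimal sketch of that check; tf_out and tflite_out are placeholder names for the outputs of the same layer from the *.pb model and the truncated *.tflite model for one input:)

import numpy as np

# placeholder names: tf_out / tflite_out are the two models' outputs for the same input
def outputs_match(tf_out, tflite_out, atol=1e-4):
    max_diff = np.max(np.abs(tf_out - tflite_out))
    print('max abs difference: %g' % max_diff)
    return max_diff <= atol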


Question: Why is the behaviour of get_tensor so strange? Maybe it is because I am using TensorFlow 1.9 (when TF-Lite was not yet released and was available only as a developer preview)?

PS: I am aware of the TF-Lite release, but I've manually compiled TensorFlow 1.9 for my project and it is hard to change versions now.



Solution 1:[1]

I had the same problem a few months ago. The thing is, TF-Lite works quite differently from TensorFlow: it uses static memory allocation and a static execution plan, memory-mapped model files for faster loading, and it is supposed to optimize everything possible in the network's forward-propagation pipeline.

I'm not a developer of TF-Lite, but I suppose it keeps its memory footprint extremely low by re-using the memory areas that were used for previously computed ops. Let's see the idea in the following illustration:


Step 1: first, we feed the inputs into the symbolic tensor I (in parentheses). Let's say its value is stored in a buffer called buffer_1.

     op1       op2       op3
(I) ---->  A  ---->  B  ---->  O
_________________________________
^^^        ^^^^^^^^^^^^       ^^^
input      intermediate    output
tensor     tensors         tensor

Step 2: Now, we compute op1 on symbolic tensor I to obtain the symbolic tensor A. We compute on buffer_1 and store the value of symbolic tensor A in a buffer called buffer_2.

    [op1]      op2       op3
(I) ----> (A) ---->  B  ---->  O

Step 3: Now, we compute op2 on symbolic tensor A to obtain the symbolic tensor B. We compute on buffer_2 and store the value of symbolic tensor B in a buffer called buffer_3...

     op1      [op2]      op3
 I  ----> (A) ----> (B) ---->  O

But wait! Why waste memory on buffer_3 when buffer_1 is now unused, and its value is useless for getting the output O? So, instead of storing into buffer_3, we actually store the result of this operation in buffer_1!

That's the basic idea of efficient memory re-use, which I think is implemented in TF-Lite, given the static graph analysis built into toco and the rest of the tooling. And that's why you can't simply apply get_tensor to non-output tensors.
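To make that idea more concrete, here is a toy sketch of lifetime-based buffer re-use. This is only an illustration of the principle, not TF-Lite's actual memory planner:

# Toy illustration of buffer re-use (NOT TF-Lite's real planner).
# Each op consumes one tensor and produces the next one: I -> A -> B -> O.
# A buffer can be recycled as soon as its tensor has no remaining consumers.
ops = [('op1', 'I', 'A'), ('op2', 'A', 'B'), ('op3', 'B', 'O')]

free_buffers = []   # buffers whose contents are no longer needed
buffer_of = {}      # symbolic tensor name -> buffer name
buffer_count = 0

def allocate():
    global buffer_count
    if free_buffers:
        return free_buffers.pop()        # re-use a dead buffer
    buffer_count += 1
    return 'buffer_%d' % buffer_count    # otherwise allocate a fresh one

buffer_of['I'] = allocate()              # the input lives in buffer_1
for name, src, dst in ops:
    buffer_of[dst] = allocate()
    print('%s: %s (%s) -> %s (%s)' % (name, src, buffer_of[src], dst, buffer_of[dst]))
    free_buffers.append(buffer_of[src])  # src is dead after this op

# Output:
# op1: I (buffer_1) -> A (buffer_2)
# op2: A (buffer_2) -> B (buffer_1)   <- B overwrites the input's buffer
# op3: B (buffer_1) -> O (buffer_2)   <- O overwrites A, so reading A now gives garbage

After the run, only the buffer currently holding O has meaningful contents, which is exactly what the interpreter guarantees for its declared outputs.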


An easier way to debug?

You've mentioned that you're writing a custom op, so I suppose you've built tflite with bazel, right? Then you can actually inject some logging code into Interpreter::Invoke() in the file tensorflow/lite/interpreter.cc. An ugly hack, but it works.

PS: I would be glad if any TensorFlow Lite developers come across this and comment on it :)

Solution 2:[2]

Karim's answer

interpreter = tf.lite.Interpreter(model_path="test.tflite", experimental_preserve_all_tensors=True)

is the most straightforward solution to this, but you have to be on tensorflow>=2.5.0.

Solution 3:[3]

By default, TFLite doesn't preserve intermediate tensors. This is because it optimizes memory usage and re-uses a tensor's allocated memory based on the data-flow dependencies. You can use the newly added debugging feature to preserve all tensors:

interpreter = tf.lite.Interpreter(
    model_path="test.tflite",
    experimental_preserve_all_tensors=True)

Now you can inspect intermediate tensors on this interpreter.
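For example (a minimal sketch; the dummy zero input and the tensor index 7 are placeholders for your own model):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="test.tflite",
    experimental_preserve_all_tensors=True)
interpreter.allocate_tensors()

# run inference once so the intermediate tensors get filled
input_details = interpreter.get_input_details()
dummy_input = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy_input)
interpreter.invoke()

# look up indices with get_tensor_details(), then read any tensor you like
print(interpreter.get_tensor(7))   # 7 is a placeholder intermediate tensor index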

Solution 4:[4]

Yes, intermediate tensors can be overwritten unless they are specified as outputs.

Edit: I managed to fix the problem by putting all ops in the output list during conversion. They are then preserved at runtime and their values can be read correctly.
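For instance, with the TF 1.x converter this amounts to listing the intermediate op names in output_arrays. The sketch below is only an illustration (the file and op names are placeholders, and on older 1.x releases the converter lives under tf.contrib.lite rather than tf.lite):

import tensorflow as tf

# 'input', 'layer_1/Relu', 'layer_2/Relu' and 'output' are placeholder op names;
# listing the intermediate ops as outputs keeps their tensors readable at runtime
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='model.pb',
    input_arrays=['input'],
    output_arrays=['layer_1/Relu', 'layer_2/Relu', 'output'])

with open('model_debug.tflite', 'wb') as f:
    f.write(converter.convert())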

See:

Obtaining quantized activations in tensorflow lite

Solution 5:[5]

I faced a similar issue when wanting to convert a TFLite file to another framework, without access to the original TF graph that was used to make the TFLite file. Because the output from my conversion was different from the output of the TFLite model, I wanted to look at the outputs of intermediate layers. Thanks to this topic on SO, I learned that get_tensor() isn't a reliable approach.

The easiest solution was to edit the TFLite file in a hex editor!

The output of the model is the index of one of the tensors in the model. In my case that was tensor 175 (you can see this with get_tensor_details()). This index is stored as a little-endian int32 somewhere in the TFLite file, so for tensor 175 the file contains the bytes 0xAF000000.

I wanted the model output to use tensor 3 instead, so I opened the TFLite file in a hex editor, did a search for 0xAF000000, and replaced it with 0x03000000. Saved the file and loaded it again with the TFLite interpreter. Works like a charm. You just have to be careful that the file may contain more than one occurrence of 0xAF000000 (or whatever you're looking for). In my TFLite file it was stored near the end.
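If you'd rather not scan by eye, the same byte pattern can be located programmatically. Here is a small sketch using only the standard library; it merely lists candidate offsets, and picking the right one is still up to you:

import struct

with open('model.tflite', 'rb') as f:
    buf = f.read()

pattern = struct.pack('<i', 175)          # 175 = 0xAF as a little-endian int32
offset = buf.find(pattern)
while offset != -1:
    print('candidate occurrence at offset 0x%X' % offset)
    offset = buf.find(pattern, offset + 1)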

I hope this tip is useful to someone. :-)

Solution 6:[6]

We cannot directly get intermediate inputs and outputs from a TFLite model. But we can get the inputs and outputs of layers by modifying the model buffer. This repo (https://github.com/raymond-li/tflite_tensor_outputter, referenced in the code below) shows how it is done. We need to modify the flatbuffer schema for this to work; the modified TFLite schema (the tflite folder in the repo) is available in the repo.

For the completeness of the answer, below is the relevant code:

# Note: this code assumes `import tensorflow as tf` and the modified flatbuffer
# schema module from the repo (the tflite folder), imported here as `tflite_model`.
def buffer_change_output_tensor_to(model_buffer, new_tensor_i):
    # from https://github.com/raymond-li/tflite_tensor_outputter
    # Set subgraph 0's output(s) to new_tensor_i
    # Reads model_buffer as a proper flatbuffer file and gets the offset programmatically
    # It might be much more efficient if Model.subgraphs[0].outputs[] was set to a list of all the tensor indices.
    fb_model_root = tflite_model.Model.GetRootAsModel(model_buffer, 0)
    output_tensor_index_offset = fb_model_root.Subgraphs(0).OutputsOffset(0) # Custom added function to return the file offset to this vector
    # print("buffer_change_output_tensor_to. output_tensor_index_offset: ")
    # print(output_tensor_index_offset)
    # output_tensor_index_offset = 0x5ae07e0 # address offset specific to inception_v3.tflite
    # output_tensor_index_offset = 0x16C5A5c # address offset specific to inception_v3_quant.tflite
    # Flatbuffer scalars are stored in little-endian.
    new_tensor_i_bytes = bytes([
        new_tensor_i & 0x000000FF, \
        (new_tensor_i & 0x0000FF00) >> 8, \
        (new_tensor_i & 0x00FF0000) >> 16, \
        (new_tensor_i & 0xFF000000) >> 24 \
    ])
    # Replace the 4 bytes corresponding to the first output tensor index
    return model_buffer[:output_tensor_index_offset] + new_tensor_i_bytes + model_buffer[output_tensor_index_offset + 4:]

def get_tensor(path_tflite, tensor_id, input_tensor):
    # input_tensor: a numpy array matching the model's input shape and dtype
    with open(path_tflite, 'rb') as fp:
        model_buffer = fp.read()

    # rewrite the model so that tensor_id becomes the output tensor
    model_buffer = buffer_change_output_tensor_to(model_buffer, int(tensor_id))
    interpreter = tf.lite.Interpreter(model_content=model_buffer)
    interpreter.allocate_tensors()
    tensor_details = interpreter._get_tensor_details(tensor_id)
    tensor_name = tensor_details['name']   # name of the tensor being extracted

    input_details = interpreter.get_input_details()
    interpreter.set_tensor(input_details[0]['index'], input_tensor)
    interpreter.invoke()

    tensor = interpreter.get_tensor(tensor_id)
    return tensor
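A hypothetical usage sketch (the file name, tensor id and input shape are placeholders; with the fix above, the input array is passed in explicitly):

import numpy as np

dummy_input = np.zeros((1, 224, 224, 3), dtype=np.float32)   # placeholder input
activations = get_tensor('model.tflite', 42, dummy_input)    # 42: placeholder tensor id
print(activations.shape)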

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 hav4ik
Solution 2 Chenster Liu
Solution 3 Karim Nosseir
Solution 4
Solution 5 Matthijs Hollemans
Solution 6 tpk