Is there a way to use a TensorFlow KMeans SavedModel in BigQuery?

I know this is kind of stupid, since BigQuery ML now provides k-means with good initialization. Nonetheless, I was required to train a model in TensorFlow and then pass it to BigQuery for prediction.

I saved my model and everything works fine until I try to upload it to BigQuery. I get the following error:

TensorFlow SavedModel output output has an unsupported shape: unknown_rank: true

So my question is: is it impossible to use a k-means model trained in TensorFlow in BigQuery?

Edit:

Creating the model:

import tensorflow as tf

KMeans = tf.compat.v1.estimator.experimental.KMeans
kmeans = KMeans(num_clusters=8, use_mini_batch=False,
                initial_clusters=KMeans.KMEANS_PLUS_PLUS_INIT,
                seed=1234567890, relative_tolerance=.001)

Serving function:

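def serving():
    # Per-feature alternative, left disabled:
    # inputs = {}
    # for feat in df.columns:
    #     inputs[feat] = tf.compat.v1.placeholder(shape=[None], dtype=tf.float32)
    # Single placeholder: a batch of rows with 9 float features each.
    inputs = tf.compat.v1.placeholder(shape=[None, 9], dtype=tf.float32)
    return tf.estimator.export.TensorServingInputReceiver(inputs, inputs)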

Saving the model:

kmeans.export_saved_model("gs://<bucket>/tf_clustering_model",
                          serving_input_receiver_fn=serving,
                          checkpoint_path='/tmp/tmpdsleqpi3/model.ckpt-19',
                          experimental_mode=tf.estimator.ModeKeys.PREDICT)

Loading to BigQuery:

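from google.cloud import bigquery as bq  # assumed import; the original snippet does not show it

query = """
CREATE MODEL `<project>.<dataset>.kmeans_tensorflow`
OPTIONS(MODEL_TYPE='TENSORFLOW',
        MODEL_PATH='gs://<bucket>/tf_clustering_model/1581439348/*')
"""
job = bq.Client().query(query)
job.result()  # wait for the job to finish; raises on failure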

Edit2:

The output of the saved_model_cli command is the following:

jupyter@tensorflow-20200211-182636:~$ saved_model_cli  show --dir . --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['all_distances']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 9)
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output'] tensor_info:
        dtype: DT_FLOAT
        shape: unknown_rank
        name: add:0
  Method name is: tensorflow/serving/predict

signature_def['cluster_index']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 9)
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output'] tensor_info:
        dtype: DT_INT64
        shape: unknown_rank
        name: Squeeze_1:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 9)
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output'] tensor_info:
        dtype: DT_INT64
        shape: unknown_rank
        name: Squeeze_1:0
  Method name is: tensorflow/serving/predict

All three signatures report an unknown rank for the output shape. How can I set up the export of this particular estimator so that the output shape is defined, or is there something I can search for that would help?

Final Edit:

This really seems to be unsupported, at least as far as I was able to take it. I tried several approaches, but in the end I saw little choice other than to take the source of the KMeansClustering class (and the surrounding code from GitHub) and try to reshape the outputs somehow. In the process, I realized that the results object was actually a tuple holding a different kind of Tensor class, one that seemed to be used only to construct the graph. Interestingly, if I took this tuple and did something like:

model_predictions[0][0]...[0]

the object was always some odd Tensor. I went about sixty levels deep in place of the three dots and eventually gave up.

From there I looked at the class that feeds these outputs to KMeansClustering, called KMeans in clustering_ops (and the surrounding code on GitHub). Again I had no success in changing the data type, but I did understand why the output was named Squeeze something: at this point the output goes through a squeeze operation. I thought this could be the problem and tried removing the squeeze operation, among other things... I failed :(

Finally I realized that this output seemed to actually come from the estimator.py file and at this point I just gave up on it.

Thank you to all who commented; I would not have come this far without you. Cheers



Solution 1:[1]

You can check the shape in the SavedModel by using the command-line program saved_model_cli that ships with TensorFlow.

Make sure your export signature in tensorflow specifies the shape of the output tensor.
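The same check can also be scripted. The sketch below is an illustration, not part of the original answer: it parses saved_model.pb with TensorFlow's saved_model_pb2 protobuf bindings and lists every signature output whose shape has unknown_rank set. The function names and the local export directory are hypothetical, and a gs:// export is assumed to have been copied to local disk first.

```python
import os


def unknown_rank_outputs(signature_defs):
    """Return sorted (signature, output) pairs whose output shape has unknown rank.

    `signature_defs` is a mapping shaped like MetaGraphDef.signature_def:
    each value has an `.outputs` mapping of TensorInfo-like objects that
    carry a `.tensor_shape.unknown_rank` flag.
    """
    flagged = []
    for sig_name, sig in signature_defs.items():
        for out_name, info in sig.outputs.items():
            if info.tensor_shape.unknown_rank:
                flagged.append((sig_name, out_name))
    return sorted(flagged)


def check_saved_model(export_dir, tag="serve"):
    """Parse <export_dir>/saved_model.pb and report unknown-rank outputs."""
    # Imported lazily so the helper above can be used without TensorFlow.
    from tensorflow.core.protobuf import saved_model_pb2

    saved_model = saved_model_pb2.SavedModel()
    with open(os.path.join(export_dir, "saved_model.pb"), "rb") as f:
        saved_model.ParseFromString(f.read())
    for meta_graph in saved_model.meta_graphs:
        if tag in meta_graph.meta_info_def.tags:
            return unknown_rank_outputs(meta_graph.signature_def)
    raise ValueError("no MetaGraphDef with tag %r in %s" % (tag, export_dir))
```

Judging by the saved_model_cli listing in the question, the export there would be expected to flag the output named "output" in all three signatures.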

Solution 2:[2]

What this error means: The TF model output named "output" is of completely undefined shape. (unknown_rank=true means that the model isn't even specifying a number of dimensions).

For BigQuery to be able to use the TensorFlow model, it has to be able to convert the model output into a BigQuery type: either a single primitive scalar or a one-dimensional array of primitives.

You may be able to add a tf.reshape operation at the end of the graph to shape this output into something that BigQuery can load.
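One way to try this without touching the estimator source is to post-process the export: load the SavedModel into a TF1-style graph, append a tf.reshape, and save a new SavedModel with the reshaped tensor as its output. The following is a minimal sketch under several assumptions, not a verified fix. It assumes the signature keys "input" and "output" shown by saved_model_cli in the question, assumes the cluster index is logically one value per input row (so [-1] is the right target shape), and has not been checked against BigQuery's importer. The function name and directories are hypothetical.

```python
def reexport_with_flat_output(src_dir, dst_dir, signature="serving_default"):
    """Load a SavedModel, reshape its output to rank 1, and save a new copy."""
    # Imported lazily; requires a TensorFlow build that ships tf.compat.v1.
    import tensorflow as tf

    tf1 = tf.compat.v1
    graph = tf.Graph()
    with graph.as_default(), tf1.Session(graph=graph) as sess:
        # Restores the graph and the variable values from the original export.
        meta_graph = tf1.saved_model.loader.load(sess, ["serve"], src_dir)
        sig = meta_graph.signature_def[signature]
        x = graph.get_tensor_by_name(sig.inputs["input"].name)
        y = graph.get_tensor_by_name(sig.outputs["output"].name)

        # Assumption: one cluster index per input row, so a flat vector of
        # length batch_size is the intended shape. The reshape gives the
        # output a known rank of 1 with an unknown batch dimension.
        y_flat = tf.reshape(y, [-1], name="cluster_index_flat")

        # Writes a new SavedModel with a single serving_default signature.
        tf1.saved_model.simple_save(
            sess, dst_dir, inputs={"input": x}, outputs={"output": y_flat}
        )
```

Note that dst_dir must not already exist. Running saved_model_cli on the new directory is a quick way to confirm that the output now has a rank-1 shape; whether BigQuery then accepts the model still has to be verified.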

It's not obvious what your KMeans model is outputting. I'm guessing it might be trying to output all of the clusters as one big tensor? Was this a model created using the TensorFlow KMeans Estimator?

Solution 3:[3]

The main issue is that the output tensor of the built-in TF KMeans estimator has unknown rank in the saved model.

Two possible ways to solve this:

  • Try training the KMeans model on BQML directly.
  • Reimplement the TF KMeans estimator model to reshape the output tensor into a specific tensor shape.
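For the first option, the CREATE MODEL statement changes from importing a SavedModel to training in place. Below is a minimal sketch that mirrors the question's settings (8 clusters, k-means++ initialization). The helper names and the training table are hypothetical, and the client call is the same google.cloud.bigquery pattern used in the question.

```python
def bqml_kmeans_query(model_name, training_table, num_clusters=8):
    """Build a BigQuery ML statement that trains k-means inside BigQuery."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(MODEL_TYPE='KMEANS', NUM_CLUSTERS={int(num_clusters)},\n"
        f"        KMEANS_INIT_METHOD='KMEANS++')\n"
        f"AS SELECT * FROM `{training_table}`"
    )


def train_in_bigquery(model_name, training_table):
    """Submit the training statement and wait for it to finish."""
    # Imported lazily; needs the google-cloud-bigquery package and credentials.
    from google.cloud import bigquery

    job = bigquery.Client().query(bqml_kmeans_query(model_name, training_table))
    return job.result()  # blocks until the training job completes
```

A model trained this way is queried with ML.PREDICT, which sidesteps the SavedModel output-shape problem entirely.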

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Lak
Solution 2 Chris Meyers
Solution 3