Return confidence score with custom model for Vertex AI batch predictions

I uploaded a pretrained scikit-learn classification model to Vertex AI and ran a batch prediction on 5 samples. It returned only a list of predicted labels (all False), with no confidence scores. I can't find anything in the SDK documentation or the Google Cloud console that explains how to make batch predictions include confidence scores. Is that something Vertex AI can do?

My intent is to automate a batch prediction pipeline using the following code.

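def run_batch_prediction(model, job_display_name, input_path, output_path, instances_format):
    """Run a Vertex AI batch prediction job and return its resource name."""
    # instances_format must be one of: "jsonl", "csv", "bigquery",
    # "tf-record", "tf-record-gzip", or "file-list"
    batch_prediction_job = model.batch_predict(
        job_display_name=job_display_name,
        gcs_source=input_path,
        instances_format=instances_format,
        gcs_destination_prefix=output_path,
        starting_replica_count=1,
        max_replica_count=10,
        sync=True,
    )

    # With sync=True the call above already blocks until the job finishes;
    # wait() is kept as a safeguard.
    batch_prediction_job.wait()

    return batch_prediction_job.resource_name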

I also ran it from the Google Cloud console as a test, to make sure my input data was properly formatted.
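For reference, here is a minimal sketch of writing a JSON Lines input file for a scikit-learn model. It assumes each line holds one instance as a JSON array of feature values; the helper name write_jsonl_instances, the file name, and the sample rows are all hypothetical and not taken from the question.

```python
import json
import os
import tempfile


def write_jsonl_instances(rows, path):
    """Write one JSON array of feature values per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(list(row)) + "\n")
    return path


# Hypothetical feature rows; real rows must match the features
# the model was trained on, in the same order.
sample_rows = [
    [5.1, 3.5, 1.4, 0.2],
    [6.2, 2.9, 4.3, 1.3],
]

local_path = os.path.join(tempfile.gettempdir(), "batch_input.jsonl")
write_jsonl_instances(sample_rows, local_path)

# Read the file back to confirm there is exactly one instance per line.
with open(local_path, encoding="utf-8") as f:
    written_lines = f.read().splitlines()
```

The resulting file would then be uploaded to Cloud Storage and its gs:// path passed as gcs_source, with instances_format set to "jsonl".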



Solution 1:[1]

I don't think so. As far as I know, the prebuilt scikit-learn container provided by Vertex AI calls the model's predict method, which returns only the predicted labels and no confidence score. You will probably need to write a custom container.

Solution 2:[2]

You can now do this with custom prediction routines (CPR). Here are a couple of good end-to-end examples.

Here's an example of the interface for the predictor.py:

%%writefile src/predictor.py
import joblib

from google.cloud import storage
from google.cloud.aiplatform.prediction.sklearn.predictor import SklearnPredictor


class CprPredictor(SklearnPredictor):
    """Custom predictor that returns class probabilities instead of labels."""

    def __init__(self):
        return

    def load(self, gcs_artifacts_uri: str):
        """Downloads model.joblib from the artifacts URI and loads the model."""
        gcs_client = storage.Client()
        with open("model.joblib", "wb") as gcs_model:
            gcs_client.download_blob_to_file(
                gcs_artifacts_uri + "/model.joblib", gcs_model
            )

        self._model = joblib.load("model.joblib")

    def predict(self, instances):
        # predict_proba returns one row of per-class probabilities per instance,
        # ordered like self._model.classes_.
        outputs = self._model.predict_proba(instances)
        return outputs

Note that, at the time of writing, you have to use an experimental branch of the SDK for this; it will likely be merged into the official release.
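Once the predictor returns predict_proba output, the batch prediction results can be post-processed into a label and a confidence score. The following is a rough sketch, assuming each line of the output file is a JSON object with "instance" and "prediction" keys, and that "prediction" is a list of probabilities ordered like the model's classes_ attribute. The function name attach_confidence, the sample lines, and the [False, True] labels are hypothetical.

```python
import json


def attach_confidence(result_lines, class_labels):
    """Turn batch prediction result lines into label + confidence records."""
    records = []
    for line in result_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        result = json.loads(line)
        probabilities = result["prediction"]
        # Index of the most probable class for this instance.
        best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
        records.append(
            {
                "instance": result["instance"],
                "label": class_labels[best_index],
                "confidence": probabilities[best_index],
            }
        )
    return records


# Hypothetical result lines; real ones come from the files written
# under gcs_destination_prefix.
example_lines = [
    '{"instance": [5.1, 3.5, 1.4, 0.2], "prediction": [0.9, 0.1]}',
    '{"instance": [6.2, 2.9, 4.3, 1.3], "prediction": [0.25, 0.75]}',
]

# class_labels must follow the order of model.classes_ (assumed here).
records = attach_confidence(example_lines, class_labels=[False, True])
```

Taking the largest probability as the confidence is one simple choice; keeping the full probability list may be preferable if downstream steps need a decision threshold.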

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution sources:
Solution 1: Shawn
Solution 2: JW_