Invocation timed out when using SageMaker to invoke endpoints with a pretrained custom PyTorch model [Inference]
I have a pretrained PyTorch model (contextualized_topic_models) and have deployed it as a SageMaker script-mode model. However, when I invoke the endpoint for inference, it always returns an "Invocation timed out" error no matter what I try. I have tried different input types and changed the input_fn() function, but it still doesn't work.
I've run my inference.py script on Colab (without connecting to AWS), and each function works fine, returning the expected predictions.
I've been trying to debug this for four days now; I've even dreamed about the issue. I'll be deeply grateful for any help.
Here's my deployment script.
from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(
    model_data=pretrained_model_data,
    entry_point="inference.py",
    role=role,
    framework_version="1.8.1",
    py_version="py36",
    sagemaker_session=sess,
)
endpoint_name = "topic-modeling-inference"

# Deploy
predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    endpoint_name=endpoint_name,
)
Endpoint test (prediction) script
# Test the model
import json
import boto3

sm = boto3.client('sagemaker-runtime')
endpoint_name = "topic-modeling-inference"

prompt = ["Here is a piece of cake."]
promptbody = [x.encode('utf-8') for x in prompt]
promptbody = promptbody[0]
# body = bytes(prompt[0], 'utf-8')
# tryout = prompt[0]

response = sm.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Body=promptbody
    # Body=tryout.encode(encoding='UTF-8')
)
print(response)
# result = json.loads(response['Body'].read().decode('utf-8'))
# print(result)
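One detail worth noting in the test script above: the request declares ContentType="text/csv", but the input_fn shown further down only parses bodies typed "application/json". A minimal sketch of an invocation whose content type agrees with that JSON branch (the helper names here are my own, not from the question):

```python
import json

# Hypothetical helper: serialize a list of prompt strings as a JSON body so
# the declared ContentType matches input_fn's "application/json" branch.
def build_json_request(prompts):
    return json.dumps(prompts).encode("utf-8")

def invoke_topic_endpoint(sm_client, endpoint_name, prompts):
    # sm_client is a boto3 "sagemaker-runtime" client created by the caller.
    response = sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # agrees with input_fn
        Body=build_json_request(prompts),
    )
    return json.loads(response["Body"].read().decode("utf-8"))
```

This keeps the serialization decision in one place, so the client and input_fn cannot silently disagree about the wire format.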
Part of my inference.py script
import json

import numpy as np

def predict_fn(input_data, model):
    # tp10 is the fitted preprocessing object, loaded elsewhere in inference.py
    input_data_features = tp10.transform(text_for_contextual=input_data)
    topic_prediction = model.get_doc_topic_distribution(input_data_features, n_samples=20)
    topicID = int(np.argmax(topic_prediction))
    return topicID
    # prediction = model.get_topic_lists(20)[np.argmax(topic_prediction)]
    # return prediction

def input_fn(request_body, request_content_type):
    if request_content_type == "application/json":
        request = json.loads(request_body)
    else:
        request = request_body
    return request

def output_fn(prediction, response_content_type):
    # Both branches serialized identically, so they are collapsed into one
    response = json.dumps(prediction)
    return response
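Since the functions already work in Colab, one quick local check (a sketch, not the author's code) is to round-trip input_fn with both content types and see what predict_fn would actually receive; the handlers are inlined here so the check runs without SageMaker:

```python
import json

# Inlined copies of the serialization handlers from inference.py,
# so this check runs with no SageMaker dependency.
def input_fn(request_body, request_content_type):
    if request_content_type == "application/json":
        return json.loads(request_body)
    return request_body

def output_fn(prediction, response_content_type):
    return json.dumps(prediction)

# The test script sends ContentType="text/csv", so the body bypasses
# json.loads and reaches predict_fn as raw bytes rather than a list of texts.
as_json = input_fn(b'["Here is a piece of cake."]', "application/json")
as_csv = input_fn(b"Here is a piece of cake.", "text/csv")
print(type(as_json).__name__, type(as_csv).__name__)  # prints "list bytes"
```

If predict_fn (and tp10.transform) expects a list of strings, the bytes object arriving on the text/csv path is one place the handler chain can diverge from the Colab run.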
Any help or guidance would be wonderful. Thank you in advance.
Solution 1:[1]
I would suggest looking at the CloudWatch logs for the endpoint to see whether any invocations are reaching it.
If they are, check in the same log file whether the endpoint sends a response back without any errors.
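That CloudWatch check can be scripted with boto3. A sketch under the assumption that the endpoint logs under SageMaker's usual /aws/sagemaker/Endpoints/<name> log group; the helper names are my own:

```python
def endpoint_log_group(endpoint_name):
    # SageMaker endpoints write their container logs under this group.
    return "/aws/sagemaker/Endpoints/" + endpoint_name

def tail_endpoint_logs(endpoint_name, limit=50):
    import boto3  # imported here so endpoint_log_group stays dependency-free

    logs = boto3.client("logs")
    group = endpoint_log_group(endpoint_name)
    # Most recently active stream first.
    streams = logs.describe_log_streams(
        logGroupName=group, orderBy="LastEventTime", descending=True
    )["logStreams"]
    if not streams:
        print("No log streams yet - the container may not have started.")
        return
    events = logs.get_log_events(
        logGroupName=group,
        logStreamName=streams[0]["logStreamName"],
        limit=limit,
    )["events"]
    for event in events:
        print(event["message"])
```

Calling tail_endpoint_logs("topic-modeling-inference") from a notebook with CloudWatch read permissions should show whether the worker ever received the request or crashed while loading the model.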
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | CrzyFella |