AWS SageMaker autoscaling with instance metrics per instance
I am using an AWS SageMaker endpoint for inference. Depending on the amount of traffic, the endpoint should scale out and in by adding or removing instances. I am trying to use the instance metrics (CPUUtilization, MemoryUtilization or DiskUtilization) as the metric for SageMaker endpoint autoscaling. These are predefined metrics, as described here: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipeline-logs-metrics.html
The problem is that the instance metrics for a given endpoint are the sum over all running instances within that endpoint. For example, if the endpoint runtime settings show 5 currently running instances, the value of CPUUtilization can range from 0 to 500%. The maximum value changes with the number of running instances, so the autoscaling policy would have to change with it.
The question is: is there any way to get a metric per instance, i.e. CPUUtilizationPerInstance, without calculating it explicitly or publishing a custom metric? Scaling out and in by setting a threshold on per-instance CPUUtilization seems like the right approach. Is there any other similar option on AWS?
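To make the arithmetic concrete, here is a small sketch. It assumes, as described in the question, that the reported value is the sum over all instances. The function names are hypothetical, and the 70% target is only an illustrative number.

```python
def summed_threshold(per_instance_target: float, instance_count: int) -> float:
    """Threshold on the summed metric that matches a per-instance target."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    return per_instance_target * instance_count


def per_instance_value(summed_value: float, instance_count: int) -> float:
    """Average per-instance value recovered from the summed metric."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    return summed_value / instance_count


if __name__ == "__main__":
    # The same 70% per-instance target needs a different summed threshold
    # at every fleet size, which is why a fixed threshold does not work.
    for count in (1, 2, 5):
        print(count, summed_threshold(70.0, count))
```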
Solution 1:[1]
There is an InvocationsPerInstance metric that shows the average number of invocations per instance when you use the 'Sum' statistic.
https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html
This blog post details how you would go about load testing your endpoint to find a good target value for InvocationsPerInstance to use in autoscaling: https://aws.amazon.com/blogs/machine-learning/load-test-and-optimize-an-amazon-sagemaker-endpoint-using-automatic-scaling/
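A minimal sketch of a target tracking policy on this metric, assuming boto3 and the Application Auto Scaling predefined metric type SageMakerVariantInvocationsPerInstance. The policy name, the default cooldowns and the helper function names are placeholders. The target value should come from your own load test, and the variant must already be registered as a scalable target before the policy is attached.

```python
from typing import Any, Dict


def invocations_policy_config(target_per_instance: float,
                              scale_in_cooldown: int = 600,
                              scale_out_cooldown: int = 300) -> Dict[str, Any]:
    """Build a target tracking config keyed on the predefined per-instance metric."""
    return {
        "TargetValue": float(target_per_instance),
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
        "ScaleInCooldown": scale_in_cooldown,    # seconds
        "ScaleOutCooldown": scale_out_cooldown,  # seconds
    }


def apply_policy(endpoint_name: str, variant_name: str,
                 config: Dict[str, Any]) -> None:
    """Attach the policy to an endpoint variant (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("application-autoscaling")
    client.put_scaling_policy(
        PolicyName="invocations-per-instance-target",  # placeholder name
        ServiceNamespace="sagemaker",
        ResourceId=f"endpoint/{endpoint_name}/variant/{variant_name}",
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=config,
    )
```

For example, `apply_policy("my-endpoint", "AllTraffic", invocations_policy_config(70))` would ask Application Auto Scaling to hold each instance near 70 invocations per minute; both the endpoint name and the number are illustrative.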
Solution 2:[2]
This blog post describes how to define a custom metric that tracks average CPU utilisation per instance.
tl;dr
TargetTrackingScalingPolicyConfiguration={
    'TargetValue': 90.0,
    'CustomizedMetricSpecification': {
        'MetricName': 'CPUUtilization',
        'Namespace': '/aws/sagemaker/Endpoints',
        'Dimensions': [
            {'Name': 'EndpointName', 'Value': endpoint_name},
            {'Name': 'VariantName', 'Value': 'AllTraffic'}
        ],
        # Possible values: 'Average' | 'Minimum' | 'Maximum' | 'SampleCount' | 'Sum'
        'Statistic': 'Average',
        'Unit': 'Percent'
    },
    'ScaleInCooldown': 600,
    'ScaleOutCooldown': 300
}
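The fragment above is only the policy configuration. A hedged sketch of how it might be wired up with boto3's Application Auto Scaling client follows. The function names, the policy name and the min/max capacities are placeholders, and `policy_config` is expected to be the dictionary shown above.

```python
from typing import Any, Dict


def resource_id(endpoint_name: str, variant_name: str = "AllTraffic") -> str:
    """Resource ID format Application Auto Scaling uses for endpoint variants."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def enable_cpu_autoscaling(client: Any, endpoint_name: str,
                           policy_config: Dict[str, Any],
                           variant_name: str = "AllTraffic",
                           min_instances: int = 1,
                           max_instances: int = 5) -> None:
    """Register the variant as a scalable target, then attach the policy."""
    rid = resource_id(endpoint_name, variant_name)
    dimension = "sagemaker:variant:DesiredInstanceCount"
    # The variant has to be a registered scalable target before a policy
    # can be attached to it.
    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=rid,
        ScalableDimension=dimension,
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )
    client.put_scaling_policy(
        PolicyName="cpu-utilization-target",  # placeholder name
        ServiceNamespace="sagemaker",
        ResourceId=rid,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=policy_config,
    )
```

In real use the client would be `boto3.client("application-autoscaling")`. Passing it in as an argument keeps the function easy to exercise with a stub.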
Solution 3:[3]
Yes, there is a way to find out "Metric per instance" and act upon it.
This is done via auto scaling policies. You are not using auto scaling yet, so I suggest enabling it and starting with as few initial instances as possible, for example 1.
The AWS documentation on these policies is a good starting point for understanding metric-based scaling; see the AWS guide on configuring model autoscaling.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | fm1ch4 |
Solution 2 | trudolf |
Solution 3 | zhrist |