Locality Load Balancing not working on Istio
We have a Kubernetes cluster with ~100 nodes running Istio and want to enable the Locality Load Balancing feature. This would save us up to 70k USD/year because our inter-zone data traffic is very high.
I've followed the docs and set up the Istio ConfigMap like this:
...
meshNetworks: {}
localityLbSetting:
  enabled: true
  distribute:
  - from: us-east-1/us-east-1a/*
    to:
      "us-east-1/us-east-1a/*": 100
  - from: us-east-1/us-east-1b/*
    to:
      "us-east-1/us-east-1b/*": 100
...
I then deployed 2 apps: one just responds with the zone of the node it runs on (we are using a VirtualService), and the other one just makes the requests.
Requests coming from a node in us-east-1a should only be answered by nodes in the same zone, right? But that's not what happens.
We also tried setting this environment variable on the pilot pods:
PILOT_ENABLE_LOCALITY_LOAD_BALANCING
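(For reference, a sketch of where such a variable would go in the istio-pilot Deployment in istio-system; the container name "discovery" and the value are assumptions based on a default install:)

      containers:
      - name: discovery
        ...
        env:
        - name: PILOT_ENABLE_LOCALITY_LOAD_BALANCING
          value: "1"   # boolean-style toggle; "true" may also be accepted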
When I check the logs of a pod deployed in zone "us-east-1a", it shows replies from both zones.
Istio Version: 1.2.8
Kubernetes Version: 1.14
Any help is appreciated! Thank you!
Solution 1:[1]
I'm afraid your configuration is invalid with respect to the locality weights between regions/zones, in the context of the Locality Load Balancing feature in 'distribute' mode.
The logs of your istio-pilot should give you a clue about it, in the form of a warning similar to this one:
<timestamp> warn failed to read mesh configuration, using default: 1 error occurred:
* locality weight must not be in range [1, 100]
I don't think this is documented anywhere in the Istio documentation, but the logic behind the weight validation can be found in the Istio source code.
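For illustration, here is a sketch of a distribute block that satisfies that validation as I understand it: each weight must be in [1, 100], and the weights under every from entry must add up to 100 (the 80/20 split below is purely illustrative):

localityLbSetting:
  enabled: true
  distribute:
  - from: us-east-1/us-east-1a/*
    to:
      "us-east-1/us-east-1a/*": 80
      "us-east-1/us-east-1b/*": 20
  - from: us-east-1/us-east-1b/*
    to:
      "us-east-1/us-east-1b/*": 80
      "us-east-1/us-east-1a/*": 20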
Solution 2:[2]
From @panicked's comment:
The pods where the requests are generated (src pods) have to belong to a K8s service themselves too, even if the service is not directly involved in the request.
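A minimal sketch of such a Service for the requesting (source) workload; the name, labels and ports below are hypothetical and must match your client pods:

apiVersion: v1
kind: Service
metadata:
  name: zone-client              # hypothetical name for the source workload
spec:
  selector:
    app: zone-client             # must match the labels on the pods making the requests
  ports:
  - name: http                   # Istio relies on named ports to detect the protocol
    port: 80
    targetPort: 8080             # hypothetical container port of the client app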
As a side note, the Istio docs recommend:
If the goal of the operator is not to distribute load across zones and regions but rather to restrict the regionality of failover to meet other operational requirements an operator can set a ‘failover’ policy instead of a ‘distribute’ policy.
but the distribute policy seems to work just fine. For failover (and failoverPriority) you must also have outlierDetection defined.
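As a sketch, the outlierDetection would typically be defined in a DestinationRule for the destination service; all names below are hypothetical, and note that older Istio versions (such as 1.2) used the consecutiveErrors field instead of consecutive5xxErrors:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: zone-server                               # hypothetical
spec:
  host: zone-server.default.svc.cluster.local     # hypothetical destination service
  trafficPolicy:
    outlierDetection:                             # required for failover-based locality LB
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s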
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | Nepomucen
Solution 2 |