Kafka deployment on Strimzi
I'm trying to deploy Kafka with Strimzi. The problem is that it exposes the Kafka brokers as load balancers and assigns each of them an external IP. I want the Kafka brokers to be available internally only, and exposed externally through a single load balancer. My deployment file is below.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.1.0
    replicas: 2
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: external
        port: 9094
        type: loadbalancer
        tls: false
    config:
      offsets.topic.replication.factor: 2
      transaction.state.log.replication.factor: 2
      transaction.state.log.min.isr: 2
      default.replication.factor: 2
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.1"
    storage:
      type: ephemeral
  zookeeper:
    replicas: 2
    storage:
      type: ephemeral
(Screenshot of the cluster.)
As the screenshot shows, there are 3 load balancers with external IPs assigned, whereas I wanted one load balancer with an external IP and 2 Kafka brokers.
Solution 1:[1]
Yes, this behavior is correct; it follows from the Kafka discovery protocol. So first let's understand how that works:
- An authenticated Kafka client connects to any one of the brokers for its first connection (this connection goes through the Kubernetes service).
- The broker returns the metadata for one or more topics.
- Once it knows which broker is the leader of the desired partition, the client opens a new connection to that specific broker. Even if the leader is the same broker it connected to first, the client terminates the existing connection (from the first step) and starts a new one with that broker.
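The three steps above can be sketched as a small simulation. This is only an illustration, not a real Kafka client: the `Broker` class, the `orders` topic, the partition leadership table and every address are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Broker:
    broker_id: int
    advertised_address: str  # address the broker tells clients to use


# Hypothetical two-broker cluster; the addresses are placeholders.
BROKERS = {
    0: Broker(0, "broker-0.example.internal:9094"),
    1: Broker(1, "broker-1.example.internal:9094"),
}

# Hypothetical partition leadership: (topic, partition) -> broker id.
LEADERS = {("orders", 0): 0, ("orders", 1): 1}


def fetch_metadata(bootstrap_address: str, topic: str) -> dict:
    """Steps 1 and 2: whichever broker answers on the bootstrap address
    returns the leader of every partition of the topic."""
    return {
        partition: BROKERS[leader_id]
        for (name, partition), leader_id in LEADERS.items()
        if name == topic
    }


def connect_to_leader(bootstrap_address: str, topic: str, partition: int) -> str:
    """Step 3: the client opens a new connection to the leader's own address."""
    metadata = fetch_metadata(bootstrap_address, topic)
    return metadata[partition].advertised_address


print(connect_to_leader("bootstrap.example.internal:9094", "orders", 1))
```

Note that the address the client ends up on comes from the metadata, not from the bootstrap address it started with, which is why every broker needs an address the client can reach.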
Because the Kafka client connects directly to a broker to send and receive records, the load balancer only comes into the picture for the initial connection, where it forwards the client to any one of the available brokers. Now suppose we used the load balancer for subsequent connections as well. It would connect the client to an arbitrary broker, which might or might not host the leader of the partition the client wants to reach. Kafka avoids this problem with the discovery protocol described above.
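To make that concrete, here is a rough sketch that counts how many requests would land on the wrong broker if a single round-robin load balancer sat in front of every connection. The leadership table, the request sequence and the round-robin policy are all assumptions chosen for illustration; a real load balancer may pick brokers differently.

```python
import itertools

# Hypothetical leadership for three partitions across two brokers.
LEADER_OF = {0: 0, 1: 1, 2: 0}
BROKER_IDS = [0, 1]

# Partitions targeted by six consecutive requests (made-up workload).
REQUESTS = [0, 1, 2, 0, 1, 2]


def make_round_robin_router():
    """Return a router that ignores the partition and picks the next broker."""
    brokers = itertools.cycle(BROKER_IDS)

    def route(partition: int) -> int:
        return next(brokers)

    return route


def direct_to_leader(partition: int) -> int:
    """What a Kafka client does after discovery: go straight to the leader."""
    return LEADER_OF[partition]


def count_misrouted(route, partitions) -> int:
    """Count requests that reach a broker that does not lead the partition."""
    misrouted = 0
    for partition in partitions:
        if route(partition) != LEADER_OF[partition]:
            misrouted += 1
    return misrouted


print(count_misrouted(make_round_robin_router(), REQUESTS))  # 3
print(count_misrouted(direct_to_leader, REQUESTS))           # 0
```

With this particular workload, half of the requests would reach a broker that does not lead the target partition, while direct connections never miss.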
Solution 2:[2]
This is because of how Kafka is designed: clients need direct access to each broker in the cluster. So the load balancer, while it is a convenient way to expose the cluster, does not really load-balance anything; it just routes the connection. You can find more details about how and why it works this way in this blog post series: https://strimzi.io/blog/2019/04/17/accessing-kafka-part-1/
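As a quick sanity check on the numbers in the question: with a `loadbalancer` listener you should expect one load balancer for bootstrapping plus one per broker. The sketch below just does that arithmetic; the labels it prints are illustrative and are not the service names Strimzi actually generates.

```python
def expected_loadbalancer_services(broker_replicas: int) -> list[str]:
    """One bootstrap service plus one service per broker.

    The labels are illustrative, not the names Strimzi generates.
    """
    services = ["bootstrap"]
    services += [f"broker-{i}" for i in range(broker_replicas)]
    return services


services = expected_loadbalancer_services(2)
print(len(services), services)  # 3 ['bootstrap', 'broker-0', 'broker-1']
```

So `replicas: 2` giving 3 load balancers with external IPs is the expected outcome, not a misconfiguration.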
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | AP. |
Solution 2 | Jakub |