Deploying a Keycloak HA cluster to Kubernetes | Pods are not discovering each other
I'm trying to deploy an HA Keycloak cluster (2 nodes) on Kubernetes (GKE). So far, judging by the logs, the cluster nodes (pods) fail to discover each other in every configuration I tried: the pods start and the service comes up, but each node never sees the other.
Components
- PostgreSQL DB deployment with a clusterIP service on the default port.
- Keycloak Deployment of 2 nodes exposing container ports 8080 and 8443, a matching ClusterIP service, and a Service of type LoadBalancer to expose Keycloak to the internet
Logs Snippet:
INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000078: Starting JGroups channel ejb
INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [keycloak-567575d6f8-c5s42|0] (1) [keycloak-567575d6f8-c5s42]
INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000094: Received new cluster view for channel ejb: [keycloak-567575d6f8-c5s42|0] (1) [keycloak-567575d6f8-c5s42]
INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-3) ISPN000094: Received new cluster view for channel ejb: [keycloak-567575d6f8-c5s42|0] (1) [keycloak-567575d6f8-c5s42]
INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000079: Channel ejb local address is keycloak-567575d6f8-c5s42, physical addresses are [127.0.0.1:55200]
.
.
.
INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: Keycloak 15.0.2 (WildFly Core 15.0.1.Final) started in 67547ms - Started 692 of 978 services (686 services are lazy, passive or on-demand)
INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://127.0.0.1:9990/management
INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://127.0.0.1:9990
As the logs above show, the node only sees itself in the cluster view (its own container/pod ID).
Trying KUBE_PING protocol
I tried using the kubernetes.KUBE_PING protocol for discovery, but it didn't work: the call to the Kubernetes API failed with a 403 authorization error in the logs (part of it below):
Server returned HTTP response code: 403 for URL: https://[SERVER_IP]:443/api/v1/namespaces/default/pods
At this point I was able to log in to the portal and make changes, but it was not yet an HA cluster: changes were not replicated and the session was not preserved. In other words, if I deleted the pod I was using, I was redirected to the other one with a new session (as if it were a separate node).
Trying DNS_PING protocol
When I tried DNS_PING things were different: there were no Kubernetes API issues, but I was not able to log in.
In detail, I could reach the login page normally, but when I entered my credentials and tried to log in, the page would load and then send me back to the login page, with nothing related to it in the pod logs.
Below are some of the references I resorted to over the past couple of days:
- https://github.com/keycloak/keycloak-containers/blob/main/server/README.md#openshift-example-with-dnsdns_ping
- https://github.com/keycloak/keycloak-containers/blob/main/server/README.md#clustering
- https://www.youtube.com/watch?v=g8LVIr8KKSA
- https://www.keycloak.org/2019/05/keycloak-cluster-setup.html
- https://www.keycloak.org/docs/latest/server_installation/#creating-a-keycloak-custom-resource-on-kubernetes
My YAML manifest files
PostgreSQL Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:13
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              value: "postgres"
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432
Keycloak HA cluster Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keycloak
  labels:
    app: keycloak
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      containers:
        - name: keycloak
          image: jboss/keycloak
          env:
            - name: KEYCLOAK_USER
              value: admin
            - name: KEYCLOAK_PASSWORD
              value: admin123
            - name: DB_VENDOR
              value: POSTGRES
            - name: DB_ADDR
              value: "postgres"
            - name: DB_PORT
              value: "5432"
            - name: DB_USER
              value: "postgres"
            - name: DB_PASSWORD
              value: "postgres"
            - name: DB_SCHEMA
              value: "public"
            - name: DB_DATABASE
              value: "keycloak"
            # - name: JGROUPS_DISCOVERY_PROTOCOL
            #   value: kubernetes.KUBE_PING
            # - name: JGROUPS_DISCOVERY_PROPERTIES
            #   value: dump_requests=true,port_range=0,namespace=default
            #   value: port_range=0,dump_requests=true
            - name: JGROUPS_DISCOVERY_PROTOCOL
              value: dns.DNS_PING
            - name: JGROUPS_DISCOVERY_PROPERTIES
              value: "dns_query=keycloak"
            - name: CACHE_OWNERS_COUNT
              value: '2'
            - name: CACHE_OWNERS_AUTH_SESSIONS_COUNT
              value: '2'
            - name: PROXY_ADDRESS_FORWARDING
              value: "true"
          ports:
            - name: http
              containerPort: 8080
            - name: https
              containerPort: 8443
---
apiVersion: v1
kind: Service
metadata:
  name: keycloak
  labels:
    app: keycloak
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: https
      port: 443
      targetPort: 8443
  selector:
    app: keycloak
---
apiVersion: v1
kind: Service
metadata:
  name: keycloak-np
  labels:
    app: keycloak
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: https
      port: 443
      targetPort: 8443
  selector:
    app: keycloak
IMPORTANT NOTE
- I tried both protocols with and without the database setup.
- The above YAML shows all the discovery protocol combinations I tried, one at a time (including the commented-out ones).
Solution 1:[1]
The way KUBE_PING works is similar to running kubectl get pods inside one Keycloak pod to find the other Keycloak pods' IPs and then trying to connect to them one by one, except that Keycloak does it by querying the Kubernetes API directly instead of running kubectl.
To do that, it needs credentials to query the API, basically an access token.
You can pass your token directly, if you have it, but it's not very secure and not very convenient (other options and their behavior are described in the KUBE_PING documentation).
Kubernetes has a very convenient way to inject a token to be used by a pod (or software running inside that pod) to query the API. Check the Kubernetes service account documentation for a deeper look.
The mechanism is to create a service account, give it permissions to call the API using a RoleBinding and set that account in the pod configuration.
That works by mounting the token as a file at a known location, hardcoded and expected by all Kubernetes clients. When the client wants to call the API it looks for a token at that location.
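To make that concrete, the request below is roughly what KUBE_PING ends up doing from inside the pod; the token and CA paths are the standard in-cluster locations (a hand-written sketch, not taken from the Keycloak image):
# Run from a shell inside the Keycloak pod; these files are mounted
# automatically when the pod runs with a service account.
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
NAMESPACE=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)

# Equivalent of "kubectl get pods" against the in-cluster API server;
# a 403 here means the token's service account lacks list/get on pods.
curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
     -H "Authorization: Bearer $TOKEN" \
     "https://kubernetes.default.svc/api/v1/namespaces/$NAMESPACE/pods"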
Although that is not very convenient, you may be in the even more inconvenient situation of lacking permission to create RoleBindings (somewhat common in stricter environments).
You can then ask an admin to create the service account and RoleBinding for you, or just (very insecurely) pass your own user's token (if you can run kubectl get pods in Keycloak's namespace, you have the permissions) via the SA_TOKEN_FILE environment variable.
Create the file from a Secret or ConfigMap, mount it into the pod, and set SA_TOKEN_FILE to that file's location. Note that this method is specific to Keycloak.
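A minimal sketch of that workaround, assuming a hypothetical Secret named keycloak-api-token and a made-up mount path (adjust the names to your setup):
apiVersion: v1
kind: Secret
metadata:
  name: keycloak-api-token   # hypothetical name
type: Opaque
stringData:
  token: "<paste-your-token-here>"
---
# Relevant parts of the Keycloak Deployment pod template
spec:
  template:
    spec:
      containers:
        - name: keycloak
          image: jboss/keycloak
          env:
            - name: SA_TOKEN_FILE          # where KUBE_PING should read the token from
              value: /etc/keycloak-token/token
          volumeMounts:
            - name: api-token
              mountPath: /etc/keycloak-token
              readOnly: true
      volumes:
        - name: api-token
          secret:
            secretName: keycloak-api-token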
If you do have permissions to create service accounts and RoleBindings in the cluster:
An example (not tested):
export TARGET_NAMESPACE=default
# convenient method to create a service account
kubectl create serviceaccount keycloak-kubeping-service-account -n $TARGET_NAMESPACE
# no equally convenient method for the Role and RoleBinding;
# they need to be defined explicitly
cat <<EOF | kubectl apply -n $TARGET_NAMESPACE -f -
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: keycloak-kubeping-pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: keycloak-kubeping-api-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: keycloak-kubeping-pod-reader
subjects:
  - kind: ServiceAccount
    name: keycloak-kubeping-service-account
    namespace: $TARGET_NAMESPACE
EOF
On the deployment, you set the serviceAccount:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keycloak
spec:
  template:
    spec:
      serviceAccount: keycloak-kubeping-service-account
      serviceAccountName: keycloak-kubeping-service-account
      containers:
        - name: keycloak
          image: jboss/keycloak
          env:
            # ...
            - name: JGROUPS_DISCOVERY_PROTOCOL
              value: kubernetes.KUBE_PING
            - name: JGROUPS_DISCOVERY_PROPERTIES
              value: dump_requests=true
            - name: KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            # ...
dump_requests=true will help you debug the Kubernetes API requests; better to set it to false in production. You can set namespace=<your-namespace> in JGROUPS_DISCOVERY_PROPERTIES instead of using KUBERNETES_NAMESPACE, but the fieldRef above is a handy way for the pod to autodetect the namespace it is running in.
Please note that KUBE_PING will find all pods in the namespace, not only Keycloak pods, and will try to connect to all of them. Of course, if your other pods don't care about that, it's OK.
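If that matters, KUBE_PING can reportedly be narrowed down with a label selector; a sketch, assuming the jgroups-kubernetes KUBERNETES_LABELS variable is honored by this image (verify for your version):
# Additional entry in the Keycloak container env: only pods whose labels
# match this selector are considered for discovery.
- name: KUBERNETES_LABELS
  value: app=keycloak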
Solution 2:[2]
After a long time with this, the best option is JDBC_PING, which also works well in a Kubernetes environment. This procedure works for Keycloak with a separate Infinispan cluster too.
A basic approach can be found here https://github.com/thomasdarimont/keycloak-project-example/blob/main/deployments/local/cluster/haproxy-database-ispn/cli/0300-onstart-setup-ispn-jdbc-store.cli
What I suggest is generating a CLI script that runs on startup, or using the following env var. You'll need a database to persist data, and the nodes will register themselves there. It works in all environments.
- JGROUPS_DISCOVERY_PROTOCOL=JDBC_PING
Feel free to use the repo I put together, which contains the whole feature for a clustered environment backed by MySQL: https://github.com/albertoSoto/keycloak-infinispan-cluster
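Applied to the Deployment from the question, a minimal sketch of the JDBC_PING variant for the legacy WildFly-based jboss/keycloak image could look like this (the datasource_jndi_name value assumes the image's default KeycloakDS datasource, so double-check it for your version):
# Keycloak container env, replacing the DNS_PING / KUBE_PING entries;
# nodes register themselves in a table (JGROUPSPING by default) in the Keycloak database.
- name: JGROUPS_DISCOVERY_PROTOCOL
  value: JDBC_PING
- name: JGROUPS_DISCOVERY_PROPERTIES
  value: datasource_jndi_name=java:jboss/datasources/KeycloakDS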
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 |
Solution 2 | Alberto Soto