Deploying a Keycloak HA cluster to Kubernetes | Pods are not discovering each other

I'm trying to deploy an HA Keycloak cluster (2 nodes) on Kubernetes (GKE). So far, judging from the logs, the cluster nodes (pods) fail to discover each other in every case: the pods start and the service comes up, but each node never sees the others.

Components

  • PostgreSQL DB Deployment with a ClusterIP service on the default port.
  • Keycloak Deployment of 2 nodes exposing container ports 8080 and 8443, a ClusterIP service, and a service of type LoadBalancer to expose Keycloak to the internet.

Logs Snippet:

INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000078: Starting JGroups channel ejb
INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [keycloak-567575d6f8-c5s42|0] (1) [keycloak-567575d6f8-c5s42]
INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000094: Received new cluster view for channel ejb: [keycloak-567575d6f8-c5s42|0] (1) [keycloak-567575d6f8-c5s42]
INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-3) ISPN000094: Received new cluster view for channel ejb: [keycloak-567575d6f8-c5s42|0] (1) [keycloak-567575d6f8-c5s42]
INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000079: Channel ejb local address is keycloak-567575d6f8-c5s42, physical addresses are [127.0.0.1:55200]
.
.
.
INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: Keycloak 15.0.2 (WildFly Core 15.0.1.Final) started in 67547ms - Started 692 of 978 services (686 services are lazy, passive or on-demand)
INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://127.0.0.1:9990/management
INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://127.0.0.1:9990

As we can see in the logs above, each node sees only itself in the cluster view (a single container/pod ID).

Trying KUBE_PING protocol

I tried using the kubernetes.KUBE_PING protocol for discovery, but it didn't work: the call to the Kubernetes API failed with a 403 Authorization error in the logs (part of it below):

Server returned HTTP response code: 403 for URL: https://[SERVER_IP]:443/api/v1/namespaces/default/pods

At this point I was able to log in to the portal and make changes, but it was not yet an HA cluster, since changes were not replicated and sessions were not preserved. In other words, if I deleted the pod I was using, I was redirected to the other pod with a new session (as if it were a separate node).

Trying DNS_PING protocol

When I tried DNS_PING things were different: I had no Kubernetes API issues, but I was not able to log in.

In detail, I could reach the login page normally, but after entering my credentials and submitting, the page would try to load and then send me back to the login page, with nothing related in the pod logs.


My YAML manifest files

PostgreSQL Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:13
          imagePullPolicy: IfNotPresent
          ports:
          - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              value: "postgres"
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432

Keycloak HA cluster Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: keycloak
  labels:
    app: keycloak
spec:
  replicas: 2 
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      containers:
      - name: keycloak
        image: jboss/keycloak
        env:
            - name: KEYCLOAK_USER 
              value: admin
            - name: KEYCLOAK_PASSWORD 
              value: admin123
            - name: DB_VENDOR
              value: POSTGRES
            - name: DB_ADDR
              value: "postgres" 
            - name: DB_PORT
              value: "5432"
            - name: DB_USER
              value: "postgres"
            - name: DB_PASSWORD
              value: "postgres"
            - name: DB_SCHEMA
              value: "public"
            - name: DB_DATABASE
              value: "keycloak"
#            - name: JGROUPS_DISCOVERY_PROTOCOL
#              value: kubernetes.KUBE_PING
#            - name: JGROUPS_DISCOVERY_PROPERTIES
#              value: dump_requests=true,port_range=0,namespace=default
#              value: port_range=0,dump_requests=true
            - name: JGROUPS_DISCOVERY_PROTOCOL
              value: dns.DNS_PING
            - name: JGROUPS_DISCOVERY_PROPERTIES
              value: "dns_query=keycloak"
            - name: CACHE_OWNERS_COUNT
              value: '2'
            - name: CACHE_OWNERS_AUTH_SESSIONS_COUNT
              value: '2'
            - name: PROXY_ADDRESS_FORWARDING
              value: "true"
        ports:
            - name: http
              containerPort: 8080
            - name: https
              containerPort: 8443

---
apiVersion: v1
kind: Service
metadata:
  name: keycloak
  labels:
    app: keycloak
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: https
      port: 443
      targetPort: 8443
  selector:
    app: keycloak
---
apiVersion: v1
kind: Service
metadata:
  name: keycloak-np
  labels:
    app: keycloak
spec:
  type: LoadBalancer 
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: https
      port: 443
      targetPort: 8443
  selector:
    app: keycloak

IMPORTANT NOTE

  • I tried both protocols with and without the database setup.
  • The above YAML shows all the discovery-protocol combinations I tried, one at a time (the commented-out lines).


Solution 1:[1]

The way KUBE_PING works is similar to running kubectl get pods inside one Keycloak pod to find the other Keycloak pods' IPs and then trying to connect to them one by one, except that Keycloak does this by querying the Kubernetes API directly instead of running kubectl.
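
As a rough illustration of that analogy (run from your workstation, against the namespace Keycloak is deployed in):

# List the pods KUBE_PING would try to discover (adjust -n to your namespace);
# -o wide includes each pod's IP
kubectl get pods -n default -o wide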

To do that, it needs credentials to query the API, basically an access token.

You can pass your token directly if you have it, but it's not very secure and not very convenient (the KUBE_PING documentation covers the other options and behavior).

Kubernetes has a very convenient way to inject a token into a pod (or into software running inside that pod) for querying the API. Check the documentation for a deeper look.

The mechanism is to create a ServiceAccount, give it permission to call the API via a RoleBinding, and set that account in the pod configuration.

That works by mounting the token as a file at a known location, hardcoded and expected by all Kubernetes clients. When a client wants to call the API, it looks for a token at that location.
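
As a concrete sketch of that mechanism: the paths and endpoint below are the standard in-cluster defaults (nothing Keycloak-specific), and the final URL mirrors the one in the 403 error above:

# Run inside any pod that has a service account token mounted
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
CACERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
NS=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)

# Query the pods endpoint -- the same kind of call KUBE_PING makes
curl --cacert "$CACERT" \
     -H "Authorization: Bearer $TOKEN" \
     "https://kubernetes.default.svc/api/v1/namespaces/$NS/pods"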


Although that is not very convenient, you may be in the even more inconvenient situation of lacking the permissions to create RoleBindings (somewhat common in stricter environments).

You can then ask an admin to create the ServiceAccount and RoleBinding for you, or just (very insecurely) pass your own user's token via the SA_TOKEN_FILE environment variable (if you are able to run kubectl get pods in Keycloak's namespace, you have the permissions).

Create the file from a Secret or ConfigMap, mount it into the pod, and set SA_TOKEN_FILE to that file's location. Note that this method is specific to Keycloak.
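
A rough sketch of that wiring; the Secret name (keycloak-api-token) and mount path (/etc/keycloak-token) are hypothetical, and only the SA_TOKEN_FILE variable itself comes from the description above:

apiVersion: v1
kind: Secret
metadata:
  name: keycloak-api-token            # hypothetical name
stringData:
  token: "<paste-a-valid-token-here>"
---
# Relevant fragment of the Keycloak Deployment pod spec
spec:
  template:
    spec:
      volumes:
        - name: api-token
          secret:
            secretName: keycloak-api-token
      containers:
        - name: keycloak
          image: jboss/keycloak
          volumeMounts:
            - name: api-token
              mountPath: /etc/keycloak-token
              readOnly: true
          env:
            - name: SA_TOKEN_FILE
              value: /etc/keycloak-token/token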


If you do have permissions to create service accounts and RoleBindings in the cluster:

An example (not tested):

export TARGET_NAMESPACE=default

# convenient method to create a service account 
kubectl create serviceaccount keycloak-kubeping-service-account -n $TARGET_NAMESPACE

# No equally convenient one-liner for the Role and RoleBinding,
# so they are defined explicitly:
cat <<EOF | kubectl apply -f -
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: keycloak-kubeping-pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]

---

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: keycloak-kubeping-api-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: keycloak-kubeping-pod-reader
subjects:
- kind: ServiceAccount
  name: keycloak-kubeping-service-account
  namespace: $TARGET_NAMESPACE

EOF
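
An optional sanity check (it assumes the account and namespace names used above) to confirm the new service account is allowed to list pods:

# Should print "yes" once the Role and RoleBinding are in place
kubectl auth can-i list pods \
  --as=system:serviceaccount:$TARGET_NAMESPACE:keycloak-kubeping-service-account \
  -n $TARGET_NAMESPACE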

On the deployment, you set the serviceAccount:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: keycloak
spec:
  template:
    spec:
      serviceAccount: keycloak-kubeping-service-account
      serviceAccountName: keycloak-kubeping-service-account
      containers:
      - name: keycloak
        image: jboss/keycloak
        env:
#          ...
            - name: JGROUPS_DISCOVERY_PROTOCOL
              value: kubernetes.KUBE_PING
            - name: JGROUPS_DISCOVERY_PROPERTIES
              value: dump_requests=true
            - name: KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
#          ...

dump_requests=true will help you debug the requests made to the Kubernetes API; better to set it to false in production. You can put namespace=<your-namespace> in the discovery properties instead of using KUBERNETES_NAMESPACE, but the fieldRef above is a handy way for the pod to autodetect the namespace it is running in.

Please note that KUBE_PING will find all pods in the namespace, not only keycloak pods, and will try to connect to all of them. Of course, if your other pods don't care about that, it's OK.

Solution 2:[2]

After a long time spent on this, the best option is JDBC_PING, which also fits a Kubernetes environment. This procedure works with Keycloak and with a separate Infinispan cluster too.

A basic approach can be found here: https://github.com/thomasdarimont/keycloak-project-example/blob/main/deployments/local/cluster/haproxy-database-ispn/cli/0300-onstart-setup-ispn-jdbc-store.cli

What I suggest is generating a CLI script that runs on startup, or using the following env var (a rough sketch follows below). You'll need a database to persist the data; the nodes register themselves there. It works in all environments.

  • JGROUPS_DISCOVERY_PROTOCOL=JDBC_PING
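
A rough sketch of what that could look like in the Keycloak Deployment's env section of the legacy jboss/keycloak image; the datasource JNDI name is my assumption (the image's default Keycloak datasource), so verify it against your own setup:

        env:
          - name: JGROUPS_DISCOVERY_PROTOCOL
            value: JDBC_PING
          - name: JGROUPS_DISCOVERY_PROPERTIES
            # Reuse Keycloak's own database connection; JDBC_PING keeps the
            # member list in a table (JGROUPSPING by default) in that database.
            value: "datasource_jndi_name=java:jboss/datasources/KeycloakDS"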

Feel free to use the repo I created, which contains the whole feature for a clustered environment with MySQL: https://github.com/albertoSoto/keycloak-infinispan-cluster

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: (author not listed)
Solution 2: Alberto Soto