Curator disconnects from ZooKeeper when IPs change, even when using DNS names in the connection string

We are using Curator service discovery in Docker and Kubernetes environments. We set up the connection string using the DNS names of the containers/pods. The problem I am seeing is that the client appears to resolve these names down to IP addresses. A container or pod can change its IP address, and Curator does not seem to pick up the change.
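
For reference, the agents feed the ZK_CONNECTION value straight into Curator, roughly like this (a minimal sketch; the class name and retry policy are illustrative, not our exact code):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class AgentBootstrap {
    public static void main(String[] args) {
        // ZK_CONNECTION contains DNS names only, e.g.
        // "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181"
        String connectString = System.getenv("ZK_CONNECTION");
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                connectString,
                new ExponentialBackoffRetry(1000, 3));
        client.start();
        // Curator's ServiceDiscovery is layered on top of this client.
    }
}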

The behavior I see is this: I stand up a 3-node ZooKeeper cluster and 1 or more agents, then roll the ZooKeeper nodes one at a time so that each comes back with a new IP address. When I bounce the third ZooKeeper instance, all the clients lose their connection.

Is there a way to force Curator to always use the DNS names for the connection?
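
My working assumption (not something I have confirmed in the client source) is that each name in the connect string is resolved to concrete addresses at connect time, roughly equivalent to:

import java.net.InetAddress;

public class ResolveDemo {
    public static void main(String[] args) throws Exception {
        // Resolve each host in the connect string the way the client
        // appears to: down to the concrete IPs of the current containers.
        for (String host : new String[] {"zookeeper1", "zookeeper2", "zookeeper3"}) {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                System.out.println(host + " -> " + addr.getHostAddress());
            }
        }
    }
}

Once all three resolved IPs have gone stale after the rolling restart, the client seems to have nothing valid left to reconnect to.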

Here is my compose example:

version: '2.4'

x-zookeeper:
  &zookeeper-env
  JVMFLAGS: -Dzookeeper.4lw.commands.whitelist=ruok
  ZOO_ADMINSERVER_ENABLED: 'true'
  ZOO_STANDALONE_ENABLED: 'false'
  ZOO_SERVERS: server.1=zookeeper1:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=zookeeper3:2888:3888;2181

x-agent:
  &agent-env
  ZK_CONNECTION: zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
  SERVICE_NAME: myservice

services:
  zookeeper1:
    image: artifactory.rd2.thingworx.io/zookeeper:${ZOOKEEPER_IMAGE_VERSION}
    restart: always
    ports:
      - 2181
      - 8080
    healthcheck:
      test: echo ruok | nc localhost 2181 | grep imok
      interval: 15s
    environment:
      <<: *zookeeper-env
      ZOO_MY_ID: 1

  zookeeper2:
    image: artifactory.rd2.thingworx.io/zookeeper:${ZOOKEEPER_IMAGE_VERSION}
    restart: always
    ports:
      - 2181
      - 8080
    healthcheck:
      test: echo ruok | nc localhost 2181 | grep imok
      interval: 15s
    environment:
      <<: *zookeeper-env
      ZOO_MY_ID: 2

  zookeeper3:
    image: artifactory.rd2.thingworx.io/zookeeper:${ZOOKEEPER_IMAGE_VERSION}
    restart: always
    ports:
      - 2181
      - 8080
    healthcheck:
      test: echo ruok | nc localhost 2181 | grep imok
      interval: 15s
    environment:
      <<: *zookeeper-env
      ZOO_MY_ID: 3

  agent1:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT1

  agent2:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT2

  agent3:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT3

  agent4:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT4

  agent5:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT5

The run steps are:

docker-compose up -d zookeeper1 zookeeper2 zookeeper3 agent1
docker-compose rm -sf zookeeper3
docker-compose up -d agent2
docker-compose up -d zookeeper3
docker-compose rm -sf zookeeper2
docker-compose up -d agent3
docker-compose up -d zookeeper2
docker-compose rm -sf zookeeper1
docker-compose up -d agent5
docker-compose up -d zookeeper1

After I kill the last ZooKeeper node, the agent gets the following error and does not recover. You can see it is referencing an IP address:

Path:null finished:false header:: 5923,4  replyHeader:: 5923,8589934594,0  request:: '/services/myservice/cc1996fb-cca5-4108-bd06-567b45f594d7,F  response:: #7b226e616d65223a226d7973657276696365222c226964223a2263633139393666622d636361352d343130382d626430362d353637623435663539346437222c2261646472657373223a223137322e32312e302e33222c22706f7274223a383038302c2273736c506f7274223a6e756c6c2c227061796c6f6164223a7b2240636c617373223a22636f6d2e7468696e67776f72782e646973636f766572792e7a6b2e53657276696365496e7374616e636544657461696c73222c2261747472696275746573223a7b22474c4f42414c4944223a224147454e5433227d7d2c22726567697374726174696f6e54696d65555443223a313634393739313735353936322c227365727669636554797065223a2244594e414d4943222c2275726953706563223a7b227061727473223a5b7b2276616c7565223a2261646472657373222c227661726961626c65223a747275657d2c7b2276616c7565223a223a222c227661726961626c65223a66616c73657d2c7b2276616c7565223a22706f7274222c227661726961626c65223a747275657d5d7d7d,s{4294967301,4294967301,1649791757073,1649791757073,0,0,0,144117976615550976,404,0,4294967301} 
agent1_1      | 19:48:46.438 [ServiceEventWatcher-myservice] DEBUG com.thingworx.discovery.zk.ZookeeperProvider - ZooKeeper resolved addresses for service myservice: [ServiceDefinition [serviceName=myservice, host=172.21.0.7, port=8080, tags={GLOBALID=AGENT2}], ServiceDefinition [serviceName=myservice, host=172.21.0.4, port=8080, tags={GLOBALID=AGENT1}], ServiceDefinition [serviceName=myservice, host=172.21.0.3, port=8080, tags={GLOBALID=AGENT3}]]
agent1_1      | 19:48:47.070 [main-SendThread(172.21.0.5:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x200028941eb0001 for sever service-discovery-docker-tests_zookeeper2_1.service-discovery-docker-tests_default/172.21.0.5:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
agent1_1      | org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read additional data from server sessionid 0x200028941eb0001, likely server has closed socket
agent1_1      |     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
agent1_1      |     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
agent1_1      |     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1275)
agent1_1      | 19:48:47.171 [main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
agent1_1      | 19:48:47.363 [main-SendThread(172.21.0.9:2181)] DEBUG org.apache.zookeeper.SaslServerPrincipal - Canonicalized address to 172.21.0.9
agent1_1      | 19:48:47.363 [main-SendThread(172.21.0.9:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 172.21.0.9/172.21.0.9:2181.
agent1_1      | 19:48:47.363 [main-SendThread(172.21.0.9:2181)] INFO org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error)
agent1_1      | 19:48:47.430 [ServiceEventWatcher-myservice] DEBUG com.thingworx.discovery.zk.ZookeeperProvider - Getting registered addresses from ZooKeeper for service myservice

The ZooKeeper cluster itself is happy and fine. So the main question: is there a way to have the client use the DNS names instead of the IP addresses? I should also mention that service discovery uses ephemeral nodes, so a disconnect followed by a reconnect is bad for us.
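
For completeness, the general Curator pattern for re-creating ephemeral registrations after a session loss looks roughly like this (a sketch, not our production code; the path and the reRegister helper are hypothetical stand-ins):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;

public class EphemeralRecovery {
    public static void main(String[] args) {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181",
                new ExponentialBackoffRetry(1000, 3));
        // A lost session deletes our ephemeral node server-side, so
        // re-register whenever Curator reports a (re)established session.
        client.getConnectionStateListenable().addListener((c, newState) -> {
            if (newState == ConnectionState.RECONNECTED) {
                reRegister(c);
            }
        });
        client.start();
    }

    // Hypothetical helper that re-creates this agent's ephemeral
    // registration node; parents are created as persistent nodes.
    static void reRegister(CuratorFramework client) {
        try {
            client.create().creatingParentsIfNeeded()
                  .withMode(CreateMode.EPHEMERAL)
                  .forPath("/services/myservice/agent-instance", new byte[0]);
        } catch (KeeperException.NodeExistsException ignored) {
            // Already registered under the current session.
        } catch (Exception e) {
            // Real code would log and retry.
        }
    }
}

Even with something like this in place, it only helps if the client can actually reconnect; once every resolved IP in its list is stale, there is nothing left to reconnect to.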


