Curator disconnects from ZooKeeper when IPs change, even when using DNS names in the connection string
We are using Curator service discovery in Docker and Kubernetes environments. We set up the connection string using the DNS names of the containers/pods. The problem I am seeing is that Curator appears to resolve these names down to IP addresses: a container or pod can change its IP address, and Curator does not seem to pick up the change.
The behavior I see: if I stand up a 3-node ZooKeeper cluster along with 1 or more agents, then roll the ZooKeeper nodes one at a time so that each changes its IP address, all the clients lose their connection when I bounce the third ZooKeeper instance.
Is there a way to force it to always use the DNS names for the connection?
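For reference, here is roughly how the agents build their Curator client. This is a minimal sketch, not our exact production code: the class name and retry values are placeholders, and the connection string mirrors ZK_CONNECTION from the compose file below.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class AgentClient {
    public static void main(String[] args) {
        // Connection string uses the Docker/Kubernetes DNS names, never raw IPs
        String zkConnection = "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181";

        // Illustrative retry policy (base sleep 1s, up to 29 retries)
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                zkConnection, new ExponentialBackoffRetry(1000, 29));
        client.start();

        // The underlying ZooKeeper client resolves these names to IP addresses
        // when it connects, which appears to be where the stale IPs come from.
    }
}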
Here is my compose example:
version: '2.4'
x-zookeeper:
  &zookeeper-env
  JVMFLAGS: -Dzookeeper.4lw.commands.whitelist=ruok
  ZOO_ADMINSERVER_ENABLED: 'true'
  ZOO_STANDALONE_ENABLED: 'false'
  ZOO_SERVERS: server.1=zookeeper1:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=zookeeper3:2888:3888;2181
x-agent:
  &agent-env
  ZK_CONNECTION: zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
  SERVICE_NAME: myservice
services:
  zookeeper1:
    image: artifactory.rd2.thingworx.io/zookeeper:${ZOOKEEPER_IMAGE_VERSION}
    restart: always
    ports:
      - 2181
      - 8080
    healthcheck:
      test: echo ruok | nc localhost 2181 | grep imok
      interval: 15s
    environment:
      <<: *zookeeper-env
      ZOO_MY_ID: 1
  zookeeper2:
    image: artifactory.rd2.thingworx.io/zookeeper:${ZOOKEEPER_IMAGE_VERSION}
    restart: always
    ports:
      - 2181
      - 8080
    healthcheck:
      test: echo ruok | nc localhost 2181 | grep imok
      interval: 15s
    environment:
      <<: *zookeeper-env
      ZOO_MY_ID: 2
  zookeeper3:
    image: artifactory.rd2.thingworx.io/zookeeper:${ZOOKEEPER_IMAGE_VERSION}
    restart: always
    ports:
      - 2181
      - 8080
    healthcheck:
      test: echo ruok | nc localhost 2181 | grep imok
      interval: 15s
    environment:
      <<: *zookeeper-env
      ZOO_MY_ID: 3
  agent1:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT1
  agent2:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT2
  agent3:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT3
  agent4:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT4
  agent5:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT5
The run steps are:
docker-compose up -d zookeeper1 zookeeper2 zookeeper3 agent1
docker-compose rm -sf zookeeper3
docker-compose up -d agent2
docker-compose up -d zookeeper3
docker-compose rm -sf zookeeper2
docker-compose up -d agent3
docker-compose up -d zookeeper2
docker-compose rm -sf zookeeper1
docker-compose up -d agent5
docker-compose up -d zookeeper1
After I kill the last ZooKeeper node, the agent gets the following error and does not recover. You can see it is referencing an IP address:
Path:null finished:false header:: 5923,4 replyHeader:: 5923,8589934594,0 request:: '/services/myservice/cc1996fb-cca5-4108-bd06-567b45f594d7,F response:: #7b226e616d65223a226d7973657276696365222c226964223a2263633139393666622d636361352d343130382d626430362d353637623435663539346437222c2261646472657373223a223137322e32312e302e33222c22706f7274223a383038302c2273736c506f7274223a6e756c6c2c227061796c6f6164223a7b2240636c617373223a22636f6d2e7468696e67776f72782e646973636f766572792e7a6b2e53657276696365496e7374616e636544657461696c73222c2261747472696275746573223a7b22474c4f42414c4944223a224147454e5433227d7d2c22726567697374726174696f6e54696d65555443223a313634393739313735353936322c227365727669636554797065223a2244594e414d4943222c2275726953706563223a7b227061727473223a5b7b2276616c7565223a2261646472657373222c227661726961626c65223a747275657d2c7b2276616c7565223a223a222c227661726961626c65223a66616c73657d2c7b2276616c7565223a22706f7274222c227661726961626c65223a747275657d5d7d7d,s{4294967301,4294967301,1649791757073,1649791757073,0,0,0,144117976615550976,404,0,4294967301}
agent1_1 | 19:48:46.438 [ServiceEventWatcher-myservice] DEBUG com.thingworx.discovery.zk.ZookeeperProvider - ZooKeeper resolved addresses for service myservice: [ServiceDefinition [serviceName=myservice, host=172.21.0.7, port=8080, tags={GLOBALID=AGENT2}], ServiceDefinition [serviceName=myservice, host=172.21.0.4, port=8080, tags={GLOBALID=AGENT1}], ServiceDefinition [serviceName=myservice, host=172.21.0.3, port=8080, tags={GLOBALID=AGENT3}]]
agent1_1 | 19:48:47.070 [main-SendThread(172.21.0.5:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x200028941eb0001 for sever service-discovery-docker-tests_zookeeper2_1.service-discovery-docker-tests_default/172.21.0.5:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
agent1_1 | org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read additional data from server sessionid 0x200028941eb0001, likely server has closed socket
agent1_1 | at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
agent1_1 | at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
agent1_1 | at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1275)
agent1_1 | 19:48:47.171 [main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
agent1_1 | 19:48:47.363 [main-SendThread(172.21.0.9:2181)] DEBUG org.apache.zookeeper.SaslServerPrincipal - Canonicalized address to 172.21.0.9
agent1_1 | 19:48:47.363 [main-SendThread(172.21.0.9:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 172.21.0.9/172.21.0.9:2181.
agent1_1 | 19:48:47.363 [main-SendThread(172.21.0.9:2181)] INFO org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error)
agent1_1 | 19:48:47.430 [ServiceEventWatcher-myservice] DEBUG com.thingworx.discovery.zk.ZookeeperProvider - Getting registered addresses from ZooKeeper for service myservice
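For what it is worth, the hex blob in the GetData response above is just the UTF-8 JSON of the registered service instance, and decoding it shows "address":"172.21.0.3", a raw IP rather than a hostname. A minimal decode sketch (the hex literal here is a shortened sample; paste the full blob from the log to see the whole instance JSON):

public class DecodePayload {
    public static void main(String[] args) {
        // Shortened sample that decodes to {"name":"myservice"}; substitute the
        // full hex blob from the GetData response to see the complete payload
        String hex = "7b226e616d65223a226d7973657276696365227d";
        StringBuilder json = new StringBuilder();
        for (int i = 0; i + 1 < hex.length(); i += 2) {
            json.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        // Full blob prints: {"name":"myservice",...,"address":"172.21.0.3","port":8080,...}
        System.out.println(json);
    }
}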
The ZooKeeper cluster itself is happy and fine. So the main question: is there a way to have it use the DNS names instead of the IP addresses? I should also mention that service discovery uses ephemeral nodes, so a disconnect and reconnect is bad.
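For context, registration looks roughly like the sketch below (minimal curator-x-discovery usage; the Void payload is a stand-in, since our real code registers a custom payload class as the log above shows). If I read the Curator docs right, ServiceInstance.builder() even defaults the address to a local IP unless .address(...) is set, which would match the raw IPs in the decoded payload. The key point is that ServiceDiscovery writes each instance as an ephemeral znode, so a session expiry deletes the registration server-side:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.x.discovery.ServiceDiscovery;
import org.apache.curator.x.discovery.ServiceDiscoveryBuilder;
import org.apache.curator.x.discovery.ServiceInstance;

public class AgentRegistration {
    public static void register(CuratorFramework client) throws Exception {
        // Without an explicit .address(...), the builder fills in a local IP
        ServiceInstance<Void> instance = ServiceInstance.<Void>builder()
                .name("myservice")
                .port(8080)
                .build();

        ServiceDiscovery<Void> discovery = ServiceDiscoveryBuilder.builder(Void.class)
                .client(client)
                .basePath("/services")   // matches the znode path in the log
                .thisInstance(instance)  // registered as an ephemeral znode
                .build();
        discovery.start();

        // If the session expires (rather than merely going SUSPENDED), ZooKeeper
        // deletes the ephemeral node and the instance must be re-registered.
    }
}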
Source: Stack Overflow, licensed under CC BY-SA 3.0.