Flink cluster startup error: Could not resolve ResourceManager address akka
I need help with the following error, as I can't work out what the actual issue is. I am trying to run a Flink cluster on Docker Desktop on Windows 10 Professional.
Dockerfile:
FROM SOME-LOCAL-REGISTERY-URL/flink:1.11
ADD build/libs/demoapp-service-all.jar /opt/flink/usrlib/demoapp-service-all.jar
VOLUME /tmp
ADD conf/flink-conf.yaml /opt/flink/conf/flink-conf.yaml
ADD conf/log4j.properties /opt/flink/conf/log4j.properties
flink-conf.yaml:
jobmanager.rpc.address: jobmanager
jobmanager.rpc.port: 8092
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 1728m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 1
state.backend: rocksdb
state.checkpoints.dir: file:///c:/Users/demo/checkpoint_dir
state.backend.rocksdb.memory.managed: true
I build the "demo/demoapp:1.0" image manually from the Dockerfile and then start the Flink cluster with "docker-compose up".
docker-compose.yml:
version: "2.2"
services:
  jobmanager:
    image: demo/demoapp:1.0
    ports:
      - "8092:8092"
    command: ["standalone-job", "-Dspring.profiles.active=dev"]
  taskmanager:
    image: demo/demoapp:1.0
    depends_on:
      - jobmanager
    command: ["taskmanager", "-Dspring.profiles.active=dev"]
    scale: 1
Logs:
jobmanager_1 | Starting Job Manager
taskmanager_1 | Starting Task Manager
jobmanager_1 | Starting standalonejob as a console application on host aaf9a34c154f.
taskmanager_1 | Starting taskexecutor as a console application on host a96dd08d9ae6.
---------------------------------------------------------------------------------------------
taskmanager_1 | TM_RESOURCE_PARAMS extraction logs:
taskmanager_1 | jvm_params: -Xmx536870902 -Xms536870902 -XX:MaxDirectMemorySize=268435458 -XX:MaxMetaspaceSize=268435456
taskmanager_1 | dynamic_configs: -D taskmanager.memory.framework.off-heap.size=134217728b -D taskmanager.memory.network.max=134217730b -D taskmanager.memory.network.min=134217730b -D taskmanager.memory.framework.heap.size=134217728b -D taskmanager.memory.managed.size=536870920b -D taskmanager.cpu.cores=2.0 -D taskmanager.memory.task.heap.size=402653174b -D taskmanager.memory.task.off-heap.size=0b
taskmanager_1 | logs: INFO [] - Loading configuration property: jobmanager.rpc.address, a96dd08d9ae6
taskmanager_1 | INFO [] - Loading configuration property: jobmanager.rpc.port, 8092
taskmanager_1 | INFO [] - Loading configuration property: jobmanager.memory.process.size, 1600m
taskmanager_1 | INFO [] - Loading configuration property: taskmanager.memory.process.size, 1728m
taskmanager_1 | INFO [] - Loading configuration property: taskmanager.numberOfTaskSlots, 2
taskmanager_1 | INFO [] - Loading configuration property: parallelism.default, 1
taskmanager_1 | INFO [] - Loading configuration property: state.backend, rocksdb
taskmanager_1 | INFO [] - Loading configuration property: state.checkpoints.dir, file:///c:/Users/demo/checkpoint_dir
taskmanager_1 | INFO [] - Loading configuration property: state.backend.rocksdb.memory.managed, true
taskmanager_1 | INFO [] - Loading configuration property: blob.server.port, 6124
taskmanager_1 | INFO [] - Loading configuration property: query.server.port, 6125
--------------------------------------------------------------------------------------
jobmanager_1 | JM_RESOURCE_PARAMS extraction logs:
jobmanager_1 | jvm_params: -Xmx1073741824 -Xms1073741824 -XX:MaxMetaspaceSize=268435456
jobmanager_1 | logs: INFO [] - Loading configuration property: jobmanager.rpc.address, aaf9a34c154f
jobmanager_1 | INFO [] - Loading configuration property: jobmanager.rpc.port, 8092
jobmanager_1 | INFO [] - Loading configuration property: jobmanager.memory.process.size, 1600m
jobmanager_1 | INFO [] - Loading configuration property: taskmanager.memory.process.size, 1728m
jobmanager_1 | INFO [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
jobmanager_1 | INFO [] - Loading configuration property: parallelism.default, 1
jobmanager_1 | INFO [] - Loading configuration property: state.backend, rocksdb
jobmanager_1 | INFO [] - Loading configuration property: state.checkpoints.dir, file:///c:/Users/demo/checkpoint_dir
jobmanager_1 | INFO [] - Loading configuration property: state.backend.rocksdb.memory.managed, true
jobmanager_1 | INFO [] - Loading configuration property: blob.server.port, 6124
jobmanager_1 | INFO [] - Loading configuration property: query.server.port, 6125
---------------------------------------------------------------------------------------------
Error Logs:
taskmanager_1 | 2020-11-25 10:15:41,179 INFO org.apache.flink.runtime.net.ConnectionUtils [] - Trying to connect to address a96dd08d9ae6/172.18.0.3:8092
taskmanager_1 | 2020-11-25 10:15:41,180 INFO org.apache.flink.runtime.net.ConnectionUtils [] - Failed to connect from address 'a96dd08d9ae6/172.18.0.3': Connection refused (Connection refused)
taskmanager_1 | 2020-11-25 10:15:41,181 INFO org.apache.flink.runtime.net.ConnectionUtils [] - Failed to connect from address '/172.18.0.3': Connection refused (Connection refused)
taskmanager_1 | 2020-11-25 10:15:41,181 INFO org.apache.flink.runtime.net.ConnectionUtils [] - Failed to connect from address '/172.18.0.3': Connection refused (Connection refused)
taskmanager_1 | 2020-11-25 10:15:41,182 INFO org.apache.flink.runtime.net.ConnectionUtils [] - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
taskmanager_1 | 2020-11-25 10:15:41,183 INFO org.apache.flink.runtime.net.ConnectionUtils [] - Failed to connect from address '/172.18.0.3': Connection refused (Connection refused)
taskmanager_1 | 2020-11-25 10:15:41,183 INFO org.apache.flink.runtime.net.ConnectionUtils [] - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
taskmanager_1 | 2020-11-25 10:16:19,730 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@a96dd08d9ae6:8092/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@a96dd08d9ae6:8092/user/rpc/resourcemanager_*.
Apart from the error, I also don't understand from the logs why the taskmanager reads values for "jobmanager.rpc.address" and "taskmanager.numberOfTaskSlots" that differ from those in flink-conf.yaml, whereas the jobmanager appears to read them correctly.
Please help me find what I am missing here.
Solution 1:[1]
Instead of defining jobmanager.rpc.address inside flink-conf.yaml, defining it inside the docker-compose.yml file solved the problem for me:
Dockerfile:
FROM flink:1.12.2-scala_2.12-java8
COPY --chown=flink:flink ./path/to/assembly.jar /opt/flink/usrlib/
COPY --chown=flink:flink ./conf/* /opt/flink/conf/
docker-compose.yml:
environment:
  FLINK_PROPERTIES: |-
    jobmanager.rpc.address: jobmanager
flink-conf.yaml:
# Other configurations.
# ...
# Leave last line empty.
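For reference, here is a rough sketch of how this could look when applied to the compose file from the question. It is a sketch under assumptions, not a tested setup. The taskmanager logs in the question show jobmanager.rpc.address loaded as the taskmanager's own container hostname (a96dd08d9ae6) and taskmanager.numberOfTaskSlots loaded as 2, which suggests the image's entrypoint rewrites these keys at startup, so values baked into flink-conf.yaml do not survive. Setting FLINK_PROPERTIES on both services is my assumption (the snippet above does not say which service it belongs to), and so is the idea that the slot count can be pinned the same way.

```yaml
version: "2.2"
services:
  jobmanager:
    image: demo/demoapp:1.0
    ports:
      - "8092:8092"
    command: ["standalone-job", "-Dspring.profiles.active=dev"]
    environment:
      # Assumption: set on the jobmanager too, so both containers agree
      # on the service name "jobmanager" as the RPC address.
      FLINK_PROPERTIES: |-
        jobmanager.rpc.address: jobmanager
  taskmanager:
    image: demo/demoapp:1.0
    depends_on:
      - jobmanager
    command: ["taskmanager", "-Dspring.profiles.active=dev"]
    scale: 1
    environment:
      FLINK_PROPERTIES: |-
        jobmanager.rpc.address: jobmanager
        # Assumption: entries supplied here take precedence over the
        # value the entrypoint derives; verify in the startup logs.
        taskmanager.numberOfTaskSlots: 1
```

After "docker-compose up", the "Loading configuration property" lines in the taskmanager log should show jobmanager.rpc.address as jobmanager rather than a container ID. If they still show the container hostname, the properties are not being picked up.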
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Berkay Öztürk |