Flink cluster startup error: Could not resolve ResourceManager address akka

I need help with the following error, as I can't work out what the actual issue is. I am trying to run a Flink cluster on Docker Desktop on Windows 10 Professional.

Dockerfile:

FROM SOME-LOCAL-REGISTERY-URL/flink:1.11
ADD build/libs/demoapp-service-all.jar /opt/flink/usrlib/demoapp-service-all.jar
VOLUME /tmp
ADD conf/flink-conf.yaml /opt/flink/conf/flink-conf.yaml
ADD conf/log4j.properties /opt/flink/conf/log4j.properties

flink-conf.yaml:

jobmanager.rpc.address: jobmanager
jobmanager.rpc.port: 8092
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 1728m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 1
state.backend: rocksdb
state.checkpoints.dir: file:///c:/Users/demo/checkpoint_dir
state.backend.rocksdb.memory.managed: true

I build the "demo/demoapp:1.0" image manually from the Dockerfile and then start the Flink cluster with "docker-compose up".

docker-compose.yml:

version: "2.2"
services:
  jobmanager:
    image: demo/demoapp:1.0
    ports:
      - "8092:8092"
    command: ["standalone-job", "-Dspring.profiles.active=dev"]


  taskmanager:
    image: demo/demoapp:1.0
    depends_on:
      - jobmanager
    command: ["taskmanager", "-Dspring.profiles.active=dev"]
    scale: 1

Logs:

jobmanager_1   | Starting Job Manager
taskmanager_1  | Starting Task Manager
jobmanager_1   | Starting standalonejob as a console application on host aaf9a34c154f.
taskmanager_1  | Starting taskexecutor as a console application on host a96dd08d9ae6.
---------------------------------------------------------------------------------------------
taskmanager_1  | TM_RESOURCE_PARAMS extraction logs:
taskmanager_1  | jvm_params: -Xmx536870902 -Xms536870902 -XX:MaxDirectMemorySize=268435458 -XX:MaxMetaspaceSize=268435456
taskmanager_1  | dynamic_configs: -D taskmanager.memory.framework.off-heap.size=134217728b -D taskmanager.memory.network.max=134217730b -D taskmanager.memory.network.min=134217730b -D taskmanager.memory.framework.heap.size=134217728b -D taskmanager.memory.managed.size=536870920b -D taskmanager.cpu.cores=2.0 -D taskmanager.memory.task.heap.size=402653174b -D taskmanager.memory.task.off-heap.size=0b
taskmanager_1  | logs: INFO  [] - Loading configuration property: jobmanager.rpc.address, a96dd08d9ae6
taskmanager_1  | INFO  [] - Loading configuration property: jobmanager.rpc.port, 8092
taskmanager_1  | INFO  [] - Loading configuration property: jobmanager.memory.process.size, 1600m
taskmanager_1  | INFO  [] - Loading configuration property: taskmanager.memory.process.size, 1728m
taskmanager_1  | INFO  [] - Loading configuration property: taskmanager.numberOfTaskSlots, 2
taskmanager_1  | INFO  [] - Loading configuration property: parallelism.default, 1
taskmanager_1  | INFO  [] - Loading configuration property: state.backend, rocksdb
taskmanager_1  | INFO  [] - Loading configuration property: state.checkpoints.dir, file:///c:/Users/demo/checkpoint_dir
taskmanager_1  | INFO  [] - Loading configuration property: state.backend.rocksdb.memory.managed, true
taskmanager_1  | INFO  [] - Loading configuration property: blob.server.port, 6124
taskmanager_1  | INFO  [] - Loading configuration property: query.server.port, 6125
-------------------------------------------------------------------------------------- 
jobmanager_1   | JM_RESOURCE_PARAMS extraction logs:
jobmanager_1   | jvm_params: -Xmx1073741824 -Xms1073741824 -XX:MaxMetaspaceSize=268435456
jobmanager_1   | logs: INFO  [] - Loading configuration property: jobmanager.rpc.address, aaf9a34c154f
jobmanager_1   | INFO  [] - Loading configuration property: jobmanager.rpc.port, 8092
jobmanager_1   | INFO  [] - Loading configuration property: jobmanager.memory.process.size, 1600m
jobmanager_1   | INFO  [] - Loading configuration property: taskmanager.memory.process.size, 1728m
jobmanager_1   | INFO  [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
jobmanager_1   | INFO  [] - Loading configuration property: parallelism.default, 1
jobmanager_1   | INFO  [] - Loading configuration property: state.backend, rocksdb
jobmanager_1   | INFO  [] - Loading configuration property: state.checkpoints.dir, file:///c:/Users/demo/checkpoint_dir
jobmanager_1   | INFO  [] - Loading configuration property: state.backend.rocksdb.memory.managed, true
jobmanager_1   | INFO  [] - Loading configuration property: blob.server.port, 6124
jobmanager_1   | INFO  [] - Loading configuration property: query.server.port, 6125
---------------------------------------------------------------------------------------------

Error Logs:

taskmanager_1  | 2020-11-25 10:15:41,179 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Trying to connect to address a96dd08d9ae6/172.18.0.3:8092
taskmanager_1  | 2020-11-25 10:15:41,180 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address 'a96dd08d9ae6/172.18.0.3': Connection refused (Connection refused)
taskmanager_1  | 2020-11-25 10:15:41,181 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '/172.18.0.3': Connection refused (Connection refused)
taskmanager_1  | 2020-11-25 10:15:41,181 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '/172.18.0.3': Connection refused (Connection refused)
taskmanager_1  | 2020-11-25 10:15:41,182 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
taskmanager_1  | 2020-11-25 10:15:41,183 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '/172.18.0.3': Connection refused (Connection refused)
taskmanager_1  | 2020-11-25 10:15:41,183 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)

taskmanager_1  | 2020-11-25 10:16:19,730 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@a96dd08d9ae6:8092/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@a96dd08d9ae6:8092/user/rpc/resourcemanager_*.

Also, apart from the error, I don't understand from the logs why the TaskManager reads "jobmanager.rpc.address" and "taskmanager.numberOfTaskSlots" with values different from those in flink-conf.yaml, whereas the JobManager reads them correctly.

Please help me find what I am missing here.



Solution 1:[1]

Instead of defining jobmanager.rpc.address inside flink-conf.yaml, defining it inside the docker-compose.yml file solved the problem for me:

Dockerfile:

FROM flink:1.12.2-scala_2.12-java8

COPY --chown=flink:flink ./path/to/assembly.jar /opt/flink/usrlib/

COPY --chown=flink:flink ./conf/* /opt/flink/conf/

docker-compose.yml:

    environment:
      FLINK_PROPERTIES: |-
        jobmanager.rpc.address: jobmanager
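
For context, here is a sketch of how the complete docker-compose.yml might look with this block in place. It reuses the service names, image tag, ports, and commands from the question. Placing the block on both services is an assumption, since the fragment above does not say which service it belongs to. The logs in the question show each container loading its own hostname as jobmanager.rpc.address, which suggests the image's startup script rewrites that key in flink-conf.yaml. Properties passed through FLINK_PROPERTIES appear to be applied after that rewrite.

```yaml
version: "2.2"
services:
  jobmanager:
    image: demo/demoapp:1.0
    ports:
      - "8092:8092"
    command: ["standalone-job", "-Dspring.profiles.active=dev"]
    environment:
      # Assumption: set on both services so both resolve the same address.
      FLINK_PROPERTIES: |-
        jobmanager.rpc.address: jobmanager

  taskmanager:
    image: demo/demoapp:1.0
    depends_on:
      - jobmanager
    command: ["taskmanager", "-Dspring.profiles.active=dev"]
    scale: 1
    environment:
      # "jobmanager" is the Compose service name, resolvable on the Compose network.
      FLINK_PROPERTIES: |-
        jobmanager.rpc.address: jobmanager
```

With this layout the TaskManager would look for the ResourceManager at jobmanager:8092 rather than at its own container hostname.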

flink-conf.yaml:

# Other configurations.
# ...

# Leave last line empty.
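
Applied to the configuration from the question, flink-conf.yaml might then look like the sketch below. It assumes the remaining keys are carried over unchanged and only jobmanager.rpc.address is removed. The trailing empty line presumably matters because the container appends extra properties to this file at startup, and without a final newline the first appended property would run into the last existing entry.

```yaml
# jobmanager.rpc.address is intentionally absent here (assumption: it is
# supplied through FLINK_PROPERTIES in docker-compose.yml instead).
jobmanager.rpc.port: 8092
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 1728m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 1
state.backend: rocksdb
state.checkpoints.dir: file:///c:/Users/demo/checkpoint_dir
state.backend.rocksdb.memory.managed: true

```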

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Berkay Öztürk