Presto cluster: Presto agents restart every minute instead of staying stable

We have a Hadoop cluster that includes Presto, Hive, HDFS, etc.

Our production Presto cluster has 254 Presto agent machines and one Presto coordinator.

All Presto services are installed on RHEL 7.6 machines.

We see strange behavior: the Presto agents restart every ~60 seconds, and so far we have not been able to pinpoint the root cause.

The server.log looks like the following:

Configuration should be updated:

1) Configuration property 'hive.parquet.fail-on-corrupted-statistics' has been replaced. Use 'parquet.ignore-statistics' instead.
2) Configuration property 'hive.parquet.fail-on-corrupted-statistics' is deprecated and should not be used

==========
2022-02-17T17:33:02.684Z        WARN    http-client-node-manager-42     io.prestosql.metadata.RemoteNodeState   Error fetching node state from http://34.2.37.165:4444/v1/info/state: Server refused connection: http://34.2.37.165:4444/v1/info/state
2022-02-17T17:33:02.684Z        WARN    http-client-node-manager-40     io.prestosql.metadata.RemoteNodeState   Error fetching node state from http://34.2.37.240:4444/v1/info/state: Server refused connection: http://34.2.37.240:4444/v1/info/state
2022-02-17T17:33:03.650Z        INFO    main    io.prestosql.metadata.StaticCatalogStore        -- Added catalog hive using connector hive-hadoop2 --
2022-02-17T17:33:03.650Z        INFO    main    io.prestosql.metadata.StaticCatalogStore        -- Loading catalog etc/catalog/jmx.properties --
2022-02-17T17:33:04.128Z        INFO    main    Bootstrap       PROPERTY         DEFAULT  RUNTIME                                                                  DESCRIPTION
2022-02-17T17:33:04.128Z        INFO    main    Bootstrap       jmx.dump-period  10.00s   10.00s
2022-02-17T17:33:04.128Z        INFO    main    Bootstrap       jmx.dump-tables  []       [java.lang:type=Runtime, presto.execution.scheduler:name=NodeScheduler]
2022-02-17T17:33:04.128Z        INFO    main    Bootstrap       jmx.max-entries  86400    86400
2022-02-17T17:33:04.355Z        INFO    main    io.prestosql.metadata.StaticCatalogStore        -- Added catalog jmx using connector jmx --
2022-02-17T17:33:04.355Z        INFO    main    io.prestosql.metadata.StaticCatalogStore        -- Loading catalog etc/catalog/memory.properties --
2022-02-17T17:33:04.762Z        INFO    main    Bootstrap       PROPERTY                              DEFAULT  RUNTIME  DESCRIPTION
2022-02-17T17:33:04.762Z        INFO    main    Bootstrap       memory.enable-lazy-dynamic-filtering  true     true
2022-02-17T17:33:04.762Z        INFO    main    Bootstrap       memory.max-data-per-node              128MB    4GB
2022-02-17T17:33:04.763Z        INFO    main    Bootstrap       memory.splits-per-node                20       20
2022-02-17T17:33:04.971Z        INFO    main    io.prestosql.metadata.StaticCatalogStore        -- Added catalog memory using connector memory --
2022-02-17T17:33:04.971Z        INFO    main    io.prestosql.metadata.StaticCatalogStore        -- Loading catalog etc/catalog/phoenix.properties --
2022-02-17T17:33:05.545Z        INFO    main    stderr  log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
2022-02-17T17:33:05.545Z        INFO    main    stderr  log4j:WARN Please initialize the log4j system properly.
2022-02-17T17:33:05.545Z        INFO    main    stderr  log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2022-02-17T17:33:05.705Z        INFO    main    Bootstrap       PROPERTY                                  DEFAULT  RUNTIME                                              DESCRIPTION
2022-02-17T17:33:05.705Z        INFO    main    Bootstrap       aggregation-pushdown.enabled              true     true                                                 Enable aggregation pushdown
2022-02-17T17:33:05.705Z        INFO    main    Bootstrap       allow-drop-table                          true     true                                                 Allow connector to drop tables
2022-02-17T17:33:05.705Z        INFO    main    Bootstrap       domain-compaction-threshold               32       32                                                   Maximum ranges to allow in a tuple domain without compacting it
2022-02-17T17:33:05.705Z        INFO    main    Bootstrap       unsupported-type-handling                 IGNORE   IGNORE                                               Unsupported type handling strategy
2022-02-17T17:33:05.705Z        INFO    main    Bootstrap       case-insensitive-name-matching            false    false
2022-02-17T17:33:05.705Z        INFO    main    Bootstrap       case-insensitive-name-matching.cache-ttl  1.00m    1.00m
2022-02-17T17:33:05.705Z        INFO    main    Bootstrap       phoenix.connection-url                    ----     jdbc:phoenix:master01:2181:/hbase-unsecure
2022-02-17T17:33:05.705Z        INFO    main    Bootstrap       phoenix.config.resources                  []       [/opt/mcspace/mass_hbase/hbase/conf/hbase-site.xml]
2022-02-17T17:33:06.368Z        INFO    main    io.airlift.bootstrap.LifeCycleManager   Life cycle starting...
2022-02-17T17:33:06.369Z        INFO    main    io.airlift.bootstrap.LifeCycleManager   Life cycle startup complete
2022-02-17T17:33:06.370Z        INFO    main    io.prestosql.metadata.StaticCatalogStore        -- Added catalog phoenix using connector phoenix --
2022-02-17T17:33:06.373Z        INFO    main    io.prestosql.security.AccessControlManager      Using system access control default
2022-02-17T17:33:06.407Z        INFO    main    io.prestosql.server.Server      ======== SERVER STARTED ========
2022-02-17T17:33:07.683Z        WARN    http-client-node-manager-39     io.prestosql.metadata.RemoteNodeState   Error fetching node state from http://34.2.37.165:4444/v1/info/state: Server refused connection: http://34.2.37.165:4444/v1/info/state
2022-02-17T17:33:07.686Z        INFO    node-state-poller-0     io.prestosql.metadata.DiscoveryNodeManager      Previously active node is missing: worker01 (last seen at 34.2.37.165)
2022-02-17T17:33:07.686Z        INFO    node-state-poller-0     io.prestosql.metadata.DiscoveryNodeManager      Previously active node is missing: worker04 (last seen at 34.2.37.240)
2022-02-17T17:33:07.687Z        WARN    http-client-node-manager-42     io.prestosql.metadata.RemoteNodeState   Error fetching node state from http://34.2.37.240:4444/v1/info/state: Server refused connection: http://34.2.37.240:4444/v1/info/state
2022-02-17T17:33:22.101Z        INFO    Thread-44       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:33:22.101Z        INFO    Thread-40       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up

As we can see at the end, the Presto agent is shutting down.
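To put a number on how long the agent stays up, the time between the "SERVER STARTED" line and the next "JVM is shutting down" line can be computed from the log. The following is only a minimal sketch: the helper names `parse_ts` and `uptimes` are made up for illustration, and the three log lines are copied from the excerpt above with the whitespace collapsed. For this excerpt it gives an uptime of roughly 15.7 seconds.

```python
from datetime import datetime

# Lines copied from the server.log excerpt above (whitespace collapsed).
LOG_LINES = [
    "2022-02-17T17:33:06.407Z INFO main io.prestosql.server.Server ======== SERVER STARTED ========",
    "2022-02-17T17:33:22.101Z INFO Thread-44 io.airlift.bootstrap.LifeCycleManager JVM is shutting down, cleaning up",
    "2022-02-17T17:33:22.101Z INFO Thread-40 io.airlift.bootstrap.LifeCycleManager JVM is shutting down, cleaning up",
]


def parse_ts(line):
    """Parse the leading UTC timestamp of a server.log line (hypothetical helper)."""
    return datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S.%fZ")


def uptimes(lines):
    """Return the seconds between each SERVER STARTED and the first shutdown after it."""
    result = []
    started = None
    for line in lines:
        if "SERVER STARTED" in line:
            started = parse_ts(line)
        elif "JVM is shutting down" in line and started is not None:
            result.append((parse_ts(line) - started).total_seconds())
            started = None  # ignore the duplicate shutdown line from the other thread
    return result


for seconds in uptimes(LOG_LINES):
    print(f"agent was up for {seconds:.3f} s before shutting down")
```

The same function could be fed the whole server.log (for example via `open("server.log")`) to check whether the uptime is similar on every restart.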

We can also see that:

grep "JVM is shutting down" server.log  | tail -20

2022-02-17T17:28:47.612Z        INFO    Thread-40       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:28:47.612Z        INFO    Thread-44       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:29:48.603Z        INFO    Thread-40       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:29:48.603Z        INFO    Thread-44       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:31:20.123Z        INFO    Thread-45       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:32:21.108Z        INFO    Thread-44       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:32:21.109Z        INFO    Thread-40       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:33:22.101Z        INFO    Thread-44       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:33:22.101Z        INFO    Thread-40       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:34:23.307Z        INFO    Thread-44       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:34:23.307Z        INFO    Thread-40       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:35:24.360Z        INFO    Thread-44       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:35:24.360Z        INFO    Thread-40       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:36:25.357Z        INFO    Thread-48       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:36:25.357Z        INFO    Thread-52       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:41:00.359Z        INFO    Thread-40       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:41:00.359Z        INFO    Thread-44       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:42:01.268Z        INFO    Thread-40       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:42:01.268Z        INFO    Thread-44       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
2022-02-17T17:43:02.372Z        INFO    Thread-44       io.airlift.bootstrap.LifeCycleManager   JVM is shutting down, cleaning up
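The gaps between these shutdowns can be checked with a few lines of Python. This is only a sketch: `shutdown_gaps` is a made-up helper name, and the timestamps are copied from the grep output above, one per shutdown (lines logged by a second thread within a millisecond of the first are left out). On this data, most gaps come out close to 61 seconds, with two longer ones of about 92 and 275 seconds.

```python
from datetime import datetime
from statistics import median

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

# One timestamp per shutdown, copied from the grep output above.
SHUTDOWNS = [
    "2022-02-17T17:28:47.612Z",
    "2022-02-17T17:29:48.603Z",
    "2022-02-17T17:31:20.123Z",
    "2022-02-17T17:32:21.108Z",
    "2022-02-17T17:33:22.101Z",
    "2022-02-17T17:34:23.307Z",
    "2022-02-17T17:35:24.360Z",
    "2022-02-17T17:36:25.357Z",
    "2022-02-17T17:41:00.359Z",
    "2022-02-17T17:42:01.268Z",
    "2022-02-17T17:43:02.372Z",
]


def shutdown_gaps(stamps, min_gap=1.0):
    """Return the seconds between consecutive shutdowns (hypothetical helper).

    Gaps shorter than min_gap are skipped, so near-simultaneous lines
    from sibling threads do not count as separate shutdowns.
    """
    times = sorted(datetime.strptime(s, FMT) for s in stamps)
    gaps = []
    last = times[0]
    for t in times[1:]:
        delta = (t - last).total_seconds()
        if delta >= min_gap:
            gaps.append(delta)
            last = t
    return gaps


gaps = shutdown_gaps(SHUTDOWNS)
print("gaps (s):", [round(g, 3) for g in gaps])
print("median gap (s):", round(median(gaps), 3))
```

A gap this regular would be consistent with something periodic stopping the process, but the numbers alone do not identify the cause.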

The Presto agent restarts every minute.

What could be the reason for this unstable Presto agent?

I should also mention that the servers have no memory shortage and no full partitions.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
