Airflow + SQLAlchemy short-lived connections to metadata DB

I deployed the latest Airflow on a CentOS 7.5 VM, pointed sql_alchemy_conn and result_backend at Postgres databases on a PostgreSQL instance, and set the executor to CeleryExecutor. Without enabling any DAG at all, and even with no Airflow scheduler running, I see roughly one connection established every 5 seconds and then disposed of, each running a SELECT 1 and a SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1.

The number of short-lived connections increases drastically once the scheduler is started and DAGs are turned on. Does anyone know the reason for this? Is it a heartbeat check or a task-status check? With sql_alchemy_pool_enabled = True in airflow.cfg, shouldn't these connections be longer lived? Is there a log I can look at to pinpoint the source of these connections with sub-second lifetimes?
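
One way to observe these connections is from the database side. Below is a minimal observation sketch (not part of Airflow; the DSN is a placeholder and it assumes direct access to the metadata database) that polls pg_stat_activity to show each client's address, application name, and last query:

    import time
    import psycopg2

    # Placeholder DSN; point it at the Airflow metadata database.
    DSN = "dbname=db1 user=airflow host=localhost"

    conn = psycopg2.connect(DSN)
    conn.autocommit = True
    with conn.cursor() as cur:
        for _ in range(20):  # sample for roughly 10 seconds
            cur.execute(
                """
                SELECT pid, application_name, client_addr,
                       backend_start, state, query
                FROM pg_stat_activity
                WHERE datname = current_database()
                  AND pid <> pg_backend_pid()
                """
            )
            for row in cur.fetchall():
                print(row)
            time.sleep(0.5)
    conn.close()

Connections that appear and vanish between samples with SELECT 1 as their last query are the short-lived ones in question.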

Config values, for reference:

    executor = CeleryExecutor
    sql_alchemy_conn = postgres://..../db1
    sql_alchemy_pool_enabled = True
    sql_alchemy_pool_size = 5
    sql_alchemy_max_overflow = 0
    parallelism = 32
    dag_concurrency = 16
    max_active_runs_per_dag = 16
    worker_concurrency = 16
    broker_url = redis://...
    result_backend = db+postgresql+psycopg2://.../db2
    job_heartbeat_sec = 5
    scheduler_heartbeat_sec = 5


Solution 1:[1]

Set AIRFLOW__CORE__SQL_ALCHEMY_POOL_PRE_PING to False (equivalently, sql_alchemy_pool_pre_ping = False under [core] in airflow.cfg).

When pre-ping is enabled, SQLAlchemy checks the connection at the start of each connection pool checkout. Typically this is a simple statement like SELECT 1, which matches the traffic you are seeing.

More information here: https://docs.sqlalchemy.org/en/13/core/pooling.html#disconnect-handling-pessimistic
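
For context, this Airflow setting maps to SQLAlchemy's pool_pre_ping engine option. A minimal standalone sketch (placeholder URL; this is not Airflow's own engine setup) of what the flag controls:

    from sqlalchemy import create_engine, text

    # Placeholder URL; pool settings mirror the question's config.
    engine = create_engine(
        "postgresql+psycopg2://user:pass@host/db1",
        pool_size=5,
        max_overflow=0,
        pool_pre_ping=True,  # ping the connection on every pool checkout
    )

    # Each checkout below is preceded by a liveness ping (a statement
    # like SELECT 1); with pool_pre_ping=False those extra round trips
    # at checkout disappear.
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))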

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

[1] Solution 1 by Jeremy Caney, via Stack Overflow.