Airflow + SQLAlchemy short-lived connections to metadata DB
I deployed the latest Airflow on a CentOS 7.5 VM, pointed sql_alchemy_conn and result_backend at databases on a PostgreSQL instance, and set the executor to CeleryExecutor. Without enabling any DAG at all, and even with no airflow scheduler started, I see roughly one connection established every 5 seconds, which runs a SELECT 1 and a SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1 and is then disposed of.
The number of short-lived connections increases drastically once the scheduler is started and DAGs are turned on. Does anyone know the reason for this? Is this a heartbeat check or a task status check? With sql_alchemy_pool_enabled = True in airflow.cfg, shouldn't these connections be longer lived? Is there a log I can look at to pinpoint the source of these connections with sub-second lifetimes?
Config values used, for reference:
executor = CeleryExecutor
sql_alchemy_conn = postgres://..../db1
sql_alchemy_pool_enabled = True
sql_alchemy_pool_size = 5
sql_alchemy_max_overflow = 0
parallelism = 32
dag_concurrency = 16
max_active_runs_per_dag = 16
worker_concurrency = 16
broker_url = redis://...
result_backend = db+postgresql+psycopg2://.../db2
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 5
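For context, the traffic described above matches SQLAlchemy's pessimistic disconnect handling: when pool pre-pinging is enabled, every checkout from the connection pool first issues a lightweight ping, which is SELECT 1 on PostgreSQL, and the SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1 statement looks like the dialect probe SQLAlchemy runs when an engine makes its first connection. A minimal sketch reproducing the ping outside Airflow (the DSN is a placeholder, and pool_pre_ping requires SQLAlchemy 1.2+):

from sqlalchemy import create_engine, text

# Pool settings mirror the airflow.cfg values above; the DSN is a placeholder.
engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost/db1",
    pool_size=5,
    max_overflow=0,
    pool_pre_ping=True,  # ping the connection on every checkout
)

for _ in range(3):
    # Each checkout triggers a SELECT 1 pre-ping before the actual
    # statement runs; with pool_pre_ping=False it would not.
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))

Since Airflow spawns many processes (scheduler subprocesses, gunicorn webserver workers, Celery workers), each with its own engine and pool, the per-checkout pings and per-engine probes add up quickly, which fits the jump in connection churn once the scheduler and DAGs are running.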
Solution 1:[1]
Set AIRFLOW__CORE__SQL_ALCHEMY_POOL_PRE_PING to False.
This option controls pre-pinging: the connection is checked at the start of each connection pool checkout, typically with a simple statement like SELECT 1.
More information here: https://docs.sqlalchemy.org/en/13/core/pooling.html#disconnect-handling-pessimistic
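For reference, a sketch of how this setting maps into Airflow's configuration, following Airflow's AIRFLOW__SECTION__KEY environment-variable convention (the [core] section is where this option lives in the versions that expose it):

# airflow.cfg
[core]
sql_alchemy_pool_pre_ping = False

or, equivalently, export AIRFLOW__CORE__SQL_ALCHEMY_POOL_PRE_PING=False in the environment of every Airflow process (webserver, scheduler, and workers). The trade-off: skipping the pre-ping removes the extra SELECT 1 per checkout, but stale pooled connections are then only discovered when a real query fails.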
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
| --- | --- |
| Solution 1 | Jeremy Caney |