Airflow DAGs Orchestration
I have three DAGs (say, DAG1, DAG2 and DAG3). I have a monthly scheduler for DAG1. DAG2 and DAG3 must not be run directly (no scheduler for these) and must be run only when DAG1 is completed successfully. That is, once DAG1 is complete, DAG2 and DAG3 will need to start in parallel.
What is the best mechanism to do this? I came across the TriggerDagRunOperator and ExternalTaskSensor options. I would like to understand the pros and cons of each and which one is best. I have seen a few questions about these, but I am trying to find the answer for the latest stable Airflow version.
Solution 1:[1]
ExternalTaskSensor is not relevant for your use case, as none of the DAGs you mention needs to wait for another DAG.
You need to add TriggerDagRunOperator tasks to the code of DAG1 that will trigger the DAG runs for DAG2 and DAG3.
A skeleton of the solution would be:
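from datetime import datetime
from airflow import DAG
from airflow.operators.dummy import DummyOperator  # EmptyOperator in newer Airflow 2.x releases
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

START = datetime(2022, 1, 1)  # placeholder start date, adjust as needed
# DAG2 and DAG3 have no schedule; they run only when triggered.
dag2 = DAG(dag_id="DAG2", schedule_interval=None, start_date=START)
dag3 = DAG(dag_id="DAG3", schedule_interval=None, start_date=START)
with DAG(dag_id="DAG1", schedule_interval="@monthly", start_date=START) as dag1:
    op_first = DummyOperator(task_id="first")  # Replace with the operators of your DAG
    op_trig2 = TriggerDagRunOperator(task_id="trigger_dag2", trigger_dag_id="DAG2")
    op_trig3 = TriggerDagRunOperator(task_id="trigger_dag3", trigger_dag_id="DAG3")
    op_first >> [op_trig2, op_trig3]  # both triggers fire in parallel once "first" succeeds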
Edit:
After discussing in the comments, and since you mentioned you cannot edit DAG1 because it is someone else's code, your best option is ExternalTaskSensor. You will have to set DAG2 and DAG3 to start on the same schedule as DAG1, and they will need to keep poking DAG1 until it finishes. It will work, but it is not very efficient.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow