How do I create a chain for data with a parent-child relationship using Python?
Suppose I have the following input to convert:
Task A -> Task B
Task A -> Task C
Task B -> Task D
Task C -> Task E
I am using pandas, with this DataFrame as my input:
import pandas as pd
df = pd.DataFrame({"parent": ["Task A", "Task A", "Task B", "Task C"], "child": ["Task B", "Task C", "Task D", "Task E"]})
Output:
Task A >> (Task B, Task C) >> (Task D, Task E)
The function should return the result above.
I want to pass this output to Airflow to configure the relationships between my tasks.
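One possible way to produce the requested string is to group tasks by their depth in the dependency graph. The sketch below is not from the original post: `build_chain` is a hypothetical name, and it assumes the edges form a directed acyclic graph given as (parent, child) pairs. With the DataFrame from the question, the pairs could be obtained with `zip(df["parent"], df["child"])`.

```python
from collections import deque


def build_chain(edges):
    """Group task names by depth and join the groups with ' >> '.

    `edges` is an iterable of (parent, child) pairs.
    """
    children = {}
    indegree = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])
        indegree.setdefault(parent, 0)
        indegree[child] = indegree.get(child, 0) + 1

    # Depth of each task = length of the longest path from any root task.
    level = {name: 0 for name in indegree}
    queue = deque(name for name, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            level[child] = max(level[child], level[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if visited != len(level):
        raise ValueError("cycle detected in task dependencies")

    # Collect names per depth, keeping first-seen order within each depth.
    groups = {}
    for name, depth in level.items():
        groups.setdefault(depth, []).append(name)

    parts = []
    for depth in sorted(groups):
        names = groups[depth]
        parts.append(names[0] if len(names) == 1 else "(" + ", ".join(names) + ")")
    return " >> ".join(parts)


edges = [("Task A", "Task B"), ("Task A", "Task C"),
         ("Task B", "Task D"), ("Task C", "Task E")]
print(build_chain(edges))
# Task A >> (Task B, Task C) >> (Task D, Task E)
```

Note that the grouped string drops information: it no longer records that Task D depends only on Task B and Task E only on Task C.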
Solution 1:[1]
I don't understand your pandas example, but in Airflow the bitshift operators (>> and <<) can create one-to-one and one-to-many dependencies between tasks. They cannot create many-to-many dependencies.
Those can be set using a for loop:
tasks_a = [t1, t2, t3]
tasks_b = [t4, t5, t6]
for task in tasks_a:
    task >> tasks_b
Or using Airflow's cross_downstream() function:
from airflow.models.baseoperator import cross_downstream
tasks_a = [t1, t2, t3]
tasks_b = [t4, t5, t6]
cross_downstream(from_tasks=tasks_a, to_tasks=tasks_b)
This creates a dependency from every task in tasks_a to every task in tasks_b.
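For the input in the question, where each row is a single parent-child edge, it may be simpler to skip the grouped form and set one dependency per row. This is a minimal sketch under stated assumptions: `wire_edges` is a hypothetical helper, not part of Airflow, and `tasks` is assumed to be a dict mapping each task name to an operator that has already been created inside the DAG.

```python
def wire_edges(tasks, edges):
    """Set one dependency per (parent, child) pair.

    `tasks` is an assumed dict of task name -> operator object;
    `edges` is an iterable of (parent, child) name pairs.
    """
    for parent, child in edges:
        # Equivalent to writing `parent_task >> child_task` by hand.
        tasks[parent] >> tasks[child]
```

With the DataFrame from the question, this could be called as `wire_edges(tasks, zip(df["parent"], df["child"]))`. Unlike cross_downstream(), this keeps Task D downstream of Task B only and Task E downstream of Task C only.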
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Bas Harenslak |