Airflow Scheduler fails to execute Windows EXE via WSL

My Windows 10 machine has Airflow 1.10.11 installed within WSL 2 (Ubuntu-20.04).

I have a BashOperator task which calls an .EXE on Windows (via /mnt/c/... or via symlink). The task fails. Log shows:

[2020-12-16 18:34:11,833] {bash_operator.py:134} INFO - Temporary script location: /tmp/airflowtmp2gz6d79p/download.legacyFilesnihvszli
[2020-12-16 18:34:11,833] {bash_operator.py:146} INFO - Running command: /mnt/c/Windows/py.exe
[2020-12-16 18:34:11,836] {bash_operator.py:153} INFO - Output:
[2020-12-16 18:34:11,840] {bash_operator.py:159} INFO - Command exited with return code 1
[2020-12-16 18:34:11,843] {taskinstance.py:1150} ERROR - Bash command failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.8/dist-packages/airflow/operators/bash_operator.py", line 165, in execute
    raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed
[2020-12-16 18:34:11,844] {taskinstance.py:1187} INFO - Marking task as FAILED. dag_id=test-dag, task_id=download.files, execution_date=20201216T043701, start_date=20201216T073411, end_date=20201216T073411

And that's it. Return code 1 with no further useful info.

Running the very same EXE via bash works perfectly, with no error (I also tried it on my own program which emits something to the console - in bash it emits just fine, but via airflow scheduler it's the same error 1).

Some more data and things I've done to rule out any other issue:

  • The airflow scheduler runs as root. I also confirmed it's running in a root context by putting a whoami command in my BashOperator, which indeed emitted root. (I should also note that all native Linux programs run just fine! Only the Windows programs don't.)
  • The Windows EXE I'm trying to execute and its directory have full 'Everyone' permissions (on my own program, of course; I wouldn't dare do that on my Windows folder, which was just an example).
  • The failure happens both when accessing via /mnt/c as well as via symlink. In the case of a symlink, the symlink has 777 permissions.
  • I tried running airflow test on a BashOperator task - it runs perfectly - emits output to the console and returns 0 (success).
  • Tried with various EXE files - both "native" (e.g. ones that come with Windows) as well as my C#-made programs. Same behavior in all.
  • I didn't find any similar issue documented in Airflow's GitHub repo or here on Stack Overflow.

The question is: how does Airflow's use of a Python subprocess (which the airflow scheduler uses to run BashOperator tasks) differ from a "normal" Bash session, such that it causes an error 1?
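One way to narrow this down is to reproduce the launch outside Airflow. The following is a minimal sketch, not Airflow's actual code: it assumes the operator writes the command to a script in a temporary directory (as the "Temporary script location" log line suggests) and runs it with bash, with piped and merged output. The new session and the helper name run_like_operator are assumptions made for illustration.

```python
import os
import subprocess
import tempfile


def run_like_operator(bash_command, env=None):
    """Run bash_command from a script in a temp dir, with merged piped output.

    Returns (return_code, output_text). This only approximates how a
    scheduler-launched task might differ from an interactive shell.
    """
    with tempfile.TemporaryDirectory(prefix="airflowtmp") as tmp_dir:
        script_path = os.path.join(tmp_dir, "script.sh")
        with open(script_path, "w") as script:
            script.write(bash_command)
        completed = subprocess.run(
            ["bash", script_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            cwd=tmp_dir,               # assumption: temp dir as working directory
            env=env,                   # None inherits the caller's environment
            start_new_session=True,    # assumption: command runs in its own session
            text=True,
        )
    return completed.returncode, completed.stdout
```

Calling run_like_operator("/mnt/c/Windows/py.exe") from an interactive shell, and again with a reduced env dictionary, shows whether the launch parameters or the environment change the result.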



Solution 1:[1]

You can use Python's subprocess and sys libraries together with PowerShell.

In the Airflow dags folder, create two files: main.py and caller.py.

main.py calls caller.py, and caller.py reaches into the Windows machine to run the files or routines.

This is the process:

Code for main.py:

# Importing the libraries we are going to use in this example
from airflow import DAG
from datetime import datetime, timedelta
from airflow.operators.bash_operator import BashOperator


# Defining some basic arguments
default_args = {
   'owner': 'your_name_here',
   'depends_on_past': False,
   'start_date': datetime(2019, 1, 1),
   'retries': 0,
   }


# Name the DAG and define when it runs (a cron expression also works here,
# for example '0 8 * * *' to run every day at 8 am)
with DAG(
       'Main',
       schedule_interval=timedelta(minutes=1),
       catchup=False,
       default_args=default_args
       ) as dag:

    # Define the task: a bash command that runs caller.py with Python
    t1 = BashOperator(
       task_id='caller',
       bash_command="""
       cd /home/[Your_Users_Name]/airflow/dags/
       python3 caller.py
       """)

    # To add a second task, copy t1, rename the copy to t2, and have it call file.py

    # Execution order: with a single task, referencing t1 is enough
    t1

    # With a second task, t1 runs first and then t2:
    # t1 >> t2

Code for caller.py:

import subprocess
import sys

# Run a PowerShell command on the Windows side from WSL.
# To run another file type, replace "python file.py" with
# ".\\file.html", ".\\file.bat", or ".\\file.exe".
p = subprocess.Popen(
    [
        "powershell.exe",
        "cd C:\\Users\\[Your_Users_Name]\\Desktop; python file.py",
    ],
    stdout=sys.stdout,  # forward PowerShell's output to this script's stdout
)

# Wait for PowerShell to finish
p.communicate()
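The caller script above discards PowerShell's exit code, so the Airflow task would be marked successful even if the Windows command failed. A minimal sketch of a variant that passes the exit code on is shown here. The names run_and_report and main are hypothetical, and the PowerShell command is the same placeholder as above.

```python
import subprocess
import sys


def run_and_report(argv):
    """Run argv, echo its combined output, and return its exit code."""
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so error text is not lost
        text=True,
    )
    print(completed.stdout, end="")
    return completed.returncode


def main():
    # Same placeholder command as in caller.py; adjust the path for your machine.
    return run_and_report([
        "powershell.exe",
        "cd C:\\Users\\[Your_Users_Name]\\Desktop; python file.py",
    ])
```

Ending caller.py with sys.exit(main()) hands whatever exit code PowerShell reports back to bash, and the BashOperator treats a non-zero return code as a failed task.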

To check whether your code will work in Airflow, run it manually first; if it runs, it's OK.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
