Airflow DAGs and PYTHONPATH

I have some DAGs that can't seem to locate Python modules. Inside the Airflow UI, I see many variations of this message:

Broken DAG: [/home/airflow/source/airflow/dags/test.py] No module named 'paramiko'

Inside a DAG file I can modify sys.path directly, and that seems to mitigate the issue:

import sys
sys.path.append('/home/airflow/.local/lib/python2.7/site-packages')

That doesn't feel right, though, having to set the path in my code directly. I've tried exporting PYTHONPATH in the Airflow user account's .bashrc, but it doesn't seem to be read when the DAG jobs are executed. What's the correct way to go about this?

Thanks.

----- update -----

Thanks for the responses.

Below are my systemd unit files.

::::::::::::::
airflow-scheduler-airflow2.service
::::::::::::::
[Unit]
Description=Airflow scheduler daemon

[Service]
EnvironmentFile=/usr/local/airflow/instances/airflow2/etc/envars
User=airflow2
Group=airflow2
Type=simple
ExecStart=/usr/local/airflow/instances/airflow2/venv/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
::::::::::::::
airflow-webserver-airflow2.service
::::::::::::::
[Unit]
Description=Airflow webserver daemon

[Service]
EnvironmentFile=/usr/local/airflow/instances/airflow2/etc/envars
User=airflow2
Group=airflow2
Type=simple
ExecStart=/usr/local/airflow/instances/airflow2/venv/bin/airflow webserver
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

These are the contents of the EnvironmentFile referenced above:

more /usr/local/airflow/instances/airflow2/etc/envars
PATH=/usr/local/airflow/instances/airflow2/venv/bin:/usr/local/bin:/usr/bin:/bin
AIRFLOW_HOME=/usr/local/airflow/instances/airflow2/home
AIRFLOW_CONFIG=/usr/local/airflow/instances/airflow2/etc/airflow.cfg


Solution 1:[1]

I had a similar issue:

  1. Python wasn't being loaded from the virtualenv when running Airflow (fixing this made Airflow's dependencies resolve from the virtualenv).
  2. Submodules under the dags path weren't loaded because of a different base path (fixing this made my own modules under the dags folder importable).

I added the following lines to the environment file for the systemd service (/usr/local/airflow/instances/airflow2/etc/envars in your case):

source /home/ubuntu/venv/airflow/bin/activate
PYTHONPATH=/home/ubuntu/venv/airflow/dags:$PYTHONPATH
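One caveat worth knowing: systemd's EnvironmentFile only reads KEY=value pairs, it does not execute shell commands such as source, so the PYTHONPATH line is the one doing the work here. A quick sketch of the mechanism (the dags path is the one from this answer and is an assumption; substitute your own): start a child interpreter with PYTHONPATH set and check that the directory lands on sys.path.

```python
# Verify that a directory placed on PYTHONPATH ends up on the child's sys.path.
import os
import subprocess
import sys

dags_dir = "/home/ubuntu/venv/airflow/dags"  # hypothetical path from above
env = dict(os.environ, PYTHONPATH=dags_dir)
child = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.path)"],
    env=env, capture_output=True, text=True,
)
print(dags_dir in child.stdout)  # True: modules in that directory are importable
```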

Solution 2:[2]

It looks like your Python environment is degraded: you have multiple instances of Python on your VM (Python 3.6 and Python 2.7) and multiple instances of pip. A pip tied to Python 3.6 is being used, but all of your modules are actually installed under Python 2.7.

This can be solved easily by using symbolic links to redirect to 2.7.

Type these commands and see which instance of Python each one uses (2.7.5, 2.7.14, 3.6, etc.):

  1. python
  2. python2
  3. python2.7

Or type which python to find which Python instance is being used by your VM. You can also run which pip to see which pip instance is being used.

I am going to assume that python and which python lead to Python 3 (which you do not want to use), but python2 and python2.7 lead to the instance you do want to use.

To create a symbolic link so that /home/airflow/.local/lib/python2.7/ is used, do the following and create the following symbolic links:

  1. cd /home/airflow/.local/lib/python2.7
  2. ln -s python2 python
  3. ln -s /home/airflow/.local/lib/python2.7 python2

The symbolic link structure is: ln -s TARGET LINKNAME. You are essentially saying that when you run the command python, go to python2; when python2 is then run, go to /home/airflow/.local/lib/python2.7. It's all being redirected.

Now re-run the three commands above (python, python2, python2.7). All should lead to the Python instance you want.
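To make the ln -s TARGET LINKNAME semantics concrete, here is a small self-contained sketch that builds the same kind of redirection in a temporary directory (nothing on the real system is touched):

```python
# Demonstrate ln -s semantics: the link name resolves to the target.
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "python2.7")
    os.mkdir(target)
    link = os.path.join(d, "python2")
    os.symlink(target, link)  # equivalent to: ln -s python2.7 python2
    # Following the link resolves back to the target directory.
    ok = os.path.realpath(link) == os.path.realpath(target)

print(ok)  # True
```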

Hope this helps!

Solution 3:[3]

You can add this directly to the Airflow Dockerfile, as in the example below. If you have a .env file, you can do ENV PYTHONPATH "${PYTHONPATH}:${AIRFLOW_HOME}".

FROM puckel/docker-airflow:1.10.6
RUN pip install --user psycopg2-binary
ENV AIRFLOW_HOME=/usr/local/airflow

# add persistent python path (for local imports)
ENV PYTHONPATH "/home/jovyan/work:${AIRFLOW_HOME}"

COPY ./airflow.cfg /usr/local/airflow/airflow.cfg
CMD ["airflow", "initdb"]

Solution 4:[4]

I still have the same problem when I trigger a DAG from the UI (it can't locate local Python modules, i.e. my_module.my_sub_module, etc.), but when I test with:

airflow test my_dag my_task  2021-04-01

It works fine!

I also have this line in my .bashrc (where it is supposed to find local Python modules):

export PYTHONPATH="/home/my_user"
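That difference is expected: .bashrc is only sourced by interactive bash shells, so an export there is visible to airflow test run from your terminal but not to a scheduler started by systemd. A minimal sketch of the effect (no Airflow needed): a child process whose environment lacks the variable simply doesn't see it.

```python
# Simulate a daemon-style process that never sourced .bashrc: strip
# PYTHONPATH from the environment and ask a child interpreter about it.
import os
import subprocess
import sys

env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
child = subprocess.run(
    [sys.executable, "-c", "import os; print('PYTHONPATH' in os.environ)"],
    env=env, capture_output=True, text=True,
)
print(child.stdout.strip())  # False
```

The fix is to set the variable somewhere the service actually reads it, such as the systemd EnvironmentFile.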

Solution 5:[5]

Sorry, this topic is quite old, but I had a lot of problems launching Airflow as a daemon, so I'll share my solution.

First I installed Anaconda in /home/myuser/anaconda3 and installed all the libraries my DAGs use, then created the following files:

/etc/systemd/system/airflow-webserver.service
    [Unit]
    Description=Airflow webserver daemon
    After=network.target
    
    [Service]
    Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    RuntimeDirectory=airflow
    RuntimeDirectoryMode=0775
    User=myuser
    Group=myuser
    Type=simple
    ExecStart=/bin/bash -c 'source /home/myuser/anaconda3/bin/activate; airflow webserver -p 8080 --pid /home/myuser/airflow/webserver.pid'
    Restart=on-failure
    RestartSec=5s
    PrivateTmp=true
    
    [Install]
    WantedBy=multi-user.target

The same for the scheduler daemon:

/etc/systemd/system/airflow-schedule.service

    [Unit]
    Description=Airflow schedule daemon
    After=network.target

    [Service]
    Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    RuntimeDirectory=airflow
    RuntimeDirectoryMode=0775
    User=myuser
    Group=myuser
    Type=simple
    ExecStart=/bin/bash -c 'source /home/myuser/anaconda3/bin/activate; airflow scheduler'
    Restart=on-failure
    RestartSec=5s
    PrivateTmp=true

    [Install]
    WantedBy=multi-user.target

Next, run these systemctl commands:

sudo systemctl daemon-reload
sudo systemctl enable airflow-webserver.service
sudo systemctl enable airflow-schedule.service

sudo systemctl start airflow-webserver.service
sudo systemctl start airflow-schedule.service

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Andrey
Solution 2 Zack
Solution 3 gsilv
Solution 4 Mehdi Hadji
Solution 5