'Airflow dags and PYTHONPATH
I have some dags that can't seem to locate python modules. Inside of the Airflow UI, I see a ton of these message variations.
Broken DAG: [/home/airflow/source/airflow/dags/test.py] No module named 'paramiko'
Inside of a file I can directly modify the python sys.path and that seems to mitigate my issue.
import sys
sys.path.append('/home/airflow/.local/lib/python2.7/site-packages')
That doesn't feel right though having to set my path in my code directly. I've tried exporting PYTHONPATH in the Airflow user accounts .bashrc but doesn't seem to be read when the dag jobs are executed. What's the correct way to go about this?
Thanks.
----- update -----
Thanks for the responses.
below is my systemctl scripts.
::::::::::::::
airflow-scheduler-airflow2.service
::::::::::::::
[Unit]
Description=Airflow scheduler daemon
[Service]
EnvironmentFile=/usr/local/airflow/instances/airflow2/etc/envars
User=airflow2
Group=airflow2
Type=simple
ExecStart=/usr/local/airflow/instances/airflow2/venv/bin/airflow scheduler
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
::::::::::::::
airflow-webserver-airflow2.service
::::::::::::::
[Unit]
Description=Airflow webserver daemon
[Service]
EnvironmentFile=/usr/local/airflow/instances/airflow2/etc/envars
User=airflow2
Group=airflow2
Type=simple
ExecStart=/usr/local/airflow/instances/airflow2/venv/bin/airflow webserver
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
this is the EnvironentFile Contents uses from above
more /usr/local/airflow/instances/airflow2/etc/envars
PATH=/usr/local/airflow/instances/airflow2/venv/bin:/usr/local/bin:/usr/bin:/bin
AIRFLOW_HOME=/usr/local/airflow/instances/airflow2/home
AIRFLOW_CONFIG=/usr/local/airflow/instances/airflow2/etc/airflow.cfg
Solution 1:[1]
I had similar issue:
- Python wasn't loaded from virtualenv for running airflow (this fixed airflow deps not being fetched from virtualenv)
- Submodules under dags path wasn't loaded due different base path (this fixed importing own modules under
dags
folder
I added following strings to the environemnt file for systemd service
(/usr/local/airflow/instances/airflow2/etc/envars
in your case)
source /home/ubuntu/venv/airflow/bin/activate
PYTHONPATH=/home/ubuntu/venv/airflow/dags:$PYTHONPATH
Solution 2:[2]
It looks like your python environment is degraded - you have multiple instances of python on your vm (python 3.6 and python 2.7) and multiple instances of pip. There is a pip with python3.6 that is trying to be used, but all of your modules are actually with your python 2.7.
This can be solved easily by using symbolic links to redirect to 2.7.
Type the commands and see which instance of python is used (2.7.5, 2.7.14, 3.6, etc):
python
python2
python2.7
or type which python
to find which python instance is being used by your vm. You can also do which pip
to see what pip instance is being used.
I am going to assume python
and which python
leads to python 3 (which you do not want to use), but python2
and python2.7
lead to the instance you do want to use.
To create a symbolic link so that /home/airflow/.local/lib/python2.7/
is used, do the following and create the following symbolic links:
cd home/airflow/.local/lib/python2.7
ln -s python2 python
ln -s /home/airflow/.local/lib/python2.7 python2
Symbolic link structure is: ln -s #PATHDIRECTED #LINKNAME
You are essentially saying when you run the command python
, go to python2
. When python2
is then ran, go to /home/airflow/.local/lib/python2.7
. Its all being redirected.
Now re run the three commands above (python, python2, python2.7). All should lead to the python instance you want.
Hope this helps!
Solution 3:[3]
You can add this directly to the Airflow Dockerfile, such as the example below. If you have a .env
file you can do ENV PYTHONPATH "${PYTHONPATH}:${AIRFLOW_HOME}"
.
FROM puckel/docker-airflow:1.10.6
RUN pip install --user psycopg2-binary
ENV AIRFLOW_HOME=/usr/local/airflow
# add persistent python path (for local imports)
ENV PYTHONPATH "/home/jovyan/work:${AIRFLOW_HOME}"
COPY ./airflow.cfg /usr/local/airflow/airflow.cfg
CMD ["airflow initdb"]
Solution 4:[4]
I still have the same problem when I try to trigger a dag from UI (cant locate python local modules i.e my_module.my_sub_module... etc), but when I test with :
airflow test my_dag my_task 2021-04-01
It works fine !
I also have in my .bashrc the line (where it supposed to find python local modules):
export PYTHONPATH="/home/my_user"
Solution 5:[5]
Sorry guys this topics is very old but i have a lot of problem for launch airflow as daemon, i share my solution
first i installed anaconda in /home/myuser/anaconda3 and i installed all library that i using in my dags, next create follow files:
/etc/systemd/system/airflow-webserver.service
[Unit]
Description=Airflow webserver daemon
After=network.target
[Service]
Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
RuntimeDirectory=airflow
RuntimeDirectoryMode=0775
User=myuser
Group=myuser
Type=simple
ExecStart=/bin/bash -c 'source /home/myuser/anaconda3/bin/activate; airflow webserver -p 8080 --pid /home/myuser/airflow/webserver.pid'
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
same for daemon scheduler
/etc/systemd/system/airflow-schedule.service
[Unit]
Description=Airflow schedule daemon
After=network.target
[Service]
Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
RuntimeDirectory=airflow
RuntimeDirectoryMode=0775
User=myuser
Group=myuser
Type=simple
ExecStart=/bin/bash -c 'source /home/myuser/anaconda3/bin/activate; airflow scheduler'
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
next exec command of systemclt:
sudo systemctl daemon-reload
sudo systemctl enable airflow-webserver.service
sudo systemctl enable airflow-schedule.service
sudo systemctl start airflow-webserver.service
sudo systemctl start airflow-schedule.service
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Andrey |
Solution 2 | Zack |
Solution 3 | gsilv |
Solution 4 | Mehdi Hadji |
Solution 5 |