How do I call Scrapy from an Airflow DAG?
My Scrapy project runs perfectly well with the 'scrapy crawl spider_1' command. How do I trigger it (or call the scrapy command) from an Airflow DAG?
with DAG(<args>) as dag:
    scrapy_task = PythonOperator(
        task_id='scrapy',
        python_callable=?)
    task_2 = ()
    task_3 = ()
    ....
    scrapy_task >> [task_2, task_3, ...]
Solution 1:[1]
Run with BashOperator
with DAG(<args>) as dag:
    scrapy_task = BashOperator(
        task_id='scrapy',
        bash_command='scrapy crawl spider_1')
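Note that 'scrapy crawl' only works from inside a Scrapy project directory (the one containing scrapy.cfg), and Airflow won't necessarily start the shell there. If the task fails with a "no active project" error, change into the project directory first; the path below is a placeholder for your own project:

    bash_command='cd /path/to/scrapy_project && scrapy crawl spider_1'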
- If you're using a virtualenv, you may use PythonVirtualenvOperator,
- or, to reuse an existing environment, you can activate it inside the bash command (a sketch follows this list):

    source activate venv && scrapy crawl spider_1
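A minimal sketch of the existing-environment variant with BashOperator, assuming a virtualenv at /opt/venvs/scrapy_env and a Scrapy project at /opt/scrapy_project (both paths are placeholders for your own setup):

with DAG(<args>) as dag:
    scrapy_task = BashOperator(
        task_id='scrapy',
        # activate the existing environment, move into the project, then crawl
        bash_command='source /opt/venvs/scrapy_env/bin/activate '
                     '&& cd /opt/scrapy_project '
                     '&& scrapy crawl spider_1')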
Run with PythonOperator
- From the Scrapy documentation: https://docs.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
process = CrawlerProcess(get_project_settings())
process.crawl('spider_1')
process.start() # the script will block here until the crawling is finished
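To wire this into the DAG from the question, wrap the snippet in a function and pass it as python_callable. A minimal sketch using Airflow 2-style imports; the DAG id, start date, and schedule are placeholders, and it assumes the worker runs somewhere the Scrapy project's settings can be found:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_spider():
    # Import inside the callable so the scheduler can parse the DAG file
    # even on machines where Scrapy isn't installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    process.crawl('spider_1')
    process.start()  # blocks until the crawl is finished


with DAG('scrapy_dag', start_date=datetime(2023, 1, 1), schedule=None) as dag:
    scrapy_task = PythonOperator(
        task_id='scrapy',
        python_callable=run_spider)

Two caveats: get_project_settings() only finds your settings if the process runs from the project directory or the SCRAPY_SETTINGS_MODULE environment variable is set, and Twisted's reactor cannot be restarted within one process, which can matter if your tasks reuse a long-lived interpreter.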
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | NoThlnG |