How do I call Scrapy from an Airflow DAG?

My Scrapy project runs perfectly well with the 'scrapy crawl spider_1' command. How do I trigger it (or call the scrapy command) from an Airflow DAG?

with DAG(<args>) as dag:
    scrapy_task = PythonOperator(
        task_id='scrapy',
        python_callable=?)
    task_2 = ()
    task_3 = ()
    ....
scrapy_task >> [task_2, task_3, ...]


Solution 1:[1]

Run with BashOperator

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(<args>) as dag:
    scrapy_task = BashOperator(
        task_id='scrapy',
        # scrapy must be on PATH, and the command has to run inside
        # the Scrapy project directory (where scrapy.cfg lives)
        bash_command='scrapy crawl spider_1')
  • If you're using a virtualenv, you can use the PythonVirtualenvOperator instead.
    • Or, to reuse an existing environment, activate it inside the bash command, e.g. source activate venv && scrapy crawl spider_1 (see the sketch after this list).
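
For example, here is a minimal sketch of the existing-environment option. The DAG arguments, the virtualenv path, and the project path are assumptions to make the example self-contained; adjust them for your setup:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical paths: point these at your own virtualenv and Scrapy project.
VENV_ACTIVATE = '/opt/venvs/scrapy_env/bin/activate'
PROJECT_DIR = '/opt/projects/my_scrapy_project'

with DAG(
    dag_id='scrapy_crawl_bash',
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    scrapy_task = BashOperator(
        task_id='scrapy',
        # activate the existing environment, cd into the Scrapy project
        # (scrapy commands must run where scrapy.cfg lives), then crawl
        bash_command=(
            f'source {VENV_ACTIVATE} && '
            f'cd {PROJECT_DIR} && '
            'scrapy crawl spider_1'
        ),
    )

BashOperator runs the command through bash, so source works here as written.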

Run with PythonOperator

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def run_spider():
    # get_project_settings() resolves the project's settings, so the task
    # must run with the Scrapy project on its path (or with
    # SCRAPY_SETTINGS_MODULE set)
    process = CrawlerProcess(get_project_settings())
    process.crawl('spider_1')
    process.start()  # the script will block here until the crawling is finished
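
Wiring that callable into the DAG from the question could then look like the sketch below; the DAG arguments are assumptions. One caveat worth knowing: CrawlerProcess starts a Twisted reactor, which can only be started once per Python process, so this approach relies on Airflow running each task in a fresh process (the default for the common executors).

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id='scrapy_crawl_python',  # hypothetical dag_id
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    scrapy_task = PythonOperator(
        task_id='scrapy',
        python_callable=run_spider,  # the function defined above
    )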

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: NoThlnG