Category "airflow"

How to extract the query result from a Hive job output logs using DataprocHiveOperator?

I am trying to build a data migration pipeline using Airflow, source being a Hive table on a Dataproc cluster and the destination is BigQuery. I'm using Datapro

Airflow + sqlalchemy short-lived connections to metadata db

I deployed the latest airflow on a centos 7.5 vm and updated sql_alchemy_conn and result_backend to postgres databases on a postgresql instance and designated m

sqlite3 raised an error after running Airflow command line

When I ran command: airflow list_users It raised an error as below: sqlite3.OperationalError: no such table: ab_permission_view_role ... sqlalchemy.exc.Opera

How do I create a chain for data with parent child relationship using python?

If I have this set of input to convert, Input: Task A -> Task B Task A -> Task C Task B -> Task D Task C -> Task E Making use of pandas python: df

Read and group json files by date element using pyspark

I have multiple JSON files (10 TB ~) on a S3 bucket, and I need to organize these files by a date element present in every json document. What I think that my c

Amazon Managed Airflow (MWAA) import custom plugins

I'm setting up an AWS MWAA instance and I have a problem with import custom plugins. My local project structure looks like this: airflow-project ├─&

Airflow - call a operator inside a function

I'm trying to call a python operator which is inside a function using another python operator. Seems something I missed, can someone help me to find out what I

DAG run as per timezone

I want to run my dag as per new york time zone. As the data comes as per the new york time zone and dag fails for the initial runs and skips last runs data as w

Airflow metrics with prometheus and grafana

any one knows how to send metrics from airflow to prometheus, I'm not finding much documents about it, I tried the airflow operator metrics on Grafana but it d

Maximum memory size for an XCOM in Airflow

I was wondering if there is any memory size limit for an XCOM in airflow ?

Airflow: get xcom from previous dag run

I am writing a sensor which scan s3 files for fix period of time and add the list of new files arrived at that period to xcom for next task. For that, I am tryi

Airflow Installation issue| setproctitle

I Have followed the below steps for airflow installation and have successfully installed but am unable to run any commands which start with airflow, even to che

MWAA Airflow 2.0 in AWS Snowflake connection not showing

Snowflake is not showing in the connections dropdown. I am using MWAA 2.0 and the providers are already in the requirements.txt MWAA uses python 3.7 dont know i

Is there any way to can install airflow on kubernetes on premise

I am new to airflow and need assistance on how to install airflow on k8s . Needs are: 1 . How to Build docker image of airflow only for webserver and scheduler

MWAA - Tracking/monitoring progress of the "Updating" phase triggered by changing the version of requirements.txt used by an environment

I am working with Amazon Managed Workflows for Apache Airflow (MWAA). When I copy a new requirements.txt file to my S3 bucket, then use the AWS Console to speci

GoogleCloudStorageToBigQueryOperator source_objects to receive list via XCom

I would like to pass a list of strings, containing the name of files in google storage to XCom. Later to be picked up by a GoogleCloudStorageToBigQueryOperator

Airflow Tasks running in parallel

I need to run few Airflow tasks in parallel concurrently and if one task got completed successfully, need to call the other task. How can I do that? Ex: Task A

Airflow- Pass Parameters at runtime

I have a DAG. How I can I pass parameters to the DAG at runtime and start the DAG? Basically, the DAG can take upto 10 values for a param (say, number). Based o

How do I stop Apache Airflow running a task the first time when I unpause it?

I have a DAG. Here is a sample of the parameters. dag = DAG( 'My Dag', default_args=default_args, description='Cron Job : My Dag', schedule_inte

Airflow pool - less priority task is triggering first

I've been using the airflow pool to control my concurrent tasks. so I've created a test_pool with 10 slots and have created 4 tasks, out of which I have assigne