Category "azure-databricks"

MLflow Webhook calling Azure DevOps pipeline - retrieve body

I am using the MLflow Webhooks mentioned here. I am using them to queue an Azure DevOps pipeline. However, I can't seem to find a way to retrieve the payload
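
A hedged sketch of one common setup: register a Model Registry webhook whose HTTP endpoint is the Azure DevOps pipeline-runs URL, so the MLflow event payload arrives as the POST body of the queue request. The host, tokens, model name, and pipeline IDs below are placeholders, not values from the question.

```python
import requests

DATABRICKS_HOST = "https://<databricks-instance>"  # placeholder
DATABRICKS_TOKEN = "<databricks-pat>"              # placeholder

# The MLflow event payload is POSTed as JSON to the URL in http_url_spec,
# so the Azure DevOps side receives it in the request body.
payload = {
    "model_name": "my-model",                      # assumed model name
    "events": ["MODEL_VERSION_TRANSITIONED_STAGE"],
    "http_url_spec": {
        "url": "https://dev.azure.com/<org>/<project>/_apis/pipelines/<pipelineId>/runs?api-version=6.0",
        "authorization": "Basic <base64-encoded-ado-pat>",  # ADO expects Basic auth with a PAT
    },
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/mlflow/registry-webhooks/create",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json=payload,
)
print(resp.json())
```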

How to process JSON data in a column using Python/PySpark?

Trying to process JSON data in a column on Databricks. Below is sample data from a table (it's weather device record info): JSON_Info {"sampleData":"dataD
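
For a JSON string column, the usual approach is from_json with an explicit schema. A minimal sketch, assuming a table and schema along the lines of the visible fragment (both are placeholders):

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType

# Schema guessed from the visible fragment; extend it to match the real payload.
schema = StructType([StructField("sampleData", StringType())])

df = spark.table("weather_device_records")        # hypothetical table name
parsed = df.withColumn("parsed", F.from_json(F.col("JSON_Info"), schema))
parsed.select("parsed.sampleData").show(truncate=False)
```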

Is there an elegant, easy and fast way to move data out of HBase into MongoDB?

Is there an elegant, easy and fast way to move data out of HBase into MongoDB? I want to migrate HBase to MongoDB. I am new to MongoDB. Could someone please help
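
One Spark-based route, sketched under the assumption that both an HBase connector and the MongoDB Spark connector (10.x) are installed on the cluster; table names, the column mapping, and URIs are placeholders:

```python
# Read HBase into a DataFrame (connector and mapping are environment-specific).
hbase_df = (spark.read.format("org.apache.hadoop.hbase.spark")
            .option("hbase.table", "my_table")                       # hypothetical table
            .option("hbase.columns.mapping", "id STRING :key, val STRING cf:val")
            .load())

# Write it out with the MongoDB Spark connector.
(hbase_df.write.format("mongodb")
    .option("connection.uri", "mongodb://host:27017")                # placeholder URI
    .option("database", "mydb")
    .option("collection", "my_collection")
    .mode("append")
    .save())
```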

Azure Databricks - Write to parquet file using spark.sql with union and subqueries

Issue: I'm trying to write to a parquet file using spark.sql, but I encounter issues when the query has unions or subqueries. I know there's some syntax I can't seem
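
A pattern that usually avoids the syntax trouble: keep the UNION/subquery inside an aliased derived table in spark.sql, and do the write through the DataFrame API rather than in SQL. Table names and the path are placeholders:

```python
df = spark.sql("""
    SELECT *
    FROM (
        SELECT id, amount FROM sales_2022
        UNION ALL
        SELECT id, amount FROM sales_2023
    ) AS unioned          -- subqueries/unions need an alias here
    WHERE amount > 0
""")
df.write.mode("overwrite").parquet("/mnt/output/sales_parquet")
```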

Databricks REST API call for updating branch fails with error: User Settings > Git Integration to set up an Azure DevOps personal access token

I am getting the below error when updating the repo to a different branch using the Databricks REST API, as documented at https://docs.databricks.com/dev-tools/api/latest/
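
The error usually means no Azure DevOps Git credential is registered for the calling identity, so the PAT must be set up first (via User Settings > Git Integration or the /api/2.0/git-credentials endpoint). The documented branch-update call itself, sketched with placeholders:

```python
import requests

DATABRICKS_HOST = "https://<databricks-instance>"  # placeholder
TOKEN = "<databricks-pat>"                         # placeholder
repo_id = 123                                      # placeholder repo ID

resp = requests.patch(
    f"{DATABRICKS_HOST}/api/2.0/repos/{repo_id}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"branch": "feature/my-branch"},          # target branch to check out
)
print(resp.status_code, resp.json())
```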

Can I iterate through the widgets in a databricks notebook?

Can I iterate through the widgets in a Databricks notebook? Something like this pseudocode? # NB - not valid inputs = {widget.name: widget.value for widget in
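
There is no public enumeration API on dbutils.widgets, but a widely shared workaround reads the notebook's current bindings through an internal, unsupported entry point (it may change without notice):

```python
# getCurrentBindings is an internal entry point, not a documented API.
bindings = dbutils.notebook.entry_point.getCurrentBindings()
inputs = {key: bindings[key] for key in bindings}   # py4j map -> plain dict
print(inputs)
```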

Add comments to Delta

If a PySpark dataframe reads some data from a table and writes it to Azure Delta Lake, can we add comments to this newly written file? For e.g. Df = sql("se
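
Two hedged options, depending on what "comment" should mean: a commit-level note via Delta's userMetadata write option (visible in DESCRIBE HISTORY), or a table comment via SQL. Paths and names are placeholders:

```python
df = spark.sql("select * from source_table")       # placeholder query

# Option 1: attach a note to this specific commit.
(df.write.format("delta")
   .option("userMetadata", "nightly load from source_table")
   .mode("append")
   .save("/mnt/delta/target"))

# Option 2: comment on the table itself.
spark.sql("COMMENT ON TABLE delta.`/mnt/delta/target` IS 'Nightly load from source_table'")
```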

Reading Databricks tables in Azure

Please clarify my confusion, as I keep hearing we need to read every Parquet file created by Databricks Delta tables to get the latest data in the case of an SCD2 table.
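
The short answer is that readers go through the Delta transaction log, which already points at only the current data files; you never scan stale Parquet yourself. A minimal sketch, with the path and the SCD2 flag column assumed:

```python
# Either form resolves the latest snapshot via the _delta_log:
latest = spark.read.format("delta").load("/mnt/delta/scd2_table")  # placeholder path
# latest = spark.table("scd2_table")               # if registered in the metastore

current_rows = latest.filter("is_current = true")  # assumes an SCD2 current-row flag
```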

How to get list of all leaf folders from ADLS Gen2 path via Scala code?

We have folders with year, month, and day subfolders in them. How can we get only the last leaf-level folder list using the dbutils.fs.ls utility? Example
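
The question asks for Scala, but the recursion is the same in any language: a folder is a leaf when dbutils.fs.ls finds no subdirectories under it. A Python sketch with a placeholder root path:

```python
def leaf_folders(path):
    subdirs = [f for f in dbutils.fs.ls(path) if f.isDir()]
    if not subdirs:                      # no child folders -> this is a leaf
        return [path]
    leaves = []
    for d in subdirs:
        leaves.extend(leaf_folders(d.path))
    return leaves

print(leaf_folders("abfss://container@account.dfs.core.windows.net/root/"))  # placeholder
```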

In a PySpark dataframe, when I rename a column, the previous name can still be used for filtering. Bug or feature?

I work on Databricks with a PySpark dataframe containing string-type columns. I use .withColumnRenamed() to rename one of them. Later in the process I use a .filter()
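
What is commonly observed: a Column object captured from the original DataFrame keeps resolving after the rename (Spark matches columns by internal expression ID, not display name), while a string lookup against the new schema does not. A small repro sketch:

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([("a", 1), ("b", 2)], ["old_name", "value"])
renamed = df.withColumnRenamed("old_name", "new_name")

# Works: the Column object from `df` still resolves by expression ID.
renamed.filter(df["old_name"] == "a").show()

# Fails: string lookup uses the current schema, where only new_name exists.
# renamed.filter(F.col("old_name") == "a")   # AnalysisException
```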

How to flatten a nested JSON struct using Python on Databricks

Trying to flatten a nested JSON response using a Python Databricks dataframe. I was able to flatten the "survey" struct successfully but am getting errors when I try
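
A generic recursive flattener for struct columns often covers this (array columns still need an explode first); it is a sketch, not tied to the question's exact schema:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType

def flatten(df):
    # Repeatedly expand any struct column into top-level `parent_child` columns.
    while True:
        struct_cols = [f.name for f in df.schema.fields
                       if isinstance(f.dataType, StructType)]
        if not struct_cols:
            return df
        cols = []
        for f in df.schema.fields:
            if f.name in struct_cols:
                cols.extend(F.col(f"{f.name}.{c.name}").alias(f"{f.name}_{c.name}")
                            for c in f.dataType.fields)
            else:
                cols.append(F.col(f.name))
        df = df.select(cols)
```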

SQL Azure Databricks

We have a 1-day table aggregated with GROUP BY call_date, tdlinx_id, work_request_id, category_name; in another table we have 1-week-level data aggregated w
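
One hedged reading of the setup: roll the day-level table up to week grain before joining it to the week-level table, so both sides share a grain. Everything beyond the listed GROUP BY columns (table names, the measure) is a placeholder:

```python
weekly_from_daily = spark.sql("""
    SELECT date_trunc('week', call_date) AS week_start,
           tdlinx_id, work_request_id, category_name,
           SUM(metric) AS metric                 -- 'metric' is a placeholder measure
    FROM daily_table                             -- placeholder table name
    GROUP BY 1, 2, 3, 4
""")
weekly_from_daily.createOrReplaceTempView("daily_rolled_up")

joined = spark.sql("""
    SELECT d.*, w.metric AS weekly_metric        -- placeholder column
    FROM daily_rolled_up d
    JOIN weekly_table w                          -- placeholder table name
      ON w.week_start = d.week_start
     AND w.tdlinx_id  = d.tdlinx_id
""")
```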

Databricks Error: AnalysisException: Incompatible format detected. with Delta

I'm getting the following error when I attempt to write to my data lake with Delta on Databricks: fulldf = spark.read.format("csv").option("header", True).option
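
"Incompatible format detected" usually means the target path already belongs to a Delta table (a _delta_log folder exists) while the write uses another format, or the reverse. A sketch keeping the write-side format consistent with the path, with placeholder paths:

```python
fulldf = (spark.read.format("csv")
          .option("header", True)
          .load("/mnt/raw/input/"))              # placeholder source

# If the destination is (or should be) Delta, write Delta, not parquet/csv.
(fulldf.write.format("delta")
       .mode("overwrite")
       .save("/mnt/lake/output"))                # placeholder target
```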

Split corresponding column values in pyspark

The table below would be the input dataframe:

col1  col2      col3
1     12;34;56  Aus;SL;NZ
2     31;54;81  Ind;US;UK
3     null      Ban
4     Ned       null

Expected output dataframe [values of c
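
The usual trick for keeping corresponding elements paired is: split both columns, arrays_zip them, then explode; explode_outer keeps the rows where one side is null. A sketch over the sample rows:

```python
from pyspark.sql import functions as F

df = spark.createDataFrame(
    [(1, "12;34;56", "Aus;SL;NZ"), (2, "31;54;81", "Ind;US;UK"),
     (3, None, "Ban"), (4, "Ned", None)],
    ["col1", "col2", "col3"],
)

split_df = (df.withColumn("a2", F.split("col2", ";"))
              .withColumn("a3", F.split("col3", ";")))

# arrays_zip pairs elements positionally; explode_outer keeps null rows.
result = (split_df
          .withColumn("pair", F.explode_outer(F.arrays_zip("a2", "a3")))
          .select("col1",
                  F.col("pair.a2").alias("col2"),
                  F.col("pair.a3").alias("col3")))
result.show()
```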

Azure Databricks Delta Table modifies the TIMESTAMP format while writing from Spark DataFrame

I am new to Azure Databricks. I am trying to write a dataframe output to a Delta table which contains a TIMESTAMP column. But strangely it changes the TIMESTAMP pattern
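
Delta (like Parquet) stores TIMESTAMP as an instant, not as formatted text, so the "pattern" only changes at display time. If a specific rendering is needed, format on read; the path and column are placeholders:

```python
from pyspark.sql import functions as F

df = spark.read.format("delta").load("/mnt/delta/events")   # placeholder path
df.select(
    F.date_format("event_ts", "yyyy-MM-dd HH:mm:ss").alias("event_ts_str")  # placeholder column
).show(truncate=False)
```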

How can I execute and schedule a Databricks notebook from an Azure DevOps pipeline using YAML?

I wanted to do CI/CD for my Azure Databricks notebook using a YAML file. I have followed the flow below: pushed my code from the Databricks notebook to Azure Repos. Created
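
The execution step an Azure DevOps YAML task typically scripts is a Jobs API call against the workspace. A minimal Python sketch a pipeline script step could run (host, token, and job ID are placeholders):

```python
import requests

DATABRICKS_HOST = "https://<databricks-instance>"  # placeholder
TOKEN = "<databricks-pat>"                         # placeholder

# Trigger an existing job that wraps the deployed notebook.
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": 42},                           # placeholder job ID
)
print(resp.json())
```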

Databricks - ConcurrentAppendException

I'm running around 20 notebooks concurrently and they all update the same Delta table (however, different rows). I'm getting the below exception if any two notebooks
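
Two hedged mitigations: give each notebook a predicate confined to its own partition so Delta can prove the writes don't conflict, and retry with backoff when the conflict still fires. The partition column, path, and the update itself are assumptions:

```python
import random
import time
from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "/mnt/delta/shared_table")  # placeholder path

def update_with_retry(partition_value, attempts=5):
    for i in range(attempts):
        try:
            target.update(
                condition=f"partition_col = '{partition_value}'",  # assumes partitioning by partition_col
                set={"status": "'done'"},                          # placeholder update
            )
            return
        except Exception as e:
            # ConcurrentAppendException / ConcurrentModificationException surface here.
            if "Concurrent" not in str(e) or i == attempts - 1:
                raise
            time.sleep(random.uniform(0.5, 2 ** i))                # jittered backoff
```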

Azure ADLS Gen2 file created by Azure Databricks doesn't inherit ACL

I have a Databricks notebook that is writing a dataframe to a file in ADLS Gen2 storage. It creates a temp folder, outputs the file, and then copies that file to
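
ADLS Gen2 only applies default ACLs that exist on the parent directory at creation time; a copied file does not inherit the parent's access ACLs after the fact. A sketch with the azure-storage-file-datalake SDK, setting default entries so later children pick them up (account, container, path, and the ACL string are placeholders):

```python
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",   # placeholder
    credential="<account-key-or-aad-credential>",           # placeholder
)
dir_client = (service.get_file_system_client("container")   # placeholder container
                     .get_directory_client("output/dir"))   # placeholder path

# "default:" entries apply to items created inside the directory afterwards.
dir_client.set_access_control(
    acl="user::rwx,group::r-x,other::---,"
        "default:user::rwx,default:group::r-x,default:other::---"
)
```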

How to loop through folders in Azure Blob Containers

I have the following code, which is written in Visual Studio Code. Now I want to run it in Azure Databricks. I have uploaded the entire folder to my Azure Blob
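
On Databricks the local-filesystem walk is usually replaced by recursing with dbutils.fs.ls over the mounted or abfss:// path; the path below is a placeholder:

```python
def walk(path):
    for entry in dbutils.fs.ls(path):
        if entry.isDir():
            walk(entry.path)          # descend into sub-"folders"
        else:
            print(entry.path)         # process each blob here

walk("abfss://container@account.dfs.core.windows.net/myfolder/")  # placeholder
```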

Spark binary file and Delta Table

I receive binary files (~3 MB each) in batches of ~20000 files at a time. These files are used downstream for further processing, but I wa
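
One way this is often handled: ingest each batch with Spark's binaryFile source and append it into a Delta table, keeping path/size metadata alongside the bytes. Paths and the glob are placeholders:

```python
batch = (spark.read.format("binaryFile")
         .option("pathGlobFilter", "*.bin")        # assumed extension
         .load("/mnt/landing/batch_0001/"))        # placeholder batch folder

(batch.select("path", "modificationTime", "length", "content")
      .write.format("delta")
      .mode("append")
      .save("/mnt/delta/binary_files"))            # placeholder table path
```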