Category "azure-databricks"

Databricks repos - unable to use dbutils.notebook.run with absolute path

I'm unable to get an absolute path working with dbutils.notebook.run(). Using the absolute path with dbutils.fs.ls() (with "file:/Workspace/Repos/user_email/Datala…
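
A minimal sketch of the path conventions involved (the repo path, notebook name, and arguments below are placeholders): dbutils.notebook.run() takes a workspace path, so the "file:/Workspace" prefix used with dbutils.fs.ls() for the driver's local filesystem view does not apply.

```python
# Relative path: resolved against the calling notebook's folder.
result = dbutils.notebook.run("child_notebook", 600)

# Workspace-absolute path for a notebook inside a repo: no "file:/Workspace" prefix,
# unlike dbutils.fs.ls(), which addresses the local filesystem view of the workspace.
result = dbutils.notebook.run(
    "/Repos/user_email/project/child_notebook",   # placeholder repo path
    600,                                          # timeout in seconds
    {"run_date": "2023-01-01"},                   # optional widget arguments
)
```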

Update the data in the existing table based on different levels in Spark SQL

I have this existing table tb1 in my database. Now new data comes in and is stored in another table tb2. Earlier Account_Number 9988 was Level 2, but now…
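
A hedged sketch of one way to apply the new levels, assuming tb1 and tb2 are Delta tables that share Account_Number and Level columns (names taken from the excerpt; everything else is a placeholder):

```python
# Upsert the latest level per account from tb2 into tb1.
spark.sql("""
    MERGE INTO tb1 AS target
    USING tb2 AS source
    ON target.Account_Number = source.Account_Number
    WHEN MATCHED THEN UPDATE SET target.Level = source.Level
    WHEN NOT MATCHED THEN INSERT *
""")
```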

MLflow Webhook calling Azure DevOps pipeline - retrieve body

I am using the MLflow Webhooks mentioned here. I am using that to queue an Azure DevOps pipeline. However, I can't seem to find a way to retrieve the paylo…

How to process JSON data in a column using Python/PySpark?

Trying to process JSON data in a column on Databricks. Below is sample data from a table (it's weather device record info): JSON_Info {"sampleData":"dataD…
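
A minimal sketch of parsing such a column with from_json; the real schema of JSON_Info isn't shown in the excerpt, so the field and table names below are stand-ins.

```python
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

# Assumed shape of the JSON payload; adjust to the device's actual schema.
schema = StructType([StructField("sampleData", StringType(), True)])

df = spark.table("weather_device_records")                       # hypothetical table name
parsed = df.withColumn("parsed", from_json(col("JSON_Info"), schema))
parsed.select("JSON_Info", "parsed.sampleData").show(truncate=False)
```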

Is there an elegant, easy and fast way to move data out of HBase into MongoDB?

Is there an elegant, easy and fast way to move data out of HBase into MongoDB? I want to migrate from HBase to MongoDB. I am new to MongoDB. Could someone please hel…

Azure Databricks - Write to parquet file using spark.sql with union and subqueries

Issue: I'm trying to write to a parquet file using spark.sql, but I run into issues when the query has unions or subqueries. I know there's some syntax I can't seem…
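
A hedged sketch of the usual workaround: run the full statement (unions, subqueries and all) through spark.sql() and write the resulting DataFrame, rather than embedding the write in the SQL itself. Table, column, and path names are placeholders.

```python
df = spark.sql("""
    SELECT id, amount FROM sales_2022
    UNION ALL
    SELECT id, amount
    FROM (SELECT id, amount FROM sales_2021 WHERE amount > 0) AS older
""")

df.write.mode("overwrite").parquet(
    "abfss://container@account.dfs.core.windows.net/output/sales"   # placeholder path
)
```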

Databricks REST API call for updating branch fails with error: "User Settings > Git Integration to set up an Azure DevOps personal access token"

I am getting the below error when updating the repo to a different branch using the Databricks REST API, as described at https://docs.databricks.com/dev-tools/api/latest/
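
That error usually means no Azure DevOps personal access token is linked to the Databricks user making the call. A hedged sketch, with host, token, and repo id as placeholders: register the Git credential first, then PATCH the repo to the new branch.

```python
import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"   # placeholder workspace URL
TOKEN = "<databricks-pat>"
REPO_ID = "<repo-id>"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Link an Azure DevOps PAT to the calling user (git-credentials API).
requests.post(
    f"{DATABRICKS_HOST}/api/2.0/git-credentials",
    headers=headers,
    json={
        "git_provider": "azureDevOpsServices",
        "git_username": "user@example.com",
        "personal_access_token": "<azure-devops-pat>",
    },
)

# Now switch the repo to the desired branch.
resp = requests.patch(
    f"{DATABRICKS_HOST}/api/2.0/repos/{REPO_ID}",
    headers=headers,
    json={"branch": "feature/my-branch"},
)
print(resp.status_code, resp.json())
```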

Can I iterate through the widgets in a databricks notebook?

Can I iterate through the widgets in a Databricks notebook? Something like this pseudocode? # NB - not valid: inputs = {widget.name: widget.value for widget in…
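
A hedged sketch of the common workaround: older runtimes expose no documented way to enumerate widgets, so the notebook keeps the widget names itself (the names below are hypothetical). Recent runtimes add dbutils.widgets.getAll(), noted in the comment.

```python
# Portable workaround: keep the widget names in a list and read each one.
widget_names = ["start_date", "end_date", "env"]                 # hypothetical widget names
inputs = {name: dbutils.widgets.get(name) for name in widget_names}

# On recent Databricks runtimes, if available, this returns every widget at once:
# inputs = dbutils.widgets.getAll()
```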

Add comments to Delta

If a PySpark dataframe is reading some data from a table and writing it to Azure Delta Lake, can we add comments to this newly written file? For example, Df = sql("se…
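
A hedged sketch, assuming the goal is a comment on the resulting Delta table rather than on the data files themselves (Delta comments live in table metadata; table and column names are placeholders):

```python
df = spark.sql("SELECT * FROM source_table")
df.write.format("delta").mode("overwrite").saveAsTable("target_table")

# Attach comments to the table and to an individual column.
spark.sql("COMMENT ON TABLE target_table IS 'Loaded from source_table by the daily job'")
spark.sql("ALTER TABLE target_table ALTER COLUMN id COMMENT 'Primary key carried over from source_table'")
```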

Reading Databricks tables in Azure

Please clarify my confusion: I keep hearing we need to read every Parquet file created by a Databricks Delta table to get to the latest data in the case of an SCD2 table.
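
A minimal sketch of the usual answer: read through the Delta format and let the transaction log decide which parquet files form the current version, instead of listing files directly. The path and the current-row flag are assumptions.

```python
df = spark.read.format("delta").load(
    "abfss://container@account.dfs.core.windows.net/tables/customers_scd2"   # placeholder path
)

# For an SCD2 table, "latest" rows are typically marked by a current-row flag
# or an open-ended end date; the column name here is an assumption.
latest = df.filter("is_current = true")
```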

How to get list of all leaf folders from ADLS Gen2 path via Scala code?

We have folders and subfolders, with year, month, and day folders inside them. How can we get only the leaf-level folder list using the dbutils.fs.ls utility? Exampl…
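
The question asks for Scala; sketched here in Python for consistency with the other examples, and the same recursion over dbutils.fs.ls translates directly. A folder counts as a leaf when it contains no sub-folders; the root path is a placeholder.

```python
def leaf_folders(path):
    """Return the deepest folders (those with no sub-folders) under the given path."""
    children = dbutils.fs.ls(path)
    sub_dirs = [c.path for c in children if c.isDir()]   # directory entries also end with "/"
    if not sub_dirs:
        return [path]
    leaves = []
    for d in sub_dirs:
        leaves.extend(leaf_folders(d))
    return leaves

print(leaf_folders("abfss://container@account.dfs.core.windows.net/data/"))   # placeholder root
```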

In a PySpark dataframe, when I rename a column, the previous name can still be used for filtering. Bug or feature?

I work on Databricks with a PySpark dataframe containing string-type columns. I use .withColumnRenamed() to rename one of them. Later in the process I use a .filt…
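
A small sketch reproducing the behaviour as described; whether the old name still resolves depends on the Spark version, since filters can be resolved against columns of the child plan that are no longer in the output.

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([("a", 1), ("b", 2)], ["old_name", "value"])
renamed = df.withColumnRenamed("old_name", "new_name")

renamed.filter(F.col("new_name") == "a").show()   # normal usage
renamed.filter(F.col("old_name") == "a").show()   # may still work: the filter is resolved against the child plan
# renamed.select("old_name").show()               # selecting by the old name raises AnalysisException
```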

How to flatten a nested JSON struct using Python on Databricks

Trying to flatten a nested JSON response using a Python Databricks dataframe. I was able to flatten the "survey" struct successfully but get errors when I try…
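
A hedged sketch of a generic flattener: it walks the schema and promotes every struct field to a top-level column named parent_child. Arrays would still need explode() first; the input path is a placeholder.

```python
from pyspark.sql.functions import col
from pyspark.sql.types import StructType

def flatten(df):
    """Repeatedly expand struct columns until none remain."""
    while any(isinstance(f.dataType, StructType) for f in df.schema.fields):
        cols = []
        for f in df.schema.fields:
            if isinstance(f.dataType, StructType):
                cols += [col(f"{f.name}.{c.name}").alias(f"{f.name}_{c.name}")
                         for c in f.dataType.fields]
            else:
                cols.append(col(f.name))
        df = df.select(cols)
    return df

flat = flatten(spark.read.json("/path/to/survey_response.json"))   # placeholder path
```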

SQL in Azure Databricks

We have a 1-day table aggregated with GROUP BY call_date, tdlinx_id, work_request_id, category_name. In another table we have 1-week-level data aggregated w…
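
A loosely sketched way to compare the two grains, assuming the day-level table can be rolled up to week before joining; the measure column is hypothetical, and the key columns are taken from the excerpt.

```python
weekly_from_daily = spark.sql("""
    SELECT date_trunc('week', call_date) AS week_start,
           tdlinx_id, work_request_id, category_name,
           SUM(metric_value) AS metric_value               -- hypothetical measure
    FROM day_table
    GROUP BY date_trunc('week', call_date), tdlinx_id, work_request_id, category_name
""")
weekly_from_daily.createOrReplaceTempView("day_rolled_to_week")
# ...then join day_rolled_to_week against the existing 1-week table on the same keys.
```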

Databricks error with Delta: AnalysisException: Incompatible format detected

I'm getting the following error when I attempt to write to my data lake with Delta on Databricks: fulldf = spark.read.format("csv").option("header", True).option…
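
This error typically means the target path already holds data in the other format (Delta where plain parquet is being written, or the reverse). A hedged sketch with placeholder paths, keeping the write format consistent with what is already at the destination.

```python
fulldf = (spark.read.format("csv")
          .option("header", True)
          .option("inferSchema", True)
          .load("abfss://raw@account.dfs.core.windows.net/input/"))       # placeholder source

# If the destination is (or should be) a Delta table, write Delta, not parquet.
fulldf.write.format("delta").mode("overwrite").save(
    "abfss://curated@account.dfs.core.windows.net/full/"                  # placeholder target
)
```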

Split corresponding column values in pyspark

Below table would be the input dataframe:

col1  col2      col3
1     12;34;56  Aus;SL;NZ
2     31;54;81  Ind;US;UK
3     null      Ban
4     Ned       null

Expected output dataframe: [values of c…
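
A hedged sketch of the usual split-and-zip approach: the i-th entries of col2 and col3 stay paired. Rows 3 and 4 (a null on one side) would need extra handling, e.g. coalescing nulls or explode_outer, depending on the expected output.

```python
from pyspark.sql.functions import split, arrays_zip, explode, col

df = spark.createDataFrame(
    [(1, "12;34;56", "Aus;SL;NZ"), (2, "31;54;81", "Ind;US;UK"),
     (3, None, "Ban"), (4, "Ned", None)],
    ["col1", "col2", "col3"],
)

result = (df
          .withColumn("a2", split("col2", ";"))
          .withColumn("a3", split("col3", ";"))
          .withColumn("pair", explode(arrays_zip("a2", "a3")))
          .select("col1",
                  col("pair.a2").alias("col2"),
                  col("pair.a3").alias("col3")))
result.show()
```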

Azure Databricks Delta Table modifies the TIMESTAMP format while writing from Spark DataFrame

I am new to Azure Databricks. I am trying to write a dataframe output to a Delta table which contains a TIMESTAMP column. But strangely it changes the TIMESTAMP pa…
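
A hedged sketch of why this happens and one workaround: Delta stores TIMESTAMP columns as real timestamps, so the source's string pattern is not preserved; keeping a formatted string column alongside the timestamp retains a specific display pattern. The pattern and table name are assumptions.

```python
from pyspark.sql.functions import to_timestamp, date_format, col

df = spark.createDataFrame([("01-02-2023 10:15:00",)], ["event_ts_raw"])   # hypothetical source format

df = (df
      .withColumn("event_ts", to_timestamp(col("event_ts_raw"), "dd-MM-yyyy HH:mm:ss"))
      .withColumn("event_ts_display", date_format(col("event_ts"), "dd-MM-yyyy HH:mm:ss")))

df.write.format("delta").mode("append").saveAsTable("events")              # placeholder table name
```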

How can I execute and schedule a Databricks notebook from an Azure DevOps pipeline using YAML?

I wanted to do CI/CD of my Azure Databricks notebook using a YAML file. I have followed the below flow: pushed my code from the Databricks notebook to Azure Repos; crea…
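
One hedged way to wire the execution step: a small script invoked from the YAML pipeline that triggers an existing Databricks job via the Jobs run-now API. The environment variable names and job id are placeholders supplied by pipeline variables.

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"]          # e.g. https://adb-....azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]        # PAT stored as a secret pipeline variable
job_id = int(os.environ["DATABRICKS_JOB_ID"])

resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": job_id},
)
resp.raise_for_status()
print("Triggered run:", resp.json().get("run_id"))
```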

Databricks - ConcurrentAppendException

I'm running around 20 notebooks concurrently and they all update the same Delta table (though different rows). I'm getting the below exception if any two notebo…
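
A hedged sketch of the usual mitigations: narrow the MERGE condition to the slice each notebook touches (a partition column, here a hypothetical region) and retry with backoff when a conflict still slips through. It assumes the delta-spark Python bindings are installed; all names are placeholders.

```python
import random
import time
from delta.exceptions import ConcurrentAppendException

def upsert_slice(updates_df, region, max_attempts=5):
    """MERGE one notebook's rows into the shared table, retrying on conflicts."""
    updates_df.createOrReplaceTempView("updates")
    for attempt in range(max_attempts):
        try:
            spark.sql(f"""
                MERGE INTO target t
                USING updates s
                ON t.id = s.id AND t.region = '{region}'
                WHEN MATCHED THEN UPDATE SET *
                WHEN NOT MATCHED THEN INSERT *
            """)
            return
        except ConcurrentAppendException:
            time.sleep(random.uniform(1, 5) * (attempt + 1))
    raise RuntimeError("MERGE kept conflicting after retries")
```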

Azure ADLS Gen2 file created by Azure Databricks doesn't inherit ACL

I have a Databricks notebook that is writing a dataframe to a file in ADLS Gen2 storage. It creates a temp folder, outputs the file, and then copies that file to…