Category "azure-databricks"

Load Data Using Azure Batch Service and Spark Databricks

I have a file in Azure Blob Storage that I need to load daily into the Data Lake. I am not clear on which approach I should use (Azure Batch Account, Custom Activity…
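
If Databricks is the chosen approach, a minimal sketch of the daily copy, assuming storage credentials are already configured on the cluster; the account, container, and path names are hypothetical:

# Copy the daily file from Blob Storage into the Data Lake
# (account/container/path names are placeholders)
src = "wasbs://source@myblobaccount.blob.core.windows.net/incoming/data.csv"
dst = "abfss://raw@mydatalake.dfs.core.windows.net/daily/data.csv"
dbutils.fs.cp(src, dst)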

Read Outlook emails in Databricks

I would like to read emails from Microsoft Outlook using Python and run the script on a Databricks cluster. I'm using win32com on my local machine and am able to…
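
win32com only works against a local Outlook installation on Windows, so on a Linux Databricks cluster the usual route is the Microsoft Graph REST API instead. A sketch assuming an Azure AD app registration with application-level Mail.Read permission; the tenant, client, mailbox, and secret-scope values are placeholders:

import msal
import requests

tenant_id = "<tenant-id>"        # hypothetical app registration values
client_id = "<client-id>"
client_secret = dbutils.secrets.get(scope="my-scope", key="graph-secret")

app = msal.ConfidentialClientApplication(
    client_id,
    authority=f"https://login.microsoftonline.com/{tenant_id}",
    client_credential=client_secret,
)
token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])

resp = requests.get(
    "https://graph.microsoft.com/v1.0/users/user@contoso.com/messages?$top=10",
    headers={"Authorization": f"Bearer {token['access_token']}"},
)
for msg in resp.json().get("value", []):
    print(msg["subject"])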

How do I Insert Overwrite with parquet format?

I have two parquet files in Azure Data Lake Gen2 and I want to INSERT OVERWRITE one with the other. I was trying the same in Azure Databricks by doing the below. Reading…
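
For plain parquet paths (as opposed to Delta tables), reading the source and writing with overwrite mode is the usual equivalent; the paths below are placeholders:

source = "abfss://data@mylake.dfs.core.windows.net/source/"   # hypothetical paths
target = "abfss://data@mylake.dfs.core.windows.net/target/"

# Read the source parquet and overwrite the target in place
df = spark.read.parquet(source)
df.write.mode("overwrite").parquet(target)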

PySpark: join and union in a for loop

I have a really simple piece of logic that I would like to understand how to make work in PySpark: for data in df1: spark_data_row = spark.createDataFrame(data…
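
Calling spark.createDataFrame once per row and unioning inside the loop gets slow quickly; collecting the pieces and folding them with a single reduce is the common pattern. A sketch with stand-in data:

from functools import reduce
from pyspark.sql import DataFrame

# Stand-in pieces; in practice each would be built from a row of df1
parts = [
    spark.createDataFrame([(1, "a")], ["id", "val"]),
    spark.createDataFrame([(2, "b")], ["id", "val"]),
]

# Fold all pieces into one DataFrame with a single reduce
result = reduce(DataFrame.unionByName, parts)
result.show()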

Azure Databricks - Generate SQL Select Statement with Columns

I have tables in Azure Databricks that I interact with via SQL in a notebook. I need to select all columns from a table with 200 columns; I need to sel…
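
One way to generate the statement is to pull the column list from the catalog in Python and build the SQL string; the table name and excluded columns are placeholders:

cols = spark.table("my_schema.my_table").columns   # hypothetical table
excluded = {"col_a", "col_b"}                      # columns to leave out, if any

stmt = ("SELECT "
        + ", ".join(c for c in cols if c not in excluded)
        + " FROM my_schema.my_table")
print(stmt)        # paste into a %sql cell, or run via spark.sql(stmt)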

Is it possible to set only one branch in a Databricks shared git folder (highlighted in screenshot)?

I would like to set only one branch on a shared folder in the Databricks workspace. Attaching a screenshot to give more clarity on the same. All of the Data Factory pipeli…

VACUUM/OPTIMIZE Effect on Autoloader Checkpoints

I'm using Databricks Autoloader to incrementally stream from a Delta Lake table into a SQL database. If an OPTIMIZE or VACUUM statement is run against the Delt…
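
If I recall correctly, OPTIMIZE commits with dataChange=false, which Delta streaming sources skip automatically, and VACUUM only breaks the stream if its checkpoint lags past the retention window. A hedged sketch of a Delta-source stream that also tolerates commits rewriting existing rows:

stream = (
    spark.readStream
        .format("delta")
        # Skip commits that only rewrite existing rows (updates/deletes);
        # note this may re-emit unchanged rows downstream
        .option("ignoreChanges", "true")
        .table("source_table")          # hypothetical table name
)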

IoT - Databricks Delta Lake - access from a C# API or Node.js API

I am working on an IoT solution where multiple sensors are sending data. I have one job which listens to Event Hub, gets the IoT sensor data, and sto…

How to handle a memory issue when writing data where a particular column contains very large values in each record, in Databricks with PySpark

I have a set of records with 10 columns. There is a column 'x' which contains an array of float values, and the length of the array can be very large (e.g. the len…
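
Two knobs that often help with oversized rows are repartitioning into smaller tasks and capping rows per output file; the partition count and path below are placeholder values to tune:

(
    df.repartition(400)                     # more, smaller tasks; 400 is a guess
      .write
      .option("maxRecordsPerFile", 10000)   # cap rows per output file
      .mode("overwrite")
      .parquet("abfss://data@mylake.dfs.core.windows.net/out/")  # hypothetical path
)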

pycaret.time_series.TSForecastingExperiment ImportError: cannot import name '_check_param_grid' from 'sklearn.model_selection._search'

I am getting the below error while importing the PyCaret time-series (beta) module in Databricks (we were running it successfully earlier). Request your help in so…
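
_check_param_grid was removed from scikit-learn (in 1.1, as far as I know), so the usual workaround is pinning an older scikit-learn before importing PyCaret; the exact version bound here is an assumption:

# In one notebook cell: pin scikit-learn to a version that still has
# _check_param_grid (the "<1.1" bound is an assumption)
%pip install "scikit-learn<1.1"

# In the next cell: restart Python so the pinned version is picked up
dbutils.library.restartPython()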

How to get the workspace name inside a Python notebook in Databricks

I am trying to get the workspace name inside a Python notebook. Is there any way we can do this? Ex: my workspace name is databricks-test. I want to capture this i…
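
One commonly cited approach reads the workspace URL from the Spark conf; note this is an internal key, it returns the URL host rather than the display name, and getting the display name itself may require the Azure management API:

# Internal conf key; behavior may vary by runtime version
url = spark.conf.get("spark.databricks.workspaceUrl")
print(url)                  # e.g. "adb-1234567890123456.7.azuredatabricks.net"
print(url.split(".")[0])    # the host prefix, not necessarily "databricks-test"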

Create a database with a name from a variable on Databricks (in SQL, not in Spark)

How to create a database with a name from a variable (in SQL, not in Spark)? I've written this: %sql SET myVar = CONCAT(getArgument('env'), 'BackOffice'); CRE…
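
SET variables cannot be used as identifiers in the CREATE statement itself, so the common workaround drops to Python just for the string interpolation (newer runtimes also offer an IDENTIFIER() clause, if I remember correctly); the widget name is taken from the question:

# Build the database name in Python, then run the DDL
env = dbutils.widgets.get("env")        # same value getArgument('env') returns
spark.sql(f"CREATE DATABASE IF NOT EXISTS {env}BackOffice")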

Shap value plotting error on Databricks but works locally

I want to do a simple SHAP analysis and plot a shap.force_plot. I noticed that it works without any issues locally in a .ipynb file, but fails on Databricks wit…
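
force_plot renders JavaScript that a Databricks notebook does not load by default; two commonly suggested workarounds are the matplotlib backend and pushing the HTML through displayHTML. A self-contained sketch on toy data:

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
import pandas as pd
import shap

data = load_diabetes()
X = pd.DataFrame(data.data, columns=data.feature_names)
model = RandomForestRegressor(n_estimators=10).fit(X, data.target)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Workaround 1: matplotlib backend, single-row force plot
shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0], matplotlib=True)

# Workaround 2: render the JS plot through Databricks' displayHTML
plot = shap.force_plot(explainer.expected_value, shap_values, X)
displayHTML(shap.getjs() + plot.html())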

IllegalArgumentException: File must be dbfs or s3n: /

dbutils.fs.mount(
    source = f"wasbs://{blob.storage_account_container}@{blob.storage_account_name}.blob.core.windows.net/",
    mount_point = "/mnt/MLRExtract/"
…
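
This error is often reported when the mount arguments are malformed, e.g. a trailing slash on mount_point. A hedged completion of the call above, with hypothetical secret scope/key names:

dbutils.fs.mount(
    source=f"wasbs://{blob.storage_account_container}@{blob.storage_account_name}.blob.core.windows.net",
    mount_point="/mnt/MLRExtract",      # no trailing slash
    extra_configs={
        f"fs.azure.account.key.{blob.storage_account_name}.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-key")
    },
)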

Recursive View Message - Azure Databricks

Error: AnalysisException: Recursive view management_db.v_extract detected (cycle: management_db.v_extract -> management_db.v_extract). Query outside of the v…

Executing Stored Procedure in Databricks when using Azure Apache Spark connector

The following example from the Azure team uses the Apache Spark connector for SQL Server to write data to a table. Question: how can we execute a stored procedure in an…
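
The Spark connector itself only reads and writes tables; a common workaround calls JDBC directly through the driver JVM via Spark's Py4J gateway. A sketch that runs on the driver only; the connection URL and procedure name are placeholders:

# Raw JDBC call on the driver via Py4J (URL and proc name are hypothetical)
url = ("jdbc:sqlserver://myserver.database.windows.net:1433;"
       "databaseName=mydb;user=me;password=<secret>")
conn = spark._sc._gateway.jvm.java.sql.DriverManager.getConnection(url)
stmt = conn.prepareCall("EXEC dbo.my_stored_procedure")
stmt.execute()
conn.close()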

localhost refused to connect in a Databricks notebook calling the Google API

I read the Google API documentation pages (Drive API, PyDrive) and created a Databricks notebook to connect to Google Drive. I used the sample code in the d…
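
The samples use an OAuth flow that opens a browser and listens on localhost, which cannot work on a remote cluster; a service account is the usual alternative. A sketch assuming a service-account JSON key uploaded to DBFS and the Drive folder shared with that account; the key path is a placeholder:

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Hypothetical key location; share the Drive folder with the
# service account's email address beforehand
creds = service_account.Credentials.from_service_account_file(
    "/dbfs/FileStore/keys/sa.json",
    scopes=["https://www.googleapis.com/auth/drive.readonly"],
)
drive = build("drive", "v3", credentials=creds)
for f in drive.files().list(pageSize=10).execute().get("files", []):
    print(f["name"])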

Printing secret value in Databricks

Even though secrets are for masking confidential information, I need to see the value of a secret to use it outside Databricks. When I simply print the sec…
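
The notebook redacts the literal secret string, but any transformation of it defeats the redaction; the scope and key names below are placeholders, and obviously this should not be left in shared notebooks:

secret = dbutils.secrets.get(scope="my-scope", key="my-key")  # hypothetical names
print(" ".join(secret))  # prints the characters space-separated instead of [REDACTED]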

SQL order of execution

I wonder how this query executes successfully. As we know, the HAVING clause is evaluated before SELECT, so how can an alias defined in the SELECT statement…
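
The short answer, as far as I understand it, is that the logical order is not the implementation order: Spark's analyzer resolves SELECT aliases referenced in HAVING before planning the query. A small demo:

# HAVING references the SELECT alias "total" and still works in Spark SQL
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 5)], ["k", "v"])
df.createOrReplaceTempView("t")
spark.sql("""
    SELECT k, SUM(v) AS total
    FROM t
    GROUP BY k
    HAVING total > 2
""").show()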

What is the best way to cleanup and recreate databricks delta table?

I am trying to clean up and recreate a Databricks Delta table for integration tests. I want to run the tests on a DevOps agent, so I am using JDBC (the Simba driver), bu…
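
A hedged sketch of the cleanup-and-recreate step itself; the same statements can be sent over the Simba JDBC connection, and the schema and names here are placeholders:

# Drop-and-recreate (two statements) ...
spark.sql("DROP TABLE IF EXISTS test_db.my_table")
spark.sql("CREATE TABLE test_db.my_table (id INT, val STRING) USING DELTA")

# ... or atomically in one statement on Databricks
spark.sql("CREATE OR REPLACE TABLE test_db.my_table (id INT, val STRING) USING DELTA")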