Category "azure-databricks"

Spark binary file and Delta Table

I receive binary files (~3 MB each) in batches of roughly 20,000 files at a time. These files are used downstream for further processing, but I wa…
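If the goal is to land each batch in a Delta table for the downstream steps, a minimal sketch using Spark's built-in binaryFile reader; the paths and the glob pattern are placeholders, and `spark` is the session the Databricks notebook provides.

```python
# Read the raw files with the binaryFile source; each row carries path,
# modificationTime, length and the file content as a byte array.
raw_df = (
    spark.read.format("binaryFile")
    .option("pathGlobFilter", "*.bin")  # hypothetical extension
    .load("abfss://landing@myaccount.dfs.core.windows.net/batch-2024-01-01/")  # placeholder path
)

# Append the batch to a Delta table so downstream jobs can pick it up incrementally.
(
    raw_df.write.format("delta")
    .mode("append")
    .save("abfss://bronze@myaccount.dfs.core.windows.net/binary_files/")  # placeholder path
)
```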

Load Data Using Azure Batch Service and Spark Databricks

I have files in Azure Blob Storage that I need to load daily into the Data Lake. I am not clear on which approach I should use (Azure Batch account, Custom Activity…
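If Databricks itself is an option, a scheduled notebook job that copies the day's files from Blob Storage into the lake can be as small as the sketch below; the paths, file format and credential setup are assumptions, and `spark` is provided by the notebook runtime.

```python
# Daily copy sketch: read the source container for the day and land it in ADLS Gen2.
src = "wasbs://daily@sourceaccount.blob.core.windows.net/2024/01/01/"   # placeholder
dst = "abfss://raw@lakeaccount.dfs.core.windows.net/daily/2024/01/01/"  # placeholder

df = spark.read.format("csv").option("header", "true").load(src)  # adjust format to the real files
df.write.mode("overwrite").format("delta").save(dst)
```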

Read outlook emails in databricks

I would like to read mail from Microsoft Outlook using Python and run the script on a Databricks cluster. I'm using win32com on my local machine and am able to…
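win32com drives a locally installed Outlook over COM, which isn't available on a Linux Databricks cluster, so a common alternative is the Microsoft Graph API. A rough sketch, assuming an Azure AD app registration with application-level Mail.Read permission; all IDs, the mailbox address and the secret name are placeholders.

```python
import msal
import requests

TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = dbutils.secrets.get("my-scope", "graph-client-secret")  # hypothetical secret

# Acquire an app-only token for Microsoft Graph.
app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)
token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])

# List the latest messages in the target mailbox.
resp = requests.get(
    "https://graph.microsoft.com/v1.0/users/someone@contoso.com/messages?$top=10",
    headers={"Authorization": f"Bearer {token['access_token']}"},
)
for msg in resp.json().get("value", []):
    print(msg["subject"])
```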

How do I Insert Overwrite with parquet format?

I have two parquet files in Azure Data Lake Gen2 and I want to INSERT OVERWRITE one with the other. I was trying the same in Azure Databricks by doing the below. Reading…
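A minimal sketch of overwriting one parquet location with the contents of the other; both paths are placeholders, and the commented SQL variant applies if the locations are registered as tables.

```python
src_path = "abfss://data@myaccount.dfs.core.windows.net/source_table/"  # placeholder
tgt_path = "abfss://data@myaccount.dfs.core.windows.net/target_table/"  # placeholder

# Read the source parquet and overwrite the target location.
src_df = spark.read.parquet(src_path)
src_df.write.mode("overwrite").parquet(tgt_path)

# Alternatively, if both locations are registered as tables:
# spark.sql("INSERT OVERWRITE TABLE target_table SELECT * FROM source_table")
```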

Pyspark: join and union in for loop

I have a really simple piece of logic that I would like to understand how to make work in PySpark: for data in df1: spark_data_row = spark.createDataFrame(data…
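Rather than creating a one-row DataFrame per iteration, one common pattern is to collect the per-iteration DataFrames in a list and union them once at the end. A small sketch with stand-in loop logic:

```python
from functools import reduce
from pyspark.sql import DataFrame

parts = []
for i in range(3):  # stand-in for the real loop
    # stand-in for whatever per-iteration DataFrame you build
    part = spark.range(5).withColumnRenamed("id", "value")
    parts.append(part)

# Union everything in one pass instead of unioning inside the loop.
result = reduce(DataFrame.unionByName, parts)
result.show()
```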

Azure Databricks - Generate SQL Select Statement with Columns

I have tables in Azure Databricks that I interact with using SQL in a notebook. I need to select all columns from a table with 200 columns; I need to sel…
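One way to avoid typing 200 column names is to generate the SELECT list from the table's schema and then edit it. A minimal sketch; the table name is a placeholder.

```python
table_name = "my_db.my_wide_table"          # placeholder
cols = spark.table(table_name).columns      # list of column names from the schema

# Build an explicit SELECT statement that can be edited column by column.
select_stmt = "SELECT\n  " + ",\n  ".join(cols) + f"\nFROM {table_name}"
print(select_stmt)  # paste into a %sql cell, or run with spark.sql(select_stmt)
```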

Is it possible to set only one branch at a Databricks shared Git folder (highlighted in screenshot)?

I would like to set only one branch on a shared Git folder in the Databricks workspace. I am attaching a screenshot to give more clarity. All of the Data Factory pipeli…
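The Git folder UI doesn't appear to restrict which branches can be checked out, but the Repos REST API can switch a Git folder to a specific branch programmatically, e.g. from a scheduled job that keeps it pinned. A sketch with placeholder host, repo ID and token:

```python
import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = dbutils.secrets.get("my-scope", "databricks-pat")               # hypothetical secret
REPO_ID = "1234567890"                                                  # placeholder

# Check the shared Git folder out on a single, fixed branch.
resp = requests.patch(
    f"{DATABRICKS_HOST}/api/2.0/repos/{REPO_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"branch": "main"},
)
resp.raise_for_status()
print(resp.json())
```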

VACUUM/OPTIMIZE Effect on Autoloader Checkpoints

I'm using Databricks Autoloader to incrementally stream from a Delta Lake table into a SQL database. If an OPTIMIZE or VACUUM statement is run against the Delt…
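As I understand it, VACUUM only deletes files that are no longer referenced (so it only matters if the stream lags past the retention window), while OPTIMIZE rewrites data files, which a Delta streaming source will refuse to process unless told to ignore rewrites. A sketch of streaming the Delta table with that option set; the paths, JDBC details and table names are placeholders.

```python
def write_to_sql(batch_df, batch_id):
    (batch_df.write.format("jdbc")
        .option("url", "jdbc:sqlserver://myserver.database.windows.net;database=mydb")  # placeholder
        .option("dbtable", "dbo.events")   # placeholder
        .option("user", "sqluser")         # placeholder
        .option("password", "***")
        .mode("append")
        .save())

stream = (
    spark.readStream.format("delta")
    .option("ignoreChanges", "true")       # tolerate files rewritten by OPTIMIZE
    .load("/mnt/delta/events")             # placeholder path
)

query = (
    stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/events_to_sql")  # placeholder
    .foreachBatch(write_to_sql)
    .start()
)
```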

IoT - Databricks Deltalake - access in C# api or Node js API

I am working on an IoT solution where multiple sensors send data. I have one job which listens to Event Hub, gets the IoT sensor data, and sto…
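One way to expose the Delta table to a C# or Node.js API without embedding Spark is the Databricks SQL Statement Execution REST API, which is plain HTTPS + JSON and therefore callable from any language. A Python sketch of the call; the host, warehouse ID, token and table name are placeholders.

```python
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                            # placeholder

resp = requests.post(
    f"{HOST}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "warehouse_id": "<warehouse-id>",  # placeholder SQL warehouse
        "statement": "SELECT * FROM iot.sensor_readings ORDER BY event_time DESC LIMIT 100",
        "wait_timeout": "30s",
    },
)
resp.raise_for_status()
print(resp.json().get("result", {}).get("data_array"))
```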

How to handle a memory issue when writing data in which a particular column contains very large data in each record, in Databricks with PySpark

I have a set of records with 10 columns. There is a column 'x' which contains an array of float values, and the length of the array can be very large (e.g., the len…
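One mitigation is to spread the rows across many more, smaller partitions and cap the rows per output file, so no single task or file has to hold too many of the huge arrays at once. A sketch with illustrative paths and numbers:

```python
df = spark.read.format("delta").load("/mnt/bronze/wide_arrays")  # placeholder path

(
    df.repartition(2000)                    # many small partitions -> less memory per task
    .write.format("delta")
    .option("maxRecordsPerFile", 10000)     # cap rows per file so no single file balloons
    .mode("overwrite")
    .save("/mnt/silver/wide_arrays")        # placeholder path
)
```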

pycaret.time_series.TSForecastingExperiment ImportError: cannot import name '_check_param_grid' from 'sklearn.model_selection._search'

I am getting the below error while importing the PyCaret time-series (beta) module in Databricks (it was running successfully earlier). Requesting your help in so…
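`_check_param_grid` was removed from newer scikit-learn releases, so a commonly suggested workaround is pinning scikit-learn to an older version and restarting the Python process; the version bound below is an assumption to verify against the PyCaret release in use.

```python
# In one notebook cell: pin scikit-learn (the "<1.3" bound is an assumption --
# match it to the release where _check_param_grid disappeared).
%pip install "scikit-learn<1.3"

# In the next cell: restart the Python process so the pinned version is loaded,
# then import PyCaret again.
dbutils.library.restartPython()
```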

How to get workspace name inside a python notebook in databricks

I am trying to get the workspace name inside a Python notebook. Is there any way we can do this? For example, my workspace name is databricks-test; I want to capture this i…
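There is no single official "workspace name" helper that I know of, but the workspace URL is exposed as a Spark conf and the notebook context tags carry the host name; both are sketched below. The context object is an internal API that may change between runtimes.

```python
import json

# Workspace URL from Spark conf; the first DNS label is often the workspace's name.
workspace_url = spark.conf.get("spark.databricks.workspaceUrl")
workspace_name = workspace_url.split(".")[0]
print(workspace_url, workspace_name)

# Notebook context tags (internal API) also expose the browser host name.
ctx = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
print(ctx["tags"].get("browserHostName"))
```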

Create a database with a name from a variable on Databricks (in SQL, not in Spark)

How to create a database with a name from a variable (in SQL, not in Spark)? I've written this: %sql SET myVar = CONCAT(getArgument('env'), 'BackOffice'); CRE…
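If dropping to Python for this one statement is acceptable, a minimal fallback sketch; the widget name and database naming follow the question, and the exact pure-SQL substitution syntax (which varies by runtime) is sidestepped.

```python
# Same value that getArgument('env') returns in a SQL cell.
env = dbutils.widgets.get("env")
db_name = f"{env}BackOffice"

# Build the database name in Python and run the DDL once.
spark.sql(f"CREATE DATABASE IF NOT EXISTS {db_name}")
```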

Shap value plotting error on Databricks but works locally

I want to do a simple shap analysis and plot a shap.force_plot. I noticed that it works without any issues locally in a .ipynb file, but fails on Databricks wit…
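shap.force_plot emits JavaScript by default, which a Databricks notebook cell doesn't render the way Jupyter does; a common workaround for a single observation is the matplotlib backend. A sketch with placeholder model and data:

```python
import shap
import matplotlib.pyplot as plt

# 'model' and 'X_sample' are placeholders for a fitted tree model and its feature frame.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_sample)

shap.force_plot(
    explainer.expected_value,
    shap_values[0, :],
    X_sample.iloc[0, :],
    matplotlib=True,   # render with matplotlib instead of JS
    show=False,
)
plt.show()
```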

IllegalArgumentException: File must be dbfs or s3n: /

dbutils.fs.mount(source = f"wasbs://{blob.storage_account_container}@{blob.storage_account_name}.blob.core.windows.net/", mount_point = "/mnt/MLRExtract/"…
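That error usually points at a malformed source URI or mount point rather than the storage itself. A complete mount sketch for comparison; the account, container and secret names are placeholders.

```python
storage_account = "myaccount"   # placeholder
container = "mlrextract"        # placeholder

dbutils.fs.mount(
    # Full wasbs:// URL for the container root.
    source=f"wasbs://{container}@{storage_account}.blob.core.windows.net/",
    mount_point="/mnt/MLRExtract",
    # Account key (or SAS) goes in extra_configs.
    extra_configs={
        f"fs.azure.account.key.{storage_account}.blob.core.windows.net":
            dbutils.secrets.get("my-scope", "storage-account-key")  # hypothetical secret
    },
)
```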

Recursive View Message - Azure Databricks

Error: AnalysisException: Recursive view management_db.v_extract detected (cycle: management_db.v_extract -> management_db.v_extract). Query outside of the v…
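The cycle typically means the CREATE OR REPLACE VIEW statement selects from the very view it is redefining. One fix is to point the new definition at the underlying table(s) instead; a sketch with placeholder table and column names:

```python
spark.sql("""
    CREATE OR REPLACE VIEW management_db.v_extract AS
    SELECT id, payload, load_date
    FROM management_db.extract_base          -- base table, not the view itself
    WHERE load_date >= date_sub(current_date(), 30)
""")
```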

Executing Stored Procedure in Databricks when using Azure Apache Spark connector

The following example from the Azure team uses the Apache Spark connector for SQL Server to write data to a table. Question: how can we execute a stored procedure in an…
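The Spark connector itself only reads and writes DataFrames, so a common workaround is to call the procedure over plain JDBC using the SQL Server driver already on the cluster. A sketch that goes through the JVM DriverManager via py4j internals, so treat it as a workaround; the connection details and procedure name are placeholders.

```python
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"  # placeholder
user = "sqluser"                                                                # placeholder
password = dbutils.secrets.get("my-scope", "sql-password")                      # hypothetical secret

# Reach the JVM's DriverManager through py4j (internal handles, may change).
driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
conn = driver_manager.getConnection(jdbc_url, user, password)
try:
    # Standard JDBC call escape syntax for a stored procedure with one parameter.
    stmt = conn.prepareCall("{call dbo.usp_refresh_staging(?)}")  # hypothetical procedure
    stmt.setString(1, "2024-01-01")
    stmt.execute()
finally:
    conn.close()
```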

localhost refused to connect in a databricks notebook calling the google api

I read the Google API documentation pages (Drive API, pyDrive) and created a Databricks notebook to connect to Google Drive. I used the sample code in the d…
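The quickstart samples use an OAuth flow that opens a browser and redirects to localhost, which the remote Databricks driver cannot serve; a service account avoids the interactive step. A sketch, assuming the Drive content is shared with the service account's e-mail address; the key file path is a placeholder.

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Service-account credentials instead of the browser-based flow.
creds = service_account.Credentials.from_service_account_file(
    "/dbfs/FileStore/keys/drive-sa.json",                     # hypothetical key file
    scopes=["https://www.googleapis.com/auth/drive.readonly"],
)
drive = build("drive", "v3", credentials=creds)

# List a few files visible to the service account.
files = drive.files().list(pageSize=10, fields="files(id, name)").execute()
for f in files.get("files", []):
    print(f["id"], f["name"])
```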

Printing secret value in Databricks

Even though secrets are meant to mask confidential information, I need to see the value of a secret in order to use it outside Databricks. When I simply print the sec…
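Notebook output redacts anything that exactly matches a secret value, so the usual trick is to print it in a slightly altered form that the redaction pattern does not match (knowingly defeating the masking). A sketch with placeholder scope and key names:

```python
secret = dbutils.secrets.get(scope="my-scope", key="my-key")  # placeholders

# Insert a space between characters so the output no longer matches the secret
# exactly and therefore is not replaced by [REDACTED].
print(" ".join(secret))
```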

SQL order of execution

I wonder how this query executes successfully. As we know, the 'having' clause is evaluated before the 'select' one, so how is the alias name used in the 'select' statement…
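In Spark SQL (and several other engines), the analyzer accepts a SELECT-list alias inside HAVING and substitutes it with the underlying aggregate expression, even though HAVING is logically evaluated before the projection. A small runnable illustration:

```python
# Build a tiny table and reference the SELECT-list alias 'cnt' in HAVING.
spark.range(10).withColumnRenamed("id", "amount").createOrReplaceTempView("orders")

spark.sql("""
    SELECT amount % 3 AS bucket, COUNT(*) AS cnt
    FROM orders
    GROUP BY amount % 3
    HAVING cnt > 2          -- alias resolved to COUNT(*) by the analyzer
""").show()
```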