Category "databricks"

Hi all, facing an issue with a Spark SQL DELETE query based on a timestamp

I am running a DELETE query with < (less than) and > (greater than) conditions on the timestamp field, but I am not getting the desired results. Fir…
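
A common cause of this is comparing the timestamp column against a bare string whose format does not match the column; an explicit TIMESTAMP literal removes the ambiguity. A minimal sketch, assuming a hypothetical Delta table events with a timestamp column event_ts:

    # Hypothetical table/column names; explicit TIMESTAMP literals avoid
    # relying on implicit string-to-timestamp coercion in the predicate.
    spark.sql("""
        DELETE FROM events
        WHERE event_ts > TIMESTAMP '2022-01-01 00:00:00'
          AND event_ts < TIMESTAMP '2022-02-01 00:00:00'
    """)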

Read Outlook emails in Databricks

I would like to read mail from Microsoft Outlook using Python and run the script on a Databricks cluster. I'm using win32com on my local machine and am able to …
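
Note that win32com drives a locally installed Outlook client, so it cannot run on a (Linux) Databricks cluster; a common alternative is the Microsoft Graph REST API. A minimal sketch, assuming an Azure AD app registration with application permissions; the tenant ID, client ID, secret, and user ID are placeholders:

    # Placeholders: <tenant-id>, <app-client-id>, <client-secret>, <user-id>.
    import msal
    import requests

    app = msal.ConfidentialClientApplication(
        client_id="<app-client-id>",
        authority="https://login.microsoftonline.com/<tenant-id>",
        client_credential="<client-secret>",
    )
    token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])

    resp = requests.get(
        "https://graph.microsoft.com/v1.0/users/<user-id>/messages?$top=10",
        headers={"Authorization": f"Bearer {token['access_token']}"},
    )
    for msg in resp.json()["value"]:
        print(msg["subject"])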

AWS Databricks Cluster terminated. Reason: Container launch failure

We're developing a custom runtime for Databricks clusters. We need to version and archive our clusters for a client. We made it run successfully in our own environment …
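
For reference, custom images go through Databricks Container Services, where the image is named in the cluster spec; a container launch failure often means the image is not pullable from the workspace's account or is not built on a Databricks-compatible base. A minimal sketch of such a spec, with host, token, and image URL as placeholders:

    # Placeholders throughout; the docker_image block is what Databricks
    # Container Services uses to pull the custom runtime at cluster launch.
    import requests

    requests.post(
        "https://<workspace-host>/api/2.0/clusters/create",
        headers={"Authorization": "Bearer <pat-token>"},
        json={
            "cluster_name": "custom-runtime-test",
            "spark_version": "10.4.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 1,
            "docker_image": {
                "url": "<account>.dkr.ecr.<region>.amazonaws.com/custom-runtime:1.0",
                "basic_auth": {"username": "<user>", "password": "<token>"},
            },
        },
    )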

A simple distributed training Python program for deep learning models with Horovod on a GPU cluster

I am trying to run some example Python 3 code from https://docs.databricks.com/applications/deep-learning/distributed-training/horovod-runner.html on a Databricks GPU cluster …
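
The linked page boils down to wrapping a single-node training function in HorovodRunner. A minimal sketch of that pattern, with the body of train left as a placeholder:

    # HorovodRunner ships with the Databricks ML runtime (sparkdl module).
    import horovod.torch as hvd
    from sparkdl import HorovodRunner

    def train():
        hvd.init()
        # Placeholder: build the model/optimizer, wrap the optimizer with
        # hvd.DistributedOptimizer, and train on the shard for hvd.rank().

    hr = HorovodRunner(np=2)   # np=2: two GPU workers; np=-1 runs locally on the driver
    hr.run(train)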

Is it possible to set only one branch for a Databricks shared Git folder (highlighted in the screenshot)?

I would like to set only one branch for a shared folder in the Databricks workspace. Attaching a screenshot to give more clarity. All of our Data Factory pipelines …

VACUUM/OPTIMIZE Effect on Autoloader Checkpoints

I'm using Databricks Autoloader to incrementally stream from a Delta Lake table into a SQL database. If an OPTIMIZE or VACUUM statement is run against the Delta …
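
For what it's worth, when the source is a Delta table the usual approach is the Delta streaming source rather than Auto Loader over the table's files: OPTIMIZE rewrites files (marked dataChange=false) that a file-based listener would re-ingest, and VACUUM deletes files a lagging stream may still need. A minimal sketch, with placeholder paths:

    # Delta streaming source over the table; paths are placeholders.
    stream = (
        spark.readStream
        .format("delta")
        .option("ignoreChanges", "true")   # tolerate file rewrites from updates/deletes
        .load("/mnt/delta/source_table")
    )

    def write_batch(df, epoch_id):
        # Placeholder: the JDBC write to the SQL database goes here.
        pass

    (stream.writeStream
        .option("checkpointLocation", "/mnt/checkpoints/sql_sink")
        .foreachBatch(write_batch)
        .start())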

Databricks - spark-submit Error | org.springframework.core.ResolvableType.forInstance(Ljava/lang/Object;)Lorg/springframework/core/ResolvableType

spark-submit on a Databricks cluster is giving this error. I am using Spark 3.1.2, Scala 2.12, and Spring Boot 2.6.3. However, spark-submit runs fine on m…

How to run a Databricks notebook from another notebook on a different cluster

In Databricks, I understand that a notebook can be executed from another notebook, but the notebook will run on the current cluster by default. For example, I have not…
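
dbutils.notebook.run always executes on the calling cluster; to target a different (or new) cluster, the usual route is the Jobs API. A minimal sketch using the runs/submit endpoint, with host, token, notebook path, and cluster spec as placeholders:

    import requests

    resp = requests.post(
        "https://<workspace-host>/api/2.1/jobs/runs/submit",
        headers={"Authorization": "Bearer <pat-token>"},
        json={
            "run_name": "child-notebook-run",
            "tasks": [{
                "task_key": "child",
                "notebook_task": {"notebook_path": "/Users/<me>/child_notebook"},
                "new_cluster": {          # a fresh cluster, not the current one
                    "spark_version": "10.4.x-scala2.12",
                    "node_type_id": "i3.xlarge",
                    "num_workers": 1,
                },
            }],
        },
    )
    print(resp.json())   # contains the run_id to poll for status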

Insert Overwrite in Databricks overwriting complete data in table?

I have two tables: one with 50K records and the other with 2.5K records, and I want to update these 2.5K records into table one. Currently I am doing this by using …
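
Note that INSERT OVERWRITE replaces the table's existing rows by design; for an upsert into a Delta table, MERGE INTO is the usual tool. A minimal sketch with hypothetical table and key names:

    # Hypothetical names: big_table (50K rows), small_table (2.5K rows), key `id`.
    spark.sql("""
        MERGE INTO big_table AS t
        USING small_table AS s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)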

How to get the workspace name inside a Python notebook in Databricks

I am trying to get the workspace name inside a Python notebook. Is there any way we can do this? For example, my workspace name is databricks-test; I want to capture this i…
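
A minimal sketch of one common approach: Databricks exposes the workspace URL as a Spark conf, and the name can often be parsed from it (on some clouds the hostname is an opaque ID rather than the friendly name):

    host = spark.conf.get("spark.databricks.workspaceUrl")
    # e.g. "databricks-test.cloud.databricks.com" -> "databricks-test"
    workspace_name = host.split(".")[0]
    print(workspace_name)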

Spark partition size greater than the executor memory

I have four questions. Suppose in Spark I have 3 worker nodes. Each worker node has 3 executors, and each executor has 3 cores. Each executor has 5 GB of memory. (T…
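
As a starting point for reasoning about partition size versus executor memory, the number and size of input partitions can be inspected and tuned directly. A minimal sketch with a placeholder path:

    df = spark.read.parquet("/mnt/data/events")   # placeholder path
    print(df.rdd.getNumPartitions())              # how many partitions exist
    print(spark.conf.get("spark.sql.files.maxPartitionBytes"))  # split size for file sources, default 128MB
    df = df.repartition(200)   # raise the partition count to shrink per-partition size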

Databricks DataLakeFileClient Returns Error

I have a Databricks notebook running every 5 minutes; part of its functionality is to connect to a file in Azure Data Lake Storage Gen2 (ADLS Gen2). I get the following …
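
For comparison, a minimal working read with the azure-storage-file-datalake package looks like the following; account, credential, container, and path are placeholders:

    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://<storage-account>.dfs.core.windows.net",
        credential="<account-key-or-aad-credential>",
    )
    file_client = (
        service.get_file_system_client("<container>")
               .get_file_client("path/to/file.csv")
    )
    data = file_client.download_file().readall()   # bytes of the file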

Create a database with a name from a variable on Databricks (in SQL, not in Spark)

How to create a database with a name from a variable (in SQL, not in Spark)? I've written this: %sql SET myVar = CONCAT(getArgument('env'), 'BackOffice'); CREATE …
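
A value set with SET is a Spark conf and is not substituted into the text of a later DDL statement; the usual workaround is to assemble the statement outside SQL (or use ${...} widget substitution in a %sql cell). A minimal sketch of the Python route, assuming the same env widget that getArgument('env') reads:

    env = dbutils.widgets.get("env")   # same value as getArgument('env')
    spark.sql(f"CREATE DATABASE IF NOT EXISTS {env}BackOffice")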

SHAP value plotting error on Databricks but works locally

I want to do a simple SHAP analysis and plot a shap.force_plot. I noticed that it works without any issues locally in a .ipynb file, but fails on Databricks with …
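
A likely culprit is that shap's JavaScript is not loaded in the Databricks notebook; a common workaround is to render the plot's HTML through displayHTML. A minimal sketch, where explainer, shap_values, and X stand in for the user's own analysis:

    import shap

    plot = shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0])
    # Bundle shap's JS with the plot's HTML so the notebook can render it.
    displayHTML(f"<head>{shap.getjs()}</head><body>{plot.html()}</body>")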

Results in Databricks on AWS are not displayed when run as a job

Instead of the expected output from a display(my_dataframe), I get "Failed to fetch the result. Retry" when looking at the completed run (which is also marked as successful).

Has anyone found good learning resources for the "Databricks Certified Data Engineer Associate" exam?

I have been studying for the above exam using Databricks' learning platform, but I have not found any external resources such as study guides or practice exams

"localhost refused to connect" in a Databricks notebook calling the Google API

I read the Google API documentation pages (Drive API, PyDrive) and created a Databricks notebook to connect to Google Drive. I used the sample code in the d…
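
If the sample code uses InstalledAppFlow.run_local_server(), that flow opens a browser redirect on localhost, which does not exist on a remote Databricks driver, hence the connection refusal. A service account avoids the interactive flow entirely; a minimal sketch with a placeholder key path:

    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        "/dbfs/FileStore/keys/sa.json",   # placeholder key file
        scopes=["https://www.googleapis.com/auth/drive.readonly"],
    )
    drive = build("drive", "v3", credentials=creds)
    files = drive.files().list(pageSize=10).execute()
    print(files.get("files", []))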

Printing a secret value in Databricks

Even though secrets are for masking confidential information, I need to see the value of a secret to use it outside Databricks. When I simply print the secret …
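
The notebook masks any output that exactly matches the secret string, so the well-known workaround is to print it one character at a time. A minimal sketch with placeholder scope/key names:

    secret = dbutils.secrets.get(scope="my-scope", key="my-key")
    print(" ".join(secret))   # prints "a b c" instead of "[REDACTED]"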

How to convert a HASHBYTES string from SQL Server to the Spark equivalent

I have a process using the following SELECT statement in SQL Server: SELECT HASHBYTES('SHA1', CAST('4100119300' AS NVARCHAR(100))) AS StringConverted. This gives …
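
SQL Server's NVARCHAR is UTF-16LE, so reproducing HASHBYTES('SHA1', ...) in Spark requires encoding the string as UTF-16LE before hashing. A minimal sketch:

    from pyspark.sql import functions as F

    df = spark.createDataFrame([("4100119300",)], ["value"])
    df.select(
        F.sha1(F.encode(F.col("value"), "UTF-16LE")).alias("StringConverted")
    ).show(truncate=False)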

PySpark: select multiple columns from a list and filter on different values

I have a table with ~5k columns and ~1M rows that looks like this:

    ID   Col1  Col2  Col3  Col4  Col5  Col6  Col7  Col8  Col9  Col10  Col11
    ID1  0     1     0     1     0     2     1     1     2     2      0
    ID2  1     …
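
With that many columns, the select list is best built programmatically. A minimal sketch with hypothetical column names and filter values:

    from pyspark.sql import functions as F

    cols = [f"Col{i}" for i in range(1, 12)]   # Col1 .. Col11
    result = (
        df.select("ID", *cols)
          .filter((F.col("Col1") == 0) & (F.col("Col2") == 1))   # hypothetical values
    )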