Category "databricks"

Spark on RAPIDS single node

I'm trying to run TPC-DS on RAPIDS single node on EMR using this guide: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-rapids.html But getting res

How do I select the columns of a table in Databricks SQL?

I can use SHOW COLUMNS IN table_name, but this does not allow me to use the output in a query. This throws an error: SELECT * FROM show columns in table_name

Unable to fetch secrets using Instance Profile from Databricks for a Spring Boot application

I am using spring-cloud-starter-aws-secrets-manager-config 2.3.3 for a Spring Boot application, which works perfectly on my local machine when pointing to the stage environment

pyspark - getting error 'list' object has no attribute 'write' when attempting to write to a delta table

I am attempting to read the first X rows of a Delta table into a DataFrame, and then write (overwrite) that back to the Delta table. Here is the code: # r

Hi all, facing an issue with a Spark SQL DELETE query based on a timestamp

I am running the DELETE query with < (less than) and > (greater than) conditions on the timestamp field, but we are not getting the desired results. Fir

Read Outlook emails in Databricks

I would like to read mails from Microsoft Outlook using Python and run the script on a Databricks cluster. I'm using win32com on my local machine and am able to

AWS Databricks cluster terminated. Reason: Container launch failure

We're developing a custom runtime for a Databricks cluster. We need to version and archive our clusters for a client. We made it run successfully in our own environme

A simple distributed training Python program for deep learning models using Horovod on a GPU cluster

I am trying to run some example Python 3 code https://docs.databricks.com/applications/deep-learning/distributed-training/horovod-runner.html on Databricks GPU c

Is it possible to set only one branch in the Databricks shared Git folder (highlighted in screenshot)?

I would like to set only one branch in the shared folder in the Databricks workspace. I've attached a screenshot to make this clearer. All of data factory pipeli

VACUUM/OPTIMIZE Effect on Autoloader Checkpoints

I'm using Databricks Autoloader to incrementally stream from a Delta Lake table into a SQL database. If an OPTIMIZE or VACUUM statement is run against the Delt

Databricks - spark-submit Error | org.springframework.core.ResolvableType.forInstance(Ljava/lang/Object;)Lorg/springframework/core/ResolvableType

spark-submit on a Databricks cluster is giving this error. I am using Spark 3.1.2, Scala 2.12 and Spring Boot 2.6.3. However, spark-submit runs fine in m

How to run a Databricks notebook from another notebook on a different cluster

In Databricks, I understand that a notebook can be executed from another notebook, but the called notebook will run on the current cluster by default. For example: I have not
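One commonly used alternative to dbutils.notebook.run, which runs the child notebook on the caller's cluster, is to submit a one-time run through the Databricks Jobs API and point it at another cluster. The sketch below only builds the JSON body for POST /api/2.1/jobs/runs/submit; the notebook path, cluster ID and task key are hypothetical placeholders, and sending the request (workspace URL, token, HTTP client) is deliberately left out.

```python
import json


def build_runs_submit_payload(notebook_path, cluster_id, params=None,
                              run_name="child-notebook-run"):
    """Build a request body for POST /api/2.1/jobs/runs/submit that runs
    notebook_path on an existing cluster other than the calling one."""
    task = {
        "task_key": "run_notebook",      # hypothetical key; any unique string works
        "existing_cluster_id": cluster_id,
        "notebook_task": {
            "notebook_path": notebook_path,
            # base_parameters are exposed to the target notebook as widget values
            "base_parameters": dict(params or {}),
        },
    }
    return {"run_name": run_name, "tasks": [task]}


payload = build_runs_submit_payload(
    "/Users/someone@example.com/child_notebook",  # hypothetical notebook path
    "0123-456789-abcdefgh",                       # hypothetical cluster ID
    {"env": "dev"},
)
print(json.dumps(payload, indent=2))
```

Unlike dbutils.notebook.run, this starts a separate job run, so the caller has to poll the run status (or wait for completion by other means) if it needs the child's result.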

INSERT OVERWRITE in Databricks overwriting all data in the table?

I have two tables: one with 50K records and the other with 2.5K records, and I want to update these 2.5K records into the first table. Currently I was doing this by us
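INSERT OVERWRITE replaces the existing data in the table (or in the partitions being written) with the query result, which is why the 50K rows disappear. An upsert keeps target rows that have no match; Delta Lake expresses it as MERGE INTO. The toy sketch below models both behaviours on plain Python dicts keyed by a hypothetical id column. It only illustrates the semantics and is not Spark or Delta code.

```python
# Delta Lake equivalent of merge_upsert (assumed table/column names):
#   MERGE INTO target t USING source s ON t.id = s.id
#   WHEN MATCHED THEN UPDATE SET *
#   WHEN NOT MATCHED THEN INSERT *


def insert_overwrite(target, source):
    """INSERT OVERWRITE semantics: the result is exactly the source rows."""
    return dict(source)


def merge_upsert(target, source):
    """MERGE semantics keyed on id: update matches, insert new ids,
    and keep target rows that have no match in the source."""
    result = dict(target)  # copy so the original target is not mutated
    result.update(source)  # matched ids are updated, unmatched ids are inserted
    return result


# Hypothetical data: 5 target rows, 2 source rows (one update, one new id).
target = {i: f"old-{i}" for i in range(1, 6)}
source = {3: "new-3", 6: "new-6"}

print(len(insert_overwrite(target, source)))  # 2
print(len(merge_upsert(target, source)))      # 6
```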

How to get the workspace name inside a Python notebook in Databricks

I am trying to get the workspace name inside a Python notebook. Is there any way to do this? For example, my workspace name is databricks-test. I want to capture this i

Spark partition size greater than the executor memory

I have four questions. Suppose in Spark I have 3 worker nodes. Each worker node has 3 executors, and each executor has 3 cores and 5 GB of memory. (T
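The cluster described in this question can be worked through numerically: 3 workers with 3 executors each gives 9 executors and 27 cores, so up to 27 tasks (one partition each) run at once, backed by 45 GB of executor memory in total. The sketch also applies Spark's default unified memory settings (300 MB reserved, spark.memory.fraction = 0.6) to estimate how much execution memory each of the 3 concurrent tasks inside one executor can claim. It assumes nothing is cached and ignores memory overhead and off-heap memory, so treat the figures as rough.

```python
RESERVED_MB = 300         # Spark's fixed reserved system memory per executor heap
MEMORY_FRACTION_PCT = 60  # default spark.memory.fraction (0.6), as a percentage


def cluster_capacity(workers, executors_per_worker, cores_per_executor, executor_mem_gb):
    """Rough capacity figures for a static Spark cluster layout."""
    executors = workers * executors_per_worker
    cores = executors * cores_per_executor  # max tasks (partitions) processed at once
    total_mem_gb = executors * executor_mem_gb
    heap_mb = executor_mem_gb * 1024
    # Unified pool shared by execution and storage memory inside one executor.
    unified_mb = (heap_mb - RESERVED_MB) * MEMORY_FRACTION_PCT // 100
    # With N tasks running in an executor, each may claim at most 1/N of the
    # execution pool; assuming no cached data, execution can use the whole pool.
    per_task_mb = unified_mb // cores_per_executor
    return {
        "executors": executors,
        "cores": cores,
        "total_mem_gb": total_mem_gb,
        "unified_mb": unified_mb,
        "per_task_mb": per_task_mb,
    }


print(cluster_capacity(3, 3, 3, 5))
# {'executors': 9, 'cores': 27, 'total_mem_gb': 45, 'unified_mb': 2892, 'per_task_mb': 964}
```

Under these assumptions a partition much larger than roughly 1 GB does not fit in one task's execution memory at once; operators such as sorts, aggregations and shuffles can spill to disk instead of failing, but not every operation can.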

Databricks DataLakeFileClient Returns Error

I have a Databricks notebook running every 5 minutes; part of its functionality is to connect to a file in Azure Data Lake Storage Gen2 (ADLS Gen2). I get the foll

Create a Database with name from variable on Databricks (in SQL, not in Spark)

How to create a database with a name from a variable (in SQL, not in Spark)? I've written this: %sql SET myVar = CONCAT(getArgument('env'), 'BackOffice'); CRE
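One likely pitfall with this approach: Spark SQL's SET key = value stores the value as plain text, and ${name} references are substituted textually before the statement is parsed, so CONCAT(getArgument('env'), 'BackOffice') is pasted in unevaluated. The toy model below uses Python's string.Template (which happens to share the ${name} syntax) to show the effect; it is an illustration under that assumption, not Spark's actual substitution code, and the value devBackOffice is hypothetical.

```python
from string import Template


def substitute(sql, variables):
    """Toy model of textual ${name} substitution: values are pasted in
    verbatim and never evaluated as SQL expressions."""
    return Template(sql).substitute(variables)


stmt = "CREATE DATABASE IF NOT EXISTS ${myVar}"

# What SET myVar = CONCAT(getArgument('env'), 'BackOffice') stores: the raw text.
raw = substitute(stmt, {"myVar": "CONCAT(getArgument('env'), 'BackOffice')"})
print(raw)  # not a valid database name, so CREATE DATABASE fails

# If the variable already holds a finished name, the statement is valid SQL.
ok = substitute(stmt, {"myVar": "devBackOffice"})
print(ok)
```

In other words, the concatenation has to happen before substitution (for example by setting the variable to the finished name), because the substitution step never evaluates the value.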

SHAP value plotting error on Databricks but works locally

I want to do a simple SHAP analysis and plot a shap.force_plot. I noticed that it works without any issues locally in a .ipynb file, but fails on Databricks wit

Results in Databricks on AWS are not displayed when run as a job

Instead of the expected output from display(my_dataframe), I get "Failed to fetch the result. Retry" when looking at the completed run (which is also marked as successful).

Has anyone found good learning resources for the "Databricks Certified Data Engineer Associate" exam?

I have been studying for the above exam using Databricks' learning platform, but I have not found any external resources such as study guides or practice exams