Category "databricks"

Hi all, facing an issue with a Spark SQL DELETE query based on a timestamp

I am running a DELETE query with < (less than) and > (greater than) conditions on the timestamp field, but I am not getting the desired results. Fir…
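
A common cause of this is comparing the timestamp column against a bare string whose format does not match the column; an explicit TIMESTAMP literal removes the ambiguity. A minimal sketch, assuming a hypothetical Delta table events with a timestamp column event_ts:

    # Hypothetical table/column names; explicit TIMESTAMP literals avoid
    # relying on implicit string-to-timestamp coercion in the predicate.
    spark.sql("""
        DELETE FROM events
        WHERE event_ts > TIMESTAMP '2022-01-01 00:00:00'
          AND event_ts < TIMESTAMP '2022-02-01 00:00:00'
    """)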

Read Outlook emails in Databricks

I would like to read mail from Microsoft Outlook using Python and run the script on a Databricks cluster. I'm using win32com on my local machine and am able to …
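
Note that win32com drives a locally installed Outlook client, so it cannot run on a (Linux) Databricks cluster; a common alternative is the Microsoft Graph REST API. A minimal sketch, assuming an Azure AD app registration with application permissions; the tenant ID, client ID, secret, and user ID are placeholders:

    # Placeholders: <tenant-id>, <app-client-id>, <client-secret>, <user-id>.
    import msal
    import requests

    app = msal.ConfidentialClientApplication(
        client_id="<app-client-id>",
        authority="https://login.microsoftonline.com/<tenant-id>",
        client_credential="<client-secret>",
    )
    token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])

    resp = requests.get(
        "https://graph.microsoft.com/v1.0/users/<user-id>/messages?$top=10",
        headers={"Authorization": f"Bearer {token['access_token']}"},
    )
    for msg in resp.json()["value"]:
        print(msg["subject"])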

AWS Databricks Cluster terminated. Reason: Container launch failure

We're developing a custom runtime for Databricks clusters. We need to version and archive our clusters for a client. We made it run successfully in our own environment …
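
For reference, custom images go through Databricks Container Services, where the image is named in the cluster spec; a container launch failure often means the image is not pullable from the workspace's account or is not built on a Databricks-compatible base. A minimal sketch of such a spec, with host, token, and image URL as placeholders:

    # Placeholders throughout; the docker_image block is what Databricks
    # Container Services uses to pull the custom runtime at cluster launch.
    import requests

    requests.post(
        "https://<workspace-host>/api/2.0/clusters/create",
        headers={"Authorization": "Bearer <pat-token>"},
        json={
            "cluster_name": "custom-runtime-test",
            "spark_version": "10.4.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 1,
            "docker_image": {
                "url": "<account>.dkr.ecr.<region>.amazonaws.com/custom-runtime:1.0",
                "basic_auth": {"username": "<user>", "password": "<token>"},
            },
        },
    )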

A simple distributed training Python program for deep learning models with Horovod on a GPU cluster

I am trying to run some example Python 3 code from https://docs.databricks.com/applications/deep-learning/distributed-training/horovod-runner.html on a Databricks GPU cluster …
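
The linked page boils down to wrapping a single-node training function in HorovodRunner. A minimal sketch of that pattern, with the body of train left as a placeholder:

    # HorovodRunner ships with the Databricks ML runtime (sparkdl module).
    import horovod.torch as hvd
    from sparkdl import HorovodRunner

    def train():
        hvd.init()
        # Placeholder: build the model/optimizer, wrap the optimizer with
        # hvd.DistributedOptimizer, and train on the shard for hvd.rank().

    hr = HorovodRunner(np=2)   # np=2: two GPU workers; np=-1 runs locally on the driver
    hr.run(train)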

Is it possible to set only one branch for a Databricks shared Git folder (highlighted in the screenshot)?

I would like to set only one branch for a shared folder in the Databricks workspace. Attaching a screenshot to give more clarity. All of our Data Factory pipelines …

VACUUM/OPTIMIZE Effect on Autoloader Checkpoints

I'm using Databricks Autoloader to incrementally stream from a Delta Lake table into a SQL database. If an OPTIMIZE or VACUUM statement is run against the Delta …
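
For what it's worth, when the source is a Delta table the usual approach is the Delta streaming source rather than Auto Loader over the table's files: OPTIMIZE rewrites files (marked dataChange=false) that a file-based listener would re-ingest, and VACUUM deletes files a lagging stream may still need. A minimal sketch, with placeholder paths:

    # Delta streaming source over the table; paths are placeholders.
    stream = (
        spark.readStream
        .format("delta")
        .option("ignoreChanges", "true")   # tolerate file rewrites from updates/deletes
        .load("/mnt/delta/source_table")
    )

    def write_batch(df, epoch_id):
        # Placeholder: the JDBC write to the SQL database goes here.
        pass

    (stream.writeStream
        .option("checkpointLocation", "/mnt/checkpoints/sql_sink")
        .foreachBatch(write_batch)
        .start())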

Databricks - spark-submit Error | org.springframework.core.ResolvableType.forInstance(Ljava/lang/Object;)Lorg/springframework/core/ResolvableType

spark-submit on a Databricks cluster is giving this error. I am using Spark 3.1.2, Scala 2.12, and Spring Boot 2.6.3. However, spark-submit runs fine on m…

How to run a Databricks notebook from another notebook on a different cluster

In Databricks, I understand that a notebook can be executed from another notebook, but the notebook will run on the current cluster by default. For example, I have not…
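
dbutils.notebook.run always executes on the calling cluster; to target a different (or new) cluster, the usual route is the Jobs API. A minimal sketch using the runs/submit endpoint, with host, token, notebook path, and cluster spec as placeholders:

    import requests

    resp = requests.post(
        "https://<workspace-host>/api/2.1/jobs/runs/submit",
        headers={"Authorization": "Bearer <pat-token>"},
        json={
            "run_name": "child-notebook-run",
            "tasks": [{
                "task_key": "child",
                "notebook_task": {"notebook_path": "/Users/<me>/child_notebook"},
                "new_cluster": {          # a fresh cluster, not the current one
                    "spark_version": "10.4.x-scala2.12",
                    "node_type_id": "i3.xlarge",
                    "num_workers": 1,
                },
            }],
        },
    )
    print(resp.json())   # contains the run_id to poll for status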

Insert Overwrite in Databricks overwriting complete data in table?

I have two tables: one with 50K records and the other with 2.5K records, and I want to update these 2.5K records into table one. Currently I am doing this by using …
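
Note that INSERT OVERWRITE replaces the table's existing rows by design; for an upsert into a Delta table, MERGE INTO is the usual tool. A minimal sketch with hypothetical table and key names:

    # Hypothetical names: big_table (50K rows), small_table (2.5K rows), key `id`.
    spark.sql("""
        MERGE INTO big_table AS t
        USING small_table AS s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)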

How to get the workspace name inside a Python notebook in Databricks

I am trying to get the workspace name inside a Python notebook. Is there any way we can do this? For example, my workspace name is databricks-test; I want to capture this i…
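
A minimal sketch of one common approach: Databricks exposes the workspace URL as a Spark conf, and the name can often be parsed from it (on some clouds the hostname is an opaque ID rather than the friendly name):

    host = spark.conf.get("spark.databricks.workspaceUrl")
    # e.g. "databricks-test.cloud.databricks.com" -> "databricks-test"
    workspace_name = host.split(".")[0]
    print(workspace_name)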

Spark partition size greater than the executor memory

I have four questions. Suppose in Spark I have 3 worker nodes. Each worker node has 3 executors, and each executor has 3 cores. Each executor has 5 GB of memory. (T…
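
As a starting point for reasoning about partition size versus executor memory, the number and size of input partitions can be inspected and tuned directly. A minimal sketch with a placeholder path:

    df = spark.read.parquet("/mnt/data/events")   # placeholder path
    print(df.rdd.getNumPartitions())              # how many partitions exist
    print(spark.conf.get("spark.sql.files.maxPartitionBytes"))  # split size for file sources, default 128MB
    df = df.repartition(200)   # raise the partition count to shrink per-partition size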

Databricks DataLakeFileClient Returns Error

I have a Databricks notebook running every 5 minutes; part of its functionality is to connect to a file in Azure Data Lake Storage Gen2 (ADLS Gen2). I get the following …
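
For comparison, a minimal working read with the azure-storage-file-datalake package looks like the following; account, credential, container, and path are placeholders:

    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://<storage-account>.dfs.core.windows.net",
        credential="<account-key-or-aad-credential>",
    )
    file_client = (
        service.get_file_system_client("<container>")
               .get_file_client("path/to/file.csv")
    )
    data = file_client.download_file().readall()   # bytes of the file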

Create a database with a name from a variable on Databricks (in SQL, not in Spark)

How to create a database with a name from a variable (in SQL, not in Spark)? I've written this: %sql SET myVar = CONCAT(getArgument('env'), 'BackOffice'); CREATE …
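
A value set with SET is a Spark conf and is not substituted into the text of a later DDL statement; the usual workaround is to assemble the statement outside SQL (or use ${...} widget substitution in a %sql cell). A minimal sketch of the Python route, assuming the same env widget that getArgument('env') reads:

    env = dbutils.widgets.get("env")   # same value as getArgument('env')
    spark.sql(f"CREATE DATABASE IF NOT EXISTS {env}BackOffice")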

SHAP value plotting error on Databricks but works locally

I want to do a simple SHAP analysis and plot a shap.force_plot. I noticed that it works without any issues locally in a .ipynb file, but fails on Databricks with …
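
A likely culprit is that shap's JavaScript is not loaded in the Databricks notebook; a common workaround is to render the plot's HTML through displayHTML. A minimal sketch, where explainer, shap_values, and X stand in for the user's own analysis:

    import shap

    plot = shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0])
    # Bundle shap's JS with the plot's HTML so the notebook can render it.
    displayHTML(f"<head>{shap.getjs()}</head><body>{plot.html()}</body>")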

Results in Databricks on AWS are not displayed when run as a job

Instead of the expected output from a display(my_dataframe), I get "Failed to fetch the result. Retry" when looking at the completed run (which is also marked as successful).

Has anyone found good learning resources for the "Databricks Certified Data Engineer Associate" exam?

I have been studying for the above exam using Databricks' learning platform, but I have not found any external resources such as study guides or practice exams

"localhost refused to connect" in a Databricks notebook calling the Google API

I read the Google API documentation pages (Drive API, PyDrive) and created a Databricks notebook to connect to Google Drive. I used the sample code in the d…
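
If the sample code uses InstalledAppFlow.run_local_server(), that flow opens a browser redirect on localhost, which does not exist on a remote Databricks driver, hence the connection refusal. A service account avoids the interactive flow entirely; a minimal sketch with a placeholder key path:

    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        "/dbfs/FileStore/keys/sa.json",   # placeholder key file
        scopes=["https://www.googleapis.com/auth/drive.readonly"],
    )
    drive = build("drive", "v3", credentials=creds)
    files = drive.files().list(pageSize=10).execute()
    print(files.get("files", []))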

Printing a secret value in Databricks

Even though secrets are for masking confidential information, I need to see the value of a secret to use it outside Databricks. When I simply print the secret …
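
The notebook masks any output that exactly matches the secret string, so the well-known workaround is to print it one character at a time. A minimal sketch with placeholder scope/key names:

    secret = dbutils.secrets.get(scope="my-scope", key="my-key")
    print(" ".join(secret))   # prints "a b c" instead of "[REDACTED]"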

How to convert a HASHBYTES string from SQL Server to the Spark equivalent

I have a process using the following SELECT statement in SQL Server: SELECT HASHBYTES('SHA1', CAST('4100119300' AS NVARCHAR(100))) AS StringConverted. This gives …
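
SQL Server's NVARCHAR is UTF-16LE, so reproducing HASHBYTES('SHA1', ...) in Spark requires encoding the string as UTF-16LE before hashing. A minimal sketch:

    from pyspark.sql import functions as F

    df = spark.createDataFrame([("4100119300",)], ["value"])
    df.select(
        F.sha1(F.encode(F.col("value"), "UTF-16LE")).alias("StringConverted")
    ).show(truncate=False)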

PySpark: select multiple columns from a list and filter on different values

I have a table with ~5k columns and ~1M rows that looks like this:

    ID   Col1  Col2  Col3  Col4  Col5  Col6  Col7  Col8  Col9  Col10  Col11
    ID1  0     1     0     1     0     2     1     1     2     2      0
    ID2  1     …
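
With that many columns, the select list is best built programmatically. A minimal sketch with hypothetical column names and filter values:

    from pyspark.sql import functions as F

    cols = [f"Col{i}" for i in range(1, 12)]   # Col1 .. Col11
    result = (
        df.select("ID", *cols)
          .filter((F.col("Col1") == 0) & (F.col("Col2") == 1))   # hypothetical values
    )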