Category "databricks"

Azure Data Explorer (ADX) vs Polybase vs Databricks

Today I discovered another Azure service called Azure Data Explorer (ADX). Sorry for such a comparison of services; I have a good understanding of all exc

Airflow DAGs Orchestration

I have three DAGs (say, DAG1, DAG2, and DAG3). I have a monthly schedule for DAG1. DAG2 and DAG3 must not be run directly (no schedule for these) and must be r
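
A common way to wire this up (a minimal sketch, assuming Airflow 2.x and hypothetical DAG IDs dag1, dag2, dag3) is to chain TriggerDagRunOperator tasks at the end of DAG1:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.trigger_dagrun import TriggerDagRunOperator

    # DAG1 carries the monthly schedule; dag2 and dag3 are defined
    # elsewhere with schedule_interval=None so they never run on their own.
    with DAG("dag1", schedule_interval="@monthly",
             start_date=datetime(2022, 1, 1), catchup=False) as dag:
        trigger_dag2 = TriggerDagRunOperator(
            task_id="trigger_dag2",
            trigger_dag_id="dag2",
            wait_for_completion=True,  # block until DAG2 finishes
        )
        trigger_dag3 = TriggerDagRunOperator(
            task_id="trigger_dag3",
            trigger_dag_id="dag3",
        )
        trigger_dag2 >> trigger_dag3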

Compare two tables having the same column names but different date column names

I have table A:

    id1  dt
    x1   2022-04-10
    a2   2022-04-10
    a1   2022-04-10
    x1   2022-05-10
    x2   2022-04-10
    y2   2022-04-10
    y1   2022-05-10
    x1   2022-06-10

Table B:

    id1  dt
    a1   2022
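
One way to surface the differences (a sketch, assuming the data is registered as Spark tables table_a and table_b) is a full outer join on id1 that compares the two dt columns:

    from pyspark.sql import functions as F

    a = spark.table("table_a").alias("a")
    b = spark.table("table_b").alias("b")

    # Full outer join on id1; keep rows where the two dates disagree or
    # where an id exists on only one side.
    diff = (
        a.join(b, F.col("a.id1") == F.col("b.id1"), "full_outer")
         .select(
             F.coalesce(F.col("a.id1"), F.col("b.id1")).alias("id1"),
             F.col("a.dt").alias("dt_a"),
             F.col("b.dt").alias("dt_b"),
         )
         .where(
             (F.col("dt_a") != F.col("dt_b"))
             | F.col("dt_a").isNull()
             | F.col("dt_b").isNull()
         )
    )
    diff.show()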

Py4JJavaError in an Azure Databricks notebook pipeline

I have a curious issue when launching a Databricks notebook from a caller notebook through dbutils.notebook.run (I am working in Azure Databricks). One intere
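
For context, the call in question typically looks like this (the notebook path and parameter below are hypothetical):

    # Run a child notebook from the caller; the second argument is a
    # timeout in seconds, the third an optional dict of widget parameters.
    result = dbutils.notebook.run("/path/to/child_notebook", 600,
                                  {"run_date": "2022-04-10"})
    print(result)  # whatever the child returned via dbutils.notebook.exit(...)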

Get the list of loaded files from Databricks Autoloader

We can use Autoloader to track which files have been loaded from an S3 bucket. My question about Autoloader: is there a way to read the Autoloader databa
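
On newer Databricks runtimes the checkpoint can be queried with the cloud_files_state SQL function; a sketch with a hypothetical checkpoint path:

    # Query the Auto Loader checkpoint for the files it has already
    # ingested (the checkpoint path is hypothetical).
    loaded = spark.sql(
        "SELECT * FROM cloud_files_state('s3://my-bucket/_checkpoints/autoloader')"
    )
    loaded.select("path").show(truncate=False)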

Spark SQL - org.apache.spark.sql.AnalysisException

The error described below occurs when I run a Spark job on Databricks the second time (the first time, less often). The SQL query just performs a create table as select
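
If the second run is failing because the table or its location already exists, one hedged workaround is to make the statement idempotent (table names below are hypothetical):

    # Make the job idempotent so the second run does not trip over an
    # existing table (names are hypothetical).
    spark.sql("""
        CREATE OR REPLACE TABLE my_db.my_table AS
        SELECT * FROM my_db.source_table
    """)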

Using Databricks/Python 3.x ZipFile to extract a 7 GB file from a zip

I've got a large NPI zipfile which includes a 7.3 GB CSV. (The file can be found on the NPI site here: http://download.cms.gov/nppes/NPI_Files.html -- the Full Replac
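
A memory-friendly approach is to stream the member out of the archive rather than extracting it wholesale; a sketch with hypothetical paths and member name:

    import shutil
    from zipfile import ZipFile

    # Stream the CSV out of the archive in 1 MB chunks instead of
    # loading 7 GB into memory (paths and member name are hypothetical).
    with ZipFile("/dbfs/tmp/NPPES_Data_Dissemination.zip") as zf:
        with zf.open("npidata_pfile.csv") as src, \
             open("/dbfs/tmp/npidata_pfile.csv", "wb") as dst:
            shutil.copyfileobj(src, dst, length=1024 * 1024)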

How to avoid zipfile error with python-pptx saving files

I am using the python-pptx package to create a number of .pptx files from a series of dataframes. All works well with adding slides and such until it comes time
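
For reference, a minimal save loop with python-pptx looks like this (the dataframes dict and output directory are hypothetical assumptions):

    from pptx import Presentation

    # One deck per dataframe, each saved to a fresh local path so
    # python-pptx never writes into a stale or half-open handle
    # (the dataframes dict and output directory are hypothetical).
    for name, df in dataframes.items():
        prs = Presentation()
        slide = prs.slides.add_slide(prs.slide_layouts[5])  # "Title Only" layout
        slide.shapes.title.text = name
        prs.save(f"/tmp/{name}.pptx")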

Issues installing gdal-bin (libmysqlclient21 dependency) on Ubuntu 20.04.3 (Databricks job clusters)

In the past, I've had GDAL utilities installed successfully on a Databricks cluster running 20.04.3 LTS (focal). $ cat /etc/os-release NAME="Ubuntu" VERSION="2

How to filter files in Databricks Autoloader stream

I want to set up an S3 stream using Databricks Auto Loader. I have managed to set up the stream, but my S3 bucket contains different types of JSON files. I want
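
Auto Loader accepts the generic pathGlobFilter file-source option, which can narrow the stream to matching file names; a sketch with a hypothetical bucket and pattern:

    # pathGlobFilter narrows the stream to files whose names match a
    # glob pattern (bucket path and pattern are hypothetical).
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("pathGlobFilter", "*_events.json")
        .load("s3://my-bucket/landing/")
    )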

How to slice a PySpark dataframe into two row-wise

I am working in Databricks. I have a dataframe which contains 500 rows; I would like to create two dataframes, one containing 100 rows and the other containing t
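
One sketch of a deterministic split, assuming there is a column (here a hypothetical id) to order by:

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Spark dataframes have no inherent order, so number the rows over
    # an explicit ordering column (here a hypothetical "id") and split.
    w = Window.orderBy(F.col("id"))
    numbered = df.withColumn("rn", F.row_number().over(w))

    first_100 = numbered.where(F.col("rn") <= 100).drop("rn")
    remaining = numbered.where(F.col("rn") > 100).drop("rn")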

How to execute an Azure Databricks notebook from Excel

Is there any way to trigger an Azure Databricks notebook from Excel? If there is, please help me with how. Many thanks
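
There is no native Excel integration, but the Databricks Jobs REST API can be called from VBA or any HTTP client; a Python sketch of the equivalent call (host, token, and job ID are hypothetical):

    import requests

    host = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical
    token = "dapiXXXXXXXX"  # personal access token (hypothetical)

    # Trigger an existing job that wraps the notebook; Excel/VBA would
    # issue the same POST with its own HTTP client.
    resp = requests.post(
        f"{host}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={"job_id": 12345},  # hypothetical job ID
    )
    resp.raise_for_status()
    print(resp.json())  # contains the new run_id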

I am trying to connect to Databricks through the CLI and want to replicate the same in Azure DevOps

On the local system I am running the commands: pip install databricks-cli, then databricks configure --token, entering the token value when prompted. Now the thing is, in Azure DevOps I

Databricks Spark: java.lang.OutOfMemoryError: GC overhead limit exceeded i

I am executing a Spark job on a Databricks cluster. I am triggering the job via an Azure Data Factory pipeline, and it executes at a 15-minute interval, so after the su

How to read data from multiple folders in ADLS into a Databricks dataframe

The file path format is data/year/weeknumber/day/data_hour.parquet, e.g. data/2022/05/01/00/data_00.parquet, data/2022/05/01/01/data_01.parquet, data/2022/05/01/02/da
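
A glob path can cover the week/day/hour levels at once; a sketch assuming a hypothetical ADLS base path:

    # Wildcards cover week/day/hour; the ADLS base path is hypothetical.
    df = spark.read.parquet(
        "abfss://container@account.dfs.core.windows.net/data/2022/*/*/*/data_*.parquet"
    )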

How can I access python variable in Spark SQL?

I have a Python variable created under %python in my Jupyter notebook file in Azure Databricks. How can I access the same variable to make comparisons under %sql?
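
Two common patterns, sketched below with a hypothetical variable and table:

    # %python cell (table and column names below are hypothetical)
    threshold = 100

    # Option 1: interpolate the variable into spark.sql from Python.
    spark.sql(f"SELECT * FROM events WHERE amount > {threshold}").show()

    # Option 2: stash it in the Spark conf and reference it from a
    # %sql cell as ${my.threshold} (Databricks substitutes the value).
    spark.conf.set("my.threshold", str(threshold))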

Databricks display() function equivalent or alternative in Jupyter

I'm in the process of migrating current Databricks Spark notebooks to Jupyter notebooks. Databricks provides the convenient and beautiful display(data_frame) functi
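
A rough Jupyter stand-in, sketched under the assumption that pandas is available for rendering:

    from IPython.display import display

    # Rough stand-in for Databricks display(): render the first rows of
    # a Spark dataframe as a pandas table in Jupyter (row cap arbitrary).
    def show_df(data_frame, n=1000):
        display(data_frame.limit(n).toPandas())

    show_df(spark_df)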

Spark Delta table restore to version

I am trying to restore a Delta table to its previous version via Spark Java; I am using a local IDE. The code is as below: import io.delta.tables.*; DeltaTable deltaTa
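
For comparison, the PySpark equivalent is short (a sketch; restoreToVersion is available from Delta Lake 1.2, and the path and version are hypothetical):

    from delta.tables import DeltaTable

    # Restore a Delta table to an earlier version (the table path and
    # version number are hypothetical).
    dt = DeltaTable.forPath(spark, "/tmp/delta/events")
    dt.restoreToVersion(1)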

Databricks Cluster terminated. Reason: Cloud Provider Launch Failure

I'm using Azure Databricks with a custom configuration that uses VNet injection, and I am unable to start a cluster in my workspace. The error message being give

Is there any way to unnest BigQuery columns in Databricks in a single PySpark script

I am trying to connect to BigQuery using the latest Databricks version (7.1+, Spark 3.0) with PySpark as the script editor/base language. We ran the below PySpark script to f
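
A sketch of doing both in one script, assuming the Spark BigQuery connector and hypothetical table and column names:

    from pyspark.sql import functions as F

    # Read via the Spark BigQuery connector, then flatten a struct and
    # explode a repeated field in one script (all names hypothetical).
    df = (
        spark.read.format("bigquery")
        .option("table", "my_project.my_dataset.my_table")
        .load()
    )

    flat = (
        df.select("id", "address.*", F.explode_outer("orders").alias("order"))
          .select("*", "order.*")
          .drop("order")
    )
    flat.show()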