I have multiple JSON files (~10 TB) in an S3 bucket, and I need to organize these files by a date element present in every JSON document. What I think that my c
I am trying to clean up and recreate a Databricks Delta table for integration tests. I want to run the tests on a DevOps agent, so I am using JDBC (Simba driver) bu
As you can see, the library I'm using is asking me to make an entry, but there's no box/window where I can make the entry. How do I make an entry here amongst y/n/u/
I have streaming data coming in as a JSON array and I want to flatten it out as a single row in a Spark DataFrame using Python. Here is what the JSON data looks like
I am loading data via pipelines into an ADLS Gen2 container. Now I want to create a table that records when each pipeline started running and when it completed. l
I need to find a way to delete multiple rows from a Delta table/PySpark DataFrame given a list of IDs to identify the rows. As far as I can tell, there isn't a
I am using the Spark ML library for a classification problem using logistic regression. I have vectorized the input features and created a training dataset and test datas
I am running Databricks 7.3 LTS and getting errors while trying to use Scala bulk copy. The error is: object sqldb is not a member of package com.microsoft. I hav
I am working with Azure Databricks Jupyter notebooks and have time-consuming jobs (complex queries, model training, loops over many items, etc.). Every time I c
I have been really struggling for months. We are trying to scan Scala code, which is in Databricks, with SonarQube in Azure DevOps. We were getting around 30 errors. But
Today I discovered another Azure service called Azure Data Explorer (ADX). Sorry for such a comparison of services; I have a good understanding of all exc
I have three DAGs (say, DAG1, DAG2 and DAG3). I have a monthly scheduler for DAG1. DAG2 and DAG3 must not be run directly (no scheduler for these) and must be r
I have table A: id1 dt x1 2022-04-10 a2 2022-04-10 a1 2022-04-10 x1 2022-05-10 x2 2022-04-10 y2 2022-04-10 y1 2022-05-10 x1 2022-06-10 Table B: id1 dt a1 2022
I have a curious issue when launching a Databricks notebook from a caller notebook through dbutils.notebook.run (I am working in Azure Databricks). One intere
We can use Auto Loader to track whether files have been loaded from an S3 bucket or not. My question about Auto Loader: is there a way to read the Auto Loader databa
The error described below occurs when I run a Spark job on Databricks the second time (less often the first time). The SQL query just performs create table as select
I've got a large NPI zip file which includes a 7.3 GB CSV. (The file can be found on the NPI site here: http://download.cms.gov/nppes/NPI_Files.html -- the Full Replac
I am using the python-pptx package to create a number of .pptx files from a series of dataframes. All works well with adding slides and such until it comes time
In the past, I've had GDAL utilities installed successfully on a Databricks cluster running Ubuntu 20.04.3 LTS (focal). $ cat /etc/os-release NAME="Ubuntu" VERSION="2
I want to set up an S3 stream using Databricks Auto Loader. I have managed to set up the stream, but my S3 bucket contains different types of JSON files. I want