Category "delta-lake"

Databricks- ConcurrentAppendException:

I'm running like 20 notebooks concurrently and they all update the same Delta table (however, different rows). I'm getting the below exception if any two notebo

Spark binary file and Delta Table

I have batches of binary files (~3mb each) that I receive in batches of ~20000 files at a time. These files are used downstream for further processing, but I wa

How to find the number of Inserts and Updates of Merge command?

I have code similar to this in Spark(Scala). I would like to know the number of records this code updated/inserted when execute() is complete. Is there a way?

IoT - Databricks Deltalake - access in C# api or Node js API

I am working on IoT solution, where there are multiple sensors which are sending data. I have one job which listen to Event hub, get the IoT sensor data and sto

How to handle memory issue while writing data in which a particular column contains very large data in each record in databricks in pyspark

I have a set of records with 10 columns. There is a column 'x' which contains an array of float values and the length of array can be very large(for eg, the len

How to manually checkpoint a delta table using PySpark?

I have a delta table, and am trying to append data to it and then checkpoint that table. By default I believe it checkpoints every 10 commits, but I would like

Delta Table / Athena And Spark

I have my delta table, which can be read from Athena. When I try to get the data through a query from spark I get the following error: Caused by: org.apache.sp

Auto increment id in delta table while inserting

I have a problem regarding merging csv files using pysparkSQL with delta table. I managed to create upsert function that update if matched and insert if not mat

Scala error - Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

I have a requirement where i am reading data from a CSV file and writing data to a Delta table over scala on window OS. My scala code is given below:- import co

Error while using readstream from delta on azure data lake gen 2

I get the below error while reading data from delta lake. The detailed log on azure shows its failing to read .tmp file from the _delta_log folder. I have tried

What is the best way to cleanup and recreate databricks delta table?

I am trying to cleanup and recreate databricks delta table for integration tests. I want to run the tests on devops agent so i am using JDBC (Simba driver) bu

How to Check Which Record is non-numeric in a String Column in Delta Table

I am working on Delta table using Databricks on Azure. The Delta table contains about 100 million records with many columns. One column data type of which is S

Delete multiple rows from a delta table/pyspark data frame givien a list of IDs

I need to find a way to delete multiple rows from a delta table/pyspark data frame given a list of ID's to identify the rows. As far as I can tell there isn't a

Azure Syanpse Analytics

I have a need to connect to Synapse Analytics Serverless SQL Pool database using SQL Authentication. I created a serverless SQL Pool database and created a SQL

Update DeltaTable properties on S3

I have a DeltaTable at aws S3 location (s3://bucket/myDeltaTable) which has a default table property delta.logRetentionDuration set to 30 days. Is there a way I

pyspark delta-lake metastore

Using "spark.sql.warehouse.dir" in the same jupyter session (no databricks) works. But after a kernel restart in jupyter the catalog db and tables arent't re

Create index for tables within Delta Lake

I'm new to the Delta Lake, but I want to create some indexes for fast retrieval for some tables in Delta Lake. Based on the docs, it shows that the closest is b

Spark Delta table restore to version

I am trying to restore a delta table to its previous version via spark java , am using local ide .code is as below import io.delta.tables.*; DeltaTable deltaTa

Databricks: Z-order vs partitionBy

I am learning Databricks and I have some questions about z-order and partitionBy. When I am reading about both functions it sounds pretty similar. Both function