I am new to Azure Databricks. I am trying to write a DataFrame to a Delta table that contains a TIMESTAMP column, but strangely it changes the TIMESTAMP pattern...
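One common cause is the session time zone: Delta (via Parquet) stores timestamps as UTC instants, and Spark parses and renders them according to spark.sql.session.timeZone. A minimal sketch that pins the zone explicitly, with a hypothetical events table:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("timestamp-write")
         # Timestamps are stored as UTC instants; this setting only
         # controls how they are parsed and displayed.
         .config("spark.sql.session.timeZone", "UTC")
         .getOrCreate())

df = (spark.createDataFrame([("2023-01-01 10:30:00",)], ["ts_str"])
      .select(F.to_timestamp("ts_str").alias("event_ts")))

df.write.format("delta").mode("append").saveAsTable("events")  # hypothetical table
```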
Short version: I need a faster/better way to update many column comments at once in Spark/Databricks. I have a PySpark notebook that can do this sequentially across...
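Comment changes are metadata-only commits, so a plain loop of ALTER statements is usually quick; a sketch, with hypothetical table and column names:

```python
# Map of column -> comment to apply; names are hypothetical.
comments = {
    "customer_id": "Natural key from the CRM system",
    "order_ts": "Event timestamp in UTC",
}

for col, text in comments.items():
    escaped = text.replace("'", "\\'")  # naive escaping for the SQL literal
    spark.sql(f"ALTER TABLE my_db.orders ALTER COLUMN {col} COMMENT '{escaped}'")
```

Each statement is a small catalog/log commit rather than a data rewrite, so the sequential cost is per-commit overhead, not table size.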
I'm running about 20 notebooks concurrently, and they all update the same Delta table (different rows, however). I'm getting the exception below whenever any two notebooks...
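Delta uses optimistic concurrency control, so two commits that might touch the same files fail with a conflict such as ConcurrentAppendException even when the rows differ. A common mitigation, assuming the table can be partitioned on a key each notebook owns exclusively (a hypothetical source_id here), is to pin that partition in the merge condition so conflict detection is scoped to it:

```python
from delta.tables import DeltaTable

# updates_df is this notebook's batch of rows; names are hypothetical
# and the table is assumed partitioned by source_id.
target = DeltaTable.forName(spark, "iot.readings")

(target.alias("t")
 .merge(
     updates_df.alias("s"),
     # Pinning the partition this notebook owns (7 here) narrows the
     # files the commit can conflict with to that one partition.
     "t.source_id = 7 AND t.event_id = s.event_id",
 )
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```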
I receive binary files (~3 MB each) in batches of ~20,000 files at a time. These files are used downstream for further processing, but I want...
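If the goal is to land each batch in a tracked table, Spark's built-in binaryFile source reads the raw bytes plus file metadata; the paths and extension below are placeholders:

```python
# Read one batch of binary files; yields columns path, modificationTime,
# length, and content (the raw bytes).
files_df = (spark.read.format("binaryFile")
            .option("pathGlobFilter", "*.bin")   # assumed extension
            .load("/mnt/landing/batch_0001/"))

(files_df.write.format("delta")
 .mode("append")
 .save("/mnt/bronze/binary_files"))
```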
I have code similar to this in Spark (Scala). I would like to know how many records this code updated/inserted once execute() completes. Is there a way?
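The merge call itself doesn't return counts, but the commit it writes does: DESCRIBE HISTORY exposes operationMetrics, which for a MERGE includes keys such as numTargetRowsUpdated and numTargetRowsInserted. A sketch with a hypothetical table name:

```python
row = (spark.sql("DESCRIBE HISTORY my_db.target_table LIMIT 1")
       .select("operation", "operationMetrics")
       .first())

if row["operation"] == "MERGE":           # latest commit was the merge
    metrics = row["operationMetrics"]      # map<string,string>
    print("updated:", metrics.get("numTargetRowsUpdated"),
          "inserted:", metrics.get("numTargetRowsInserted"))
```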
I am working on an IoT solution where multiple sensors send data. I have one job which listens to Event Hub, gets the IoT sensor data, and stores...
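For reference, a sketch of an Event Hubs-to-Delta streaming job using the azure-event-hubs-spark connector (it must be installed on the cluster); the connection string and paths are placeholders:

```python
conn = "Endpoint=sb://...;EntityPath=..."  # placeholder connection string
eh_conf = {
    # Recent connector versions expect the connection string encrypted.
    "eventhubs.connectionString":
        spark.sparkContext._jvm.org.apache.spark.eventhubs
             .EventHubsUtils.encrypt(conn),
}

raw = spark.readStream.format("eventhubs").options(**eh_conf).load()

(raw.selectExpr("CAST(body AS STRING) AS payload", "enqueuedTime")
 .writeStream.format("delta")
 .option("checkpointLocation", "/mnt/chk/iot")  # required for recovery
 .start("/mnt/silver/iot_readings"))
```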
I have a set of records with 10 columns. There is a column 'x' which contains an array of float values and the length of array can be very large(for eg, the len
I have a Delta table, and am trying to append data to it and then checkpoint the table. By default I believe it checkpoints every 10 commits, but I would like...
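The cadence is controlled by the table property delta.checkpointInterval (10 by default), so one option is to change it per table; the table name here is hypothetical:

```python
# Checkpoint the log every 5 commits instead of every 10.
spark.sql("""
    ALTER TABLE my_db.events
    SET TBLPROPERTIES ('delta.checkpointInterval' = '5')
""")
```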
I have a Delta table which can be read from Athena. When I try to query the data from Spark, I get the following error: Caused by: org.apache.sp...
I have a problem merging CSV files into a Delta table using PySpark SQL. I managed to create an upsert function that updates if matched and inserts if not matched...
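For reference, a minimal version of such an upsert; the paths, the header option, and the id join key are assumptions:

```python
from delta.tables import DeltaTable

batch = spark.read.option("header", "true").csv("/mnt/incoming/batch.csv")

target = DeltaTable.forPath(spark, "/mnt/silver/customers")
(target.alias("t")
 .merge(batch.alias("s"), "t.id = s.id")   # update if matched ...
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()                # ... insert if not
 .execute())
```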
I have a requirement where I am reading data from a CSV file and writing it to a Delta table using Scala on Windows OS. My Scala code is given below: import co...
I get the error below while reading data from Delta Lake. The detailed log on Azure shows it is failing to read a .tmp file from the _delta_log folder. I have tried...
I am trying to clean up and recreate a Databricks Delta table for integration tests. I want to run the tests on a DevOps agent, so I am using JDBC (Simba driver), but...
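The reset itself is usually just two statements, which also work when submitted as plain SQL through the Simba JDBC driver; the table name and schema are hypothetical:

```python
spark.sql("DROP TABLE IF EXISTS test_db.orders")
spark.sql("""
    CREATE TABLE test_db.orders (
        id     BIGINT,
        amount DOUBLE
    ) USING DELTA
""")
```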
I am working on a Delta table using Databricks on Azure. The Delta table contains about 100 million records with many columns, one of which has data type S...
I need to find a way to delete multiple rows from a Delta table/PySpark DataFrame given a list of IDs to identify the rows. As far as I can tell, there isn't a...
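Correct that a DataFrame itself can't be mutated; the delete has to run against the Delta table. A sketch with a hypothetical path and key column:

```python
from delta.tables import DeltaTable

ids_to_delete = [101, 102, 103]

target = DeltaTable.forPath(spark, "/mnt/silver/customers")
# Build an IN-list predicate from the Python list.
target.delete("id IN ({})".format(",".join(str(i) for i in ids_to_delete)))
```

For very large ID lists, a MERGE with whenMatchedDelete against a DataFrame of keys scales better than one long IN clause.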
I need to connect to a Synapse Analytics serverless SQL pool database using SQL authentication. I created a serverless SQL pool database and created a SQL...
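Assuming the goal is to query it from Spark, the serverless pool exposes a standard SQL Server endpoint (the -ondemand host), so the generic JDBC source with SQL authentication should work, given the Microsoft SQL Server JDBC driver on the classpath; every value below is a placeholder:

```python
url = ("jdbc:sqlserver://myworkspace-ondemand.sql.azuresynapse.net:1433;"
       "database=mydb;encrypt=true;")

df = (spark.read.format("jdbc")
      .option("url", url)
      .option("query", "SELECT TOP 10 * FROM dbo.my_view")
      .option("user", "sql_login_name")
      .option("password", "********")
      .load())
```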
I have a DeltaTable at an AWS S3 location (s3://bucket/myDeltaTable) which has the default table property delta.logRetentionDuration set to 30 days. Is there a way I can override it?
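Table properties can be overridden per table after creation with ALTER TABLE; the path form below references the table from the question, and 60 days is just an example value:

```python
spark.sql("""
    ALTER TABLE delta.`s3://bucket/myDeltaTable`
    SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 60 days')
""")
```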
Using "spark.sql.warehouse.dir" in the same jupyter session (no databricks) works. But after a kernel restart in jupyter the catalog db and tables arent't re
I'm new to Delta Lake, but I want to create some indexes for fast retrieval on some tables in Delta Lake. Based on the docs, it looks like the closest is bloom filter indexes...
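Delta has no secondary indexes; the two closest tools on Databricks are Z-ordering (data skipping) and Bloom filter indexes. A sketch with hypothetical table and column names:

```python
# Z-ordering co-locates rows so per-file min/max stats prune well on
# the chosen columns.
spark.sql("OPTIMIZE my_db.events ZORDER BY (user_id)")

# Bloom filter index on a high-cardinality column (Databricks-specific).
spark.sql("""
    CREATE BLOOMFILTER INDEX ON TABLE my_db.events
    FOR COLUMNS (session_id OPTIONS (fpp = 0.1, numItems = 10000000))
""")
```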
I am trying to restore a Delta table to its previous version via Spark in Java, using a local IDE. The code is as below: import io.delta.tables.*; DeltaTable deltaTable...
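For reference, the restore API (Delta Lake 1.2+) is the same across the language bindings; a PySpark sketch with a placeholder path and version:

```python
from delta.tables import DeltaTable

dt = DeltaTable.forPath(spark, "/path/to/table")
dt.restoreToVersion(1)  # or dt.restoreToTimestamp("2023-01-01")
```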