I receive binary files (~3 MB each) in batches of ~20,000 files at a time. These files are used downstream for further processing, but I want …
I have code similar to this in Spark (Scala). I would like to know the number of records this code updated/inserted once execute() completes. Is there a way?
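If the code in question is a Delta Lake merge built with DeltaTable.merge(...).execute(), the commit's operationMetrics can be read back from the table history. A minimal Scala sketch, assuming an active SparkSession `spark` with Delta Lake configured and a hypothetical table path:

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.functions.col

// The table path is a hypothetical placeholder.
val deltaTable = DeltaTable.forPath(spark, "/tmp/delta/my-table")

// The most recent commit carries operationMetrics such as
// numTargetRowsUpdated and numTargetRowsInserted.
deltaTable.history(1)
  .select(
    col("operationMetrics").getItem("numTargetRowsUpdated").as("updated"),
    col("operationMetrics").getItem("numTargetRowsInserted").as("inserted"))
  .show(false)
```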
I am working on an IoT solution where multiple sensors send data. I have one job which listens to an Event Hub, gets the IoT sensor data, and stores it …
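For the ingestion step, one common pattern is to read the Event Hub through its Kafka-compatible endpoint and append to a Delta table with Structured Streaming. A minimal sketch, assuming an active SparkSession `spark` with Delta Lake configured; the namespace, hub name, connection string, and paths are hypothetical placeholders:

```scala
// Event Hubs exposes a Kafka endpoint on port 9093; the connection string
// is passed as the SASL PLAIN password with the literal user "$ConnectionString".
val connectionString =
  "Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."

val sensorStream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "mynamespace.servicebus.windows.net:9093")
  .option("subscribe", "my-iot-hub") // the Event Hub name acts as the Kafka topic
  .option("kafka.security.protocol", "SASL_SSL")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("kafka.sasl.jaas.config",
    s"""org.apache.kafka.common.security.plain.PlainLoginModule required username="$$ConnectionString" password="$connectionString";""")
  .load()

// Append the raw events to a Delta table; downstream jobs can parse `body`.
sensorStream
  .selectExpr("CAST(value AS STRING) AS body", "timestamp")
  .writeStream
  .format("delta")
  .option("checkpointLocation", "/tmp/checkpoints/iot-events")
  .start("/tmp/delta/iot-events")
```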
I have a set of records with 10 columns. There is a column 'x' which contains an array of float values, and the length of the array can be very large (e.g., the len…
I have a Delta table and am trying to append data to it and then checkpoint that table. By default I believe it checkpoints every 10 commits, but I would like …
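If the goal is to control how often Delta writes a _delta_log checkpoint, the interval is governed by the delta.checkpointInterval table property (default 10). A minimal sketch, assuming an active SparkSession `spark` and a hypothetical table path:

```scala
// Checkpoint the _delta_log after every commit instead of every 10.
spark.sql("""
  ALTER TABLE delta.`/tmp/delta/my-table`
  SET TBLPROPERTIES ('delta.checkpointInterval' = '1')
""")
```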
I have a Delta table which can be read from Athena. When I try to get the data through a query from Spark, I get the following error: Caused by: org.apache.sp…
I have a problem merging CSV files using PySpark SQL with a Delta table. I managed to create an upsert function that updates if matched and inserts if not matched …
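For reference, the Delta merge builder expresses such an upsert directly. Below is a Scala sketch (the PySpark builder is analogous), assuming an active SparkSession `spark`; the paths and the join key `id` are hypothetical placeholders:

```scala
import io.delta.tables.DeltaTable

val target  = DeltaTable.forPath(spark, "/tmp/delta/target")
val updates = spark.read.option("header", "true").csv("/tmp/incoming/*.csv")

// Update matching rows, insert everything else.
target.as("t")
  .merge(updates.as("s"), "t.id = s.id")
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .execute()
```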
I have a requirement where I am reading data from a CSV file and writing data to a Delta table using Scala on Windows OS. My Scala code is given below: import co…
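A minimal CSV-to-Delta sketch for a local Scala job follows; the paths are hypothetical placeholders. On Windows, note that HADOOP_HOME must point at a directory containing bin/winutils.exe, or local filesystem writes tend to fail:

```scala
import org.apache.spark.sql.SparkSession

// Local session with the Delta Lake extension and catalog registered.
val spark = SparkSession.builder()
  .appName("csv-to-delta")
  .master("local[*]")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog",
          "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("C:/data/input.csv")

df.write.format("delta").mode("overwrite").save("C:/data/delta/my-table")
```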
I get the below error while reading data from Delta Lake. The detailed log on Azure shows it's failing to read a .tmp file from the _delta_log folder. I have tried …
I am trying to clean up and recreate a Databricks Delta table for integration tests. I want to run the tests on a DevOps agent, so I am using JDBC (Simba driver), but …
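One way to reset the table from the agent is to issue plain SQL over the JDBC connection. A sketch, assuming the Simba Spark JDBC driver is on the classpath; the JDBC URL, token, table name, and location are all hypothetical placeholders:

```scala
import java.sql.DriverManager

// Placeholder URL in the Simba Spark driver's format; adjust to your workspace.
val url = "jdbc:spark://adb-1234567890123456.7.azuredatabricks.net:443/default;" +
  "transportMode=http;ssl=1;AuthMech=3;httpPath=sql/protocolv1/o/0/0000-000000-xxxxxxxx"
val conn = DriverManager.getConnection(url, "token", sys.env("DATABRICKS_TOKEN"))
try {
  val stmt = conn.createStatement()
  // Drop and recreate the test table before each run.
  stmt.execute("DROP TABLE IF EXISTS test_db.my_table")
  stmt.execute(
    """CREATE TABLE test_db.my_table (id BIGINT, payload STRING)
      |USING DELTA
      |LOCATION '/mnt/tests/my_table'""".stripMargin)
  stmt.close()
} finally conn.close()
```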
I am working on a Delta table using Databricks on Azure. The Delta table contains about 100 million records with many columns. One column's data type is S…
I need to find a way to delete multiple rows from a Delta table/PySpark DataFrame given a list of IDs to identify the rows. As far as I can tell there isn't a …
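DeltaTable.delete does accept an arbitrary Column condition, so an isin filter over the ID list works without rewriting the rest of the table. A Scala sketch (the PySpark API is analogous), assuming an active SparkSession `spark`; the path and column name are hypothetical:

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.functions.col

val idsToDelete = Seq(101L, 205L, 307L)
val deltaTable  = DeltaTable.forPath(spark, "/tmp/delta/my-table")

// Deletes only the matching rows in place.
deltaTable.delete(col("id").isin(idsToDelete: _*))
```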
I need to connect to a Synapse Analytics serverless SQL pool database using SQL authentication. I created a serverless SQL pool database and created a SQL …
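If the client is Spark, the serverless pool can be reached with the standard SQL Server JDBC driver and SQL authentication. A sketch, assuming an active SparkSession `spark` and the mssql-jdbc driver on the classpath; the workspace, database, login, and table names are hypothetical placeholders:

```scala
// Serverless pools use the <workspace>-ondemand host name.
val df = spark.read
  .format("jdbc")
  .option("url",
    "jdbc:sqlserver://myworkspace-ondemand.sql.azuresynapse.net:1433;database=mydb;encrypt=true")
  .option("dbtable", "dbo.my_view")
  .option("user", "my_sql_login")
  .option("password", sys.env("SYNAPSE_SQL_PASSWORD"))
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .load()

df.show(5)
```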
I have a DeltaTable at an AWS S3 location (s3://bucket/myDeltaTable) which has the default table property delta.logRetentionDuration set to 30 days. Is there a way I …
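Assuming the question is how to override the retention for this one table: delta.logRetentionDuration is a per-table property stored in the table's own metadata, so it can be changed with ALTER TABLE against the S3 path. A sketch, assuming an active SparkSession `spark` with Delta Lake configured:

```scala
// The property travels with the table at s3://bucket/myDeltaTable.
spark.sql("""
  ALTER TABLE delta.`s3://bucket/myDeltaTable`
  SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days')
""")
```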
Using "spark.sql.warehouse.dir" in the same jupyter session (no databricks) works. But after a kernel restart in jupyter the catalog db and tables arent't re
I'm new to Delta Lake, but I want to create some indexes for fast retrieval on some tables in Delta Lake. Based on the docs, it looks like the closest is b…
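Delta Lake itself has no secondary indexes; the usual substitutes are data skipping plus Z-ordering and, on Databricks specifically, Bloom filter indexes. A sketch of both, assuming an active SparkSession `spark`; the table and column names are hypothetical, and the BLOOMFILTER syntax is Databricks-only:

```scala
// Databricks-only: Bloom filter index for point lookups on device_id.
spark.sql("""
  CREATE BLOOMFILTER INDEX ON TABLE events
  FOR COLUMNS (device_id OPTIONS (fpp = 0.1, numItems = 50000000))
""")

// Z-ordering co-locates rows with similar device_id values in the same
// files, improving data skipping (Databricks or Delta Lake 2.0+).
spark.sql("OPTIMIZE events ZORDER BY (device_id)")
```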
I am trying to restore a Delta table to its previous version via Spark Java; I am using a local IDE. The code is as below: import io.delta.tables.*; DeltaTable deltaTa…
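For reference, Delta Lake 1.2+ exposes restore directly on DeltaTable. A Scala sketch (the Java API is the same), assuming an active SparkSession `spark`; the path and version number are hypothetical placeholders:

```scala
import io.delta.tables.DeltaTable

val deltaTable = DeltaTable.forPath(spark, "/tmp/delta/my-table")

// Roll the table back to version 1.
deltaTable.restoreToVersion(1)

// Or restore by timestamp instead:
// deltaTable.restoreToTimestamp("2023-01-15")
```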
I am learning Databricks and I have some questions about z-order and partitionBy. When I read about both functions, they sound pretty similar. Both functions …
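They are related but operate at different levels: partitionBy splits data into directories at write time and suits low-cardinality columns, while Z-ordering clusters data within files after the fact via OPTIMIZE (Databricks or Delta Lake 2.0+). A sketch, assuming an active SparkSession `spark`; the columns and paths are hypothetical:

```scala
// Toy DataFrame with a low-cardinality date and a high-cardinality ID.
val df = spark.range(1000)
  .selectExpr("id AS device_id", "current_date() AS event_date")

// partitionBy: coarse, directory-level split decided at write time.
df.write
  .format("delta")
  .partitionBy("event_date")
  .save("/tmp/delta/events")

// Z-order: file-level clustering applied afterwards with OPTIMIZE;
// better for high-cardinality filter columns like device_id.
spark.sql("OPTIMIZE delta.`/tmp/delta/events` ZORDER BY (device_id)")
```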