I'm getting the following error when I attempt to write to my data lake with Delta on Databricks fulldf = spark.read.format("csv").option("header", True).option
Below is the sample code snippet that is used for data fetch from HBase. This worked fine with Spark 3.1.2. However after upgrading to Spark 3.2.1, it is not wo
Postgres has a time data type. I am trying to insert rows into postgres from a glue job. Given the code: applymapping1 = ApplyMapping.apply(frame = SelectFromCo
If i read a file in pyspark: Data = spark.read(file.csv) Then for the life of the spark session, the ‘data’ is available in memory,correct? So if i
Below table would be the input dataframe col1 col2 col3 1 12;34;56 Aus;SL;NZ 2 31;54;81 Ind;US;UK 3 null Ban 4 Ned null Expected output dataframe [values of c
I have the following PySpark dataframe and I want to find percentile row-wise. value col_a col_b col_c row_a 5.0 0.0 11.0 row_b 3394.0 0
I am new to Azure Databricks,I am trying to write a dataframe output to a delta table which consists TIMESTAMP column. But strangely it changes the TIMESTAMP pa
I am on windows and I am trying to follow this doc to create spark applications with VSCode using a Synapse workspace. I can sign into Azure and set a default S
I am try to using pysparkling.ml.H2OMOJOModel for predict a spark dataframe using a MOJO model trained with h2o==3.32.0.2 in AWS Glue Jobs, how ever a got the e
We all know parquet is column-oriented so we can get only columns we desired and reduce IO. But what if the parquet file is stored in HDFS, should we download t
I'm getting this exception when trying to create a DataFrame with only 50 millions rows. Any ideas how to avoid this problem? [Exception] [JvmBridge] Array dime
In Spark 3.2, special datetime values such as epoch, today, yesterday, tomorrow, and now are supported in typed literals or in cast of foldable strings only, fo
Short version: Need a faster/better way to update many column comments at once in spark/databricks. I have a pyspark notebook that can do this sequentially acro
I am trying to add a new column to my Spark Dataframe. New column added will be of a size based on a variable (say salt) post which I will use that column to ex
I am stuck in a very odd situation related to Hbase design i would say. Hbase version >> Version 2.1.0-cdh6.2.1 So, the problem statement is, in Hbase, w
I have installed Spark 3.0.0 on a Windows 64 bit machine with Python 3.9.7 using an anaconda base environment. I'm trying to execute the next code in the pyspar
I am currently trying to get a flatten a data in databricks table. Since some of the columns are deeply nested and is of 'String' type, i couldn't use explode f
val spark = SparkSession.builder().appName("Spark SQL basic example").config("spark.master", "local").getOrCreate() import spark.implicits._ case class Someth
I'm using databricks notebook to extract all field occurrences from a text column using the regexp_extract_all function. Here is the input: field_map#'IFDSIMP.7
I have the following code, which is used to (sha) hash columns in a spark dataframe: import org.apache.spark.sql.DataFrame import org.apache.spark.sql.functions