I try to save the result as one "csv" file on Windows Server 2019. I'm using the "Microsoft.Spark" library. An empty folder is created with no "csv" file. The q
In my project i am using spark-Cassandra-connector to read the from Cassandra table and process it further into JavaRDD but i am facing issue while processing C
While executing pyspark code from a script. Getting following error while df.show(). from pyspark.sql.types import StructType,StructField, StringType, IntegerTy
I am trying to transform an entire df to a single vector column, using df_vec = vectorAssembler.transform(df.drop('col200')) I am being thrown this error: F
I have a Dataframe and wish to divide it into an equal number of rows. In other words, I want a list of dataframes where each one is a disjointed subset of the
Running Pyspark script getting the following error depending on which xml I query: cannot resolve 'explode(...)' due to data type mismatch The pyspark code: fr
I have spark 3.2, vertica 9.2. spark = SparkSession.builder.appName("Ukraine").master("local[*]")\ .config("spark.jars", '/home/shivamanand/spark-3.2.1-bin-hado
We have a hive warehouse, and wanted to use spark for various tasks (mainly classification). At times write the results back as a hive table. For example, we wr
I am trying to restore a delta table to its previous version via spark java , am using local ide .code is as below import io.delta.tables.*; DeltaTable deltaTa
I have two records of type RDD[T] For example: val a: RDD[Integer] = .... val b: RDD[Integer] = ... when I perform val z = a.union(b) println(z) I find the s
I am submitting a Spark Job using below command. I want to tail the yarn log using application Id similar to tail command operation in Linux box. export SPARK
I am going through Spark Programming guide that says: Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather th
I am having a pyspark dataframe as DOCTOR | PATIENT JOHN | SAM JOHN | PETER JOHN | ROBIN BEN | ROSE BEN | GRAY and need to concatenate patient n
I'm running both the master and 1 worker on a GPU server in standalone mode. After submitting the job, it retrieves and loses executors for X amount of times be
I have a large data set which I am reading from Postgres. It has an ID column, a timestamp column and several other columns which may have been updated. For eac
I have a column of type map, where the key and value changes. I am trying to extract the value and create a new column. Input: ----------------+ |symbols
I'm new to SPARK-SQL. Is there an equivalent to "CASE WHEN 'CONDITION' THEN 0 ELSE 1 END" in SPARK SQL ? select case when 1=1 then 1 else 0 end from table Tha
My database has numeric value, which is up to 256-bit unsigned integer. However, spark's decimalType has a limit of Decimal(38,18). When I try to do calculatio
I am getting two types of errors in running a job on Google dataproc and it is causing executors to be lost one by one until the last executor is lost and the j
Do I need to use Spark with YARN to achieve NODE LOCAL data locality with HDFS? If I use Spark standalone cluster manager and have my data distributed in HDFS c