I am forming a query in a StringBuilder like below: println(dataQuery) Execution started at 2019-10-31 02:58:24.006019 PST res245: String = " SELECT transac
I would like to modify the date column in my Spark DataFrame to subtract 1 month, but only if certain months appear, i.e. only if the date is yyyy-07-31 or yyyy-04-30 chang
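A minimal PySpark sketch of one way to do this, assuming the column is named date and holds date values (the sample data is hypothetical):

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("2020-07-31",), ("2020-04-30",), ("2020-01-15",)], ["date"]
    ).withColumn("date", F.to_date("date"))

    # Only yyyy-07-31 and yyyy-04-30 should move back one month.
    is_target = ((F.month("date") == 7) & (F.dayofmonth("date") == 31)) | \
                ((F.month("date") == 4) & (F.dayofmonth("date") == 30))

    df = df.withColumn(
        "date",
        F.when(is_target, F.add_months("date", -1)).otherwise(F.col("date")),
    )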
I am trying to remove a special character (å) from a column in a dataframe. My data looks like: ClientID,PatientID AR0001å,DH_HL704221157198295_9
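A hedged sketch of one way to strip the character with regexp_replace, assuming the DataFrame is df and the affected column is ClientID:

    import pyspark.sql.functions as F

    # Replace every occurrence of the å character with an empty string.
    df = df.withColumn("ClientID", F.regexp_replace("ClientID", "å", ""))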
I am trying to create a Spark DataFrame from a Presto DB table which has a few columns of Array DataType. I tried multiple ways but I keep getting the same exception java.s
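JDBC sources commonly fail on array columns because the driver cannot map them to Spark types. One workaround, sketched below under the assumption that casting to JSON on the Presto side is acceptable, is to push down a query that serializes the arrays to strings and parse them back with from_json; the table name my_table and columns id/tags are hypothetical:

    import pyspark.sql.functions as F
    from pyspark.sql.types import ArrayType, StringType

    # Serialize the array column to JSON so JDBC sees a plain string.
    pushdown = "(SELECT id, CAST(tags AS JSON) AS tags FROM my_table) t"

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:presto://host:8080/catalog/schema")
          .option("dbtable", pushdown)
          .load()
          .withColumn("tags", F.from_json("tags", ArrayType(StringType()))))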
My question is: when should I call dataframe.cache(), and when is it useful? Also, should I cache the dataframes at the commented lines in my code? Note: My datafr
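As a rule of thumb, cache() pays off when the same DataFrame feeds more than one action; with a single action the cache is populated and then never reused. A small sketch (the status column is hypothetical):

    # Reused by two actions below, so caching avoids recomputing the filter.
    filtered = df.filter(df.status == "active").cache()

    total = filtered.count()               # first action materializes the cache
    sample = filtered.limit(10).collect()  # second action reads from the cache

    filtered.unpersist()                   # release memory when done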
I am trying to run spark-submit but it is failing with a weird message. Error: Could not find or load main class org.apache.spark.launcher.Main /opt/spark/b
Let's say I have a dataframe which looks like this: +--------------------+--------------------+--------------------------------------------------------------+
I am a newbie in pyspark. While trying to read a parquet file through pyspark I get the error below. I have tried various things like reinstalling the jre and jd
I am a beginner with Spark. While reading about DataFrames, I have often found the two statements below: 1) DataFrame is untyped 2) DataFrame has sch
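The two statements are reconciled by when checks happen: a DataFrame always carries a schema, but column references are validated only when Spark analyzes the plan at runtime, not at compile time, which is what "untyped" refers to. A small illustration (names hypothetical):

    from pyspark.sql.utils import AnalysisException

    df = spark.createDataFrame([(1, "a")], ["id", "label"])
    df.printSchema()  # the schema is known: id: long, label: string

    try:
        # Accepted by the API, but fails when the plan is analyzed.
        df.select("no_such_column")
    except AnalysisException as e:
        print("caught at analysis time:", e)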
The error described below occurs when I run a Spark job on Databricks a second time (less often the first). The SQL query just performs CREATE TABLE AS SELECT
I'm using Spark to process my data, like this: dataframe_mysql = spark.read.format('jdbc').options( url='jdbc:mysql://xxxxxxx',
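For reference, a complete hedged version of that JDBC read; the url, table, credentials, and driver class are placeholders to adapt:

    dataframe_mysql = (spark.read.format("jdbc")
        .option("url", "jdbc:mysql://host:3306/mydb")
        .option("dbtable", "my_table")
        .option("user", "user")
        .option("password", "password")
        .option("driver", "com.mysql.jdbc.Driver")  # legacy MySQL driver class
        .load())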
I'm trying to compare two data frames which have the same number of columns, i.e. 4 columns with id as the key column in both data frames. df1 = spark.read.csv("/path/to/
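Two hedged ways to compare them on the id key, assuming the remaining columns are col1..col3 (hypothetical names):

    import pyspark.sql.functions as F

    # Exact-row differences in either direction (Spark 2.4+).
    only_in_df1 = df1.exceptAll(df2)
    only_in_df2 = df2.exceptAll(df1)

    # Or join on the key and flag rows where any column disagrees.
    diff = (df1.alias("a").join(df2.alias("b"), "id")
            .where((F.col("a.col1") != F.col("b.col1")) |
                   (F.col("a.col2") != F.col("b.col2")) |
                   (F.col("a.col3") != F.col("b.col3"))))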
So I am very new to pyspark, but I am still unable to correctly create my own query. I try googling my problems, but I just don't understand how most of this work
I have an excel file with damaged rows at the top (the first 3 rows) which need to be skipped. I'm using the spark-excel library to read the excel file; on their githu
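With spark-excel, starting the read below the damaged rows is normally done through the dataAddress option; a sketch assuming the real data begins at row 4 of Sheet1:

    df = (spark.read.format("com.crealytics.spark.excel")
          .option("dataAddress", "'Sheet1'!A4")  # skip the first 3 rows
          .option("header", "true")
          .load("/path/to/file.xlsx"))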
I have a DeltaTable at an AWS S3 location (s3://bucket/myDeltaTable) which has the default table property delta.logRetentionDuration set to 30 days. Is there a way I
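Per-table properties can usually be changed with ALTER TABLE ... SET TBLPROPERTIES, addressing the Delta table by its path; a sketch assuming the goal is a 7-day retention:

    spark.sql("""
        ALTER TABLE delta.`s3://bucket/myDeltaTable`
        SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days')
    """)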
I am new to Spark, and I want to transform the source DataFrame below (loaded from a JSON file):

    +--+-----+-----+
    |A |count|major|
    +--+-----+-----+
    | a|    1|    m
I have a Python variable created under %python in my Jupyter notebook file in Azure Databricks. How can I access the same variable to make comparisons under %sql?
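A common workaround is to surface the Python value to SQL through a one-row temp view; the variable and table names below are hypothetical:

    # %python cell: expose the variable as a temp view.
    threshold = 100
    spark.createDataFrame([(threshold,)], ["threshold"]) \
         .createOrReplaceTempView("vars")

    # A %sql cell can then reference it, e.g.:
    #   SELECT * FROM my_table WHERE amount > (SELECT threshold FROM vars)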
For SparkSQL on Hive, when I use named_struct in the query, it returns results: SELECT id, collect_set(emp_info) as employee_info FROM ( SELECT t.id,
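For context, a self-contained sketch of the named_struct/collect_set pattern the query appears to use; the emp view and its columns are hypothetical:

    spark.createDataFrame(
        [(1, "alice", 30), (1, "bob", 40)], ["id", "name", "age"]
    ).createOrReplaceTempView("emp")

    spark.sql("""
        SELECT id, collect_set(emp_info) AS employee_info
        FROM (
            SELECT id, named_struct('name', name, 'age', age) AS emp_info
            FROM emp
        ) t
        GROUP BY id
    """).show(truncate=False)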
I have:

    key  value
    a    [1,2,3]
    b    [2,3,4]

I want:

    key  value1  value2  value3
    a    1       2       3
    b    2       3       4

It seems that in Scala I can wr
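In PySpark the same split can be done by indexing the array column, assuming it always holds three elements:

    import pyspark.sql.functions as F

    df = spark.createDataFrame([("a", [1, 2, 3]), ("b", [2, 3, 4])],
                               ["key", "value"])

    df = df.select(
        "key",
        *[F.col("value")[i].alias(f"value{i + 1}") for i in range(3)]
    )
    df.show()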