I wonder how this query executes successfully. As we know, the HAVING clause is evaluated before the SELECT one, so how can an alias defined in the SELECT statement be used here?
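Spark SQL's analyzer resolves a SELECT alias inside HAVING back to the underlying aggregate expression before the plan runs, so the logical clause order does not forbid it. A minimal sketch with hypothetical table and column names:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("having-alias").getOrCreate()
    spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["k", "v"]) \
        .createOrReplaceTempView("t")

    # HAVING references the SELECT alias `cnt`; the analyzer maps it back
    # to COUNT(*) before execution, so this runs fine.
    spark.sql("""
        SELECT k, COUNT(*) AS cnt
        FROM t
        GROUP BY k
        HAVING cnt > 1
    """).show()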
I'm using Scala Spark and have a DataFrame:

Source | Column1 | Column2
A      | ...     | ...
B      | ...     | ...
B      | ...     | ...
C      | ...
This is an issue I am facing with Spark 3.0; it worked before without my even specifying a format. Now I have tried explicitly specifying the format, but it still doesn't work.
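Spark 3.0 switched to a new datetime parser (and the Proleptic Gregorian calendar), so patterns that parsed fine in 2.x can now fail. A sketch under the assumption that this is the cause; the sample value and pattern are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    # Restore Spark 2.x parsing behaviour (assumption: the new 3.0 parser
    # is what rejects the pattern).
    spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

    df = spark.createDataFrame([("31-Oct-2019",)], ["d"])  # hypothetical sample
    df.select(F.to_date("d", "dd-MMM-yyyy").alias("parsed")).show()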
[Spark RDD] Find the single row that has the highest count, and for that row report the month, count, and hashtag name. Print the result to the terminal output.
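A minimal sketch, assuming an RDD of ((month, hashtag), count) pairs (the pair structure is hypothetical); RDD.max with a key function picks the single record with the highest count:

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    # Hypothetical structure: ((month, hashtag), count) per record.
    counts = sc.parallelize([
        (("2019-01", "#spark"), 42),
        (("2019-02", "#python"), 17),
        (("2019-01", "#bigdata"), 58),
    ])

    # max with a key function scans once and keeps the row with the top count.
    (month, hashtag), count = counts.max(key=lambda kv: kv[1])
    print(month, count, hashtag)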
When I do spark-submit using this command on Cloudera: time spark-submit \ --deploy-mode client \ --conf spark.app.name='XXXxxxxxx' --conf spark.master=l
I have streaming data coming in as a JSON array, and I want to flatten it out into a single row in a Spark DataFrame using Python. Here is how the JSON data looks:
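A sketch assuming the incoming value is a JSON array of objects (the element schema below is hypothetical): from_json parses the array, and selecting indexed elements keeps everything on one row, whereas explode would produce one row per element.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType, StructField, StructType

    spark = SparkSession.builder.getOrCreate()
    # Hypothetical element schema for the incoming array.
    schema = ArrayType(StructType([
        StructField("name", StringType()),
        StructField("value", StringType()),
    ]))

    df = spark.createDataFrame(
        [('[{"name":"a","value":"1"},{"name":"b","value":"2"}]',)], ["json"])
    parsed = df.select(F.from_json("json", schema).alias("arr"))

    # Indexing the array keeps both elements on a single row.
    parsed.select(
        F.col("arr")[0]["value"].alias("a"),
        F.col("arr")[1]["value"].alias("b"),
    ).show()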
I am using Spark 3.0.2 with Java 8. I am trying to write data to an S3 path using a Spark job. I am getting the below exception and am not able to tell what caused this
I am new to the Structured Streaming topic, so I am facing an issue while calculating a distinct count on a column in a Dataset/DataFrame. //DataFrame val readFromKafka = sparks
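Exact COUNT(DISTINCT ...) is not supported on a streaming DataFrame; approx_count_distinct is the usual workaround. A sketch, assuming a Kafka source (the server and topic names are hypothetical):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical
              .option("subscribe", "events")                        # hypothetical
              .load())

    # Exact countDistinct is unsupported on streams; this keeps an
    # approximate running count instead.
    counts = (stream.selectExpr("CAST(value AS STRING) AS value")
              .agg(F.approx_count_distinct("value").alias("distinct_values")))

    query = (counts.writeStream.outputMode("complete")
             .format("console").start())
    query.awaitTermination()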
In Python or R, there are ways to slice a DataFrame by index. For example, in pandas: df.iloc[5:10,:] Is there a similar way in PySpark to slice data based on row index?
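Spark rows carry no intrinsic order, so an index has to be attached explicitly first. A minimal sketch using zipWithIndex, which assigns stable, contiguous indexes that can then be range-filtered much like iloc:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(100).toDF("x")  # hypothetical data

    # zipWithIndex attaches a contiguous 0-based index to each row.
    indexed = df.rdd.zipWithIndex().toDF(["row", "idx"])
    # Equivalent of df.iloc[5:10, :]: keep indexes 5..9, then unwrap the row.
    indexed.filter("idx >= 5 AND idx < 10").select("row.*").show()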
I am using Spark on Hortonworks; when I execute the below code I am getting an exception. I also have a separate Spark instance running on my system, and the same code i
I am processing a huge text file using PyCharm and PySpark. This is what I am trying to do: spark_home = os.environ.get('SPARK_HOME', None) os.environ["SPARK_HOM
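A sketch of one common way to wire PySpark into a plain PyCharm run configuration; the SPARK_HOME path below is hypothetical. findspark appends Spark's python/ and py4j directories to sys.path before the import:

    import os
    import findspark

    os.environ.setdefault("SPARK_HOME", "/opt/spark")  # hypothetical install path
    findspark.init()  # puts Spark's python/ and py4j on sys.path

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.master("local[*]")
             .appName("pycharm-job").getOrCreate())
    print(spark.version)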
I have a web service built around Spark that, based on a JSON request, builds a series of dataframe/dataset operations. These operations involve multiple joins,
I need some help, please. I have this DataFrame with an even number of values in the column 'b': df1 = spark.createDataFrame([ ('c',1), ('c',2), ('c',
I have a PySpark dataframe like this:

+------+------+
|     A|     B|
+------+------+
|     1|     2|
|     1|     3|
|     2|     3|
|     2|     5|
+------+--
If I have a PySpark DataFrame with two columns, text and subtext, where subtext is guaranteed to occur somewhere within text, how would I calculate the position of subtext within text?
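In the Python API, instr and locate take the substring as a literal, so comparing two columns is easiest through a SQL expression. A minimal sketch (the sample data is hypothetical):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("hello world", "world")], ["text", "subtext"])

    # instr is 1-based and returns 0 when the substring is absent; expr()
    # lets both arguments be columns.
    df.withColumn("position", F.expr("instr(text, subtext)")).show()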
I am forming a query in a StringBuilder like below: println(dataQuery) Execution started at 2019-10-31 02:58:24.006019 PST res245: String = " SELECT transac
I would like to modify my date column in a Spark df to subtract 1 month, but only if certain months appear; i.e. only if the date is yyyy-07-31 or yyyy-04-30 should it change
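A sketch assuming the column is a DateType named date (the names here are hypothetical): add_months(date, -1) subtracts a month, and when restricts it to the end-of-July and end-of-April rows, leaving everything else untouched.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = (spark.createDataFrame(
            [("2020-07-31",), ("2020-04-30",), ("2020-01-15",)], ["date"])
          .withColumn("date", F.col("date").cast("date")))

    # Shift only the end-of-July / end-of-April dates back one month.
    df.withColumn(
        "date",
        F.when(F.date_format("date", "MM-dd").isin("07-31", "04-30"),
               F.add_months("date", -1))
         .otherwise(F.col("date")),
    ).show()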
I am trying to remove a special character (å) from a column in a dataframe. My data looks like:

ClientID,PatientID
AR0001å,DH_HL704221157198295_9
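A minimal sketch using regexp_replace to strip the å; the column names follow the sample above:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("AR0001å", "DH_HL704221157198295_9")], ["ClientID", "PatientID"])

    # regexp_replace drops every å from the column.
    df.withColumn("ClientID", F.regexp_replace("ClientID", "å", "")).show()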
I am trying to create a Spark DataFrame from a Presto DB table that has a few columns of Array data type. I tried multiple ways, but I am getting the same exception: java.s
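A sketch under the assumption that the read goes through JDBC and the driver rejects ARRAY columns; casting the array to JSON (a varchar) on the Presto side sidesteps the unsupported type. The driver class, URL, and table/column names here are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = (spark.read.format("jdbc")
          # URL, driver, and query are hypothetical placeholders.
          .option("url", "jdbc:presto://presto-host:8080/hive/default")
          .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
          .option("query", "SELECT id, CAST(tags AS JSON) AS tags FROM events")
          .load())
    df.show()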