How to capture a Glue job's arguments by position rather than using the getResolvedOptions function and passing the arguments as key value pairs?
I am trying to debug my spark UI, and in the SQL tab of spark UI getting this red mark on filter description, trying to figure out what does it mean. Spark UI s
I have a spark cluster in kubernetes based on image mcr.microsoft.com/mmlspark/spark2.4:v4. Spark version version 2.4.0 Using Scala version 2.11.12, OpenJDK
I wonder how this query is executing successfully. As we know 'having' clause execute before the select one then here how alias name used in 'select' statement
I'm trying to store the tweets from my kafka cluster into Elastic Search. Initially, I set the output format to be 'org.elasticsearch.spark.sql'. But , it creat
I'm using scala spark and have a DataFrame: Source | Column1 | Column2 A ... ... B ... ... B ... ... C ...
I have multiple JSON files (10 TB ~) on a S3 bucket, and I need to organize these files by a date element present in every json document. What I think that my c
This is a issue I am facing with Spark 3.0, worked before without even specifying a format. Now, I tried explicitly specifying the format, but it still doesn't
I am trying to find the best way to parse a json file with inconsistent schema (but the schema of the same type is known and consistent) in spark in order to sp
New to azure synapse, trying to create database (Managed table) from synapse notebook. I also added Storage blob data contributor for synapse workspace and spec
I'm running Spark standalone jobs in Windows. I would like to monitor my Spark jobs using the spark history server. I have launched spark history server with be
[Spark RDD] Find the single row that has the highest count and for that row report the month, count and hashtag name. Print the result to the terminal output us
when I'm doing spark-submit using this command on Cloudera **time spark-submit \ --deploy-mode client \ --conf spark.app.name='XXXxxxxxx' --conf spark.master=l
I have a Spark Table, which contains 400+ millions records/rows. I used spark.table to convert it into a DF. The DF looks like this below id pub_date
Can anyone help how multimap_agg function in SQL and can be used in spark sql
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(
I downloaded succesfully this connector: com.datastax.spark:spark-cassandra-connector_2.11:2.5.1 And when I try to load the information with this line: data = s
I am trying to create a Spark application running on Scala that reads a .csv file that is located in src/main/resources directory and saves it on the local hdfs
I'm trying to tokenize a 'string' column from a spark dataset. The spark dataframe is as follows: df: index ---> Integer question ---> String This is h
We want to implement SCD2 in Spark using SQL Join. i got reference from Github https://gist.github.com/rampage644/cc4659edd11d9a288c1b but it's not very cle