Hi, I am trying to run this code; it works fine in one EC2 Azkaban instance but gives the error below in another instance. private val adminprop
Leveraging the best of SnakeYAML & Jackson in Scala, I am using the following method to parse YAML files. This method supports the usage of anchors in YAML
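For reference, a minimal sketch of the approach this question describes, assuming SnakeYAML resolves the anchors and Jackson does the data binding (the generic helper and target class are illustrative, not the asker's actual method):

    import org.yaml.snakeyaml.Yaml
    import com.fasterxml.jackson.databind.ObjectMapper

    // SnakeYAML resolves anchors/aliases while loading; Jackson then
    // binds the resulting object graph to the requested class.
    def parseYaml[T](yamlSource: String, target: Class[T]): T = {
      val raw: AnyRef = new Yaml().load(yamlSource)
      new ObjectMapper().convertValue(raw, target)
    }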
I want to understand how to go about implementing the following use-case using typeclasses in Scala (or find out if it is even possible). Given a sealed trait a
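As a point of reference, a minimal sketch of a typeclass with one instance per case of a sealed trait (all names here are illustrative, since the asker's trait is not shown):

    sealed trait Shape
    final case class Circle(r: Double) extends Shape
    final case class Square(side: Double) extends Shape

    // The typeclass: one capability, implemented per concrete case.
    trait Area[A] { def area(a: A): Double }

    object Area {
      def apply[A](implicit ev: Area[A]): Area[A] = ev
      implicit val circleArea: Area[Circle] = (c: Circle) => math.Pi * c.r * c.r
      implicit val squareArea: Area[Square] = (s: Square) => s.side * s.side
    }

    Area[Circle].area(Circle(2.0)) // 12.566...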
I have a query that upserts the data to the database via Slick. I'd like to return the ids of the entities that were inserted. How can I do this using Slick in
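One possible direction, sketched under the assumption of a table with an auto-generated id column (the table and row types below are hypothetical): Slick's `returning` clause hands back generated keys on insert.

    // assumes: import slick.jdbc.PostgresProfile.api._ (or your profile)
    // `users` is a hypothetical TableQuery[Users] whose id is auto-generated.
    val insertReturningId = users returning users.map(_.id)
    val action = insertReturningId += User(id = 0L, name = "alice")
    val newId: scala.concurrent.Future[Long] = db.run(action)

Note that plain `insertOrUpdate` returns an affected-row count rather than ids, so whether ids can come back from a true upsert depends on the database profile in use.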
val spark = SparkSession.builder()
  .appName("Spark SQL basic example")
  .config("spark.master", "local")
  .getOrCreate()
import spark.implicits._
case class Someth
I have the following code, which is used to (SHA) hash columns in a Spark dataframe:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions
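Since the snippet is cut off, here is a minimal sketch of one way such hashing is often done, assuming the built-in sha2 function is acceptable (helper and column names are illustrative):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, sha2}

    // Replace each listed column with its SHA-256 digest; casting to
    // string first lets the same helper cover non-string columns.
    def hashColumns(df: DataFrame, columns: Seq[String]): DataFrame =
      columns.foldLeft(df) { (acc, c) =>
        acc.withColumn(c, sha2(col(c).cast("string"), 256))
      }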
I am using Jupyter notebook for running Spark. My problem arises when I try to register a UDF from my custom imported jar. This is how I create the UDF in
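For context, a sketch of one common way this is wired up, assuming the jar is put on the session classpath via spark.jars and exposes an ordinary Scala function (the path, class and function names below are hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("udf-from-jar")
      .config("spark.jars", "/path/to/custom-udfs.jar") // ships the jar to executors
      .getOrCreate()

    // Register a function from the jar under a SQL-visible name.
    spark.udf.register("myUpper", (s: String) => com.example.udfs.Text.upper(s))
    spark.sql("SELECT myUpper('hello')").show()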
In my project, I have the current workflow: Kafka message => Spark Streaming/processing => Insert/Update to HBase and/or Phoenix. Both the Insert and Update
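As a reference point, a sketch of the Phoenix leg of such a pipeline, assuming upserts go over Phoenix's JDBC driver (the connection string, table and fields are placeholders); Phoenix's UPSERT statement covers both the insert and update cases:

    import java.sql.DriverManager

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { rows =>
        // One connection per partition; UPSERT inserts or updates by key.
        val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
        val stmt = conn.prepareStatement("UPSERT INTO events (id, payload) VALUES (?, ?)")
        rows.foreach { case (id: String, payload: String) =>
          stmt.setString(1, id)
          stmt.setString(2, payload)
          stmt.executeUpdate()
        }
        conn.commit() // Phoenix connections do not auto-commit by default
        conn.close()
      }
    }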
I have a GenericRecord stream whose values are deserialised using Avro; the schema has name and age. KafkaSource<GenericRecord> source = KafkaSource.<GenericRec
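A sketch of how such a source is typically assembled (written in Scala against Flink's Java API; the broker, topic and schema JSON are placeholders):

    import org.apache.avro.Schema
    import org.apache.avro.generic.GenericRecord
    import org.apache.flink.connector.kafka.source.KafkaSource
    import org.apache.flink.formats.avro.AvroDeserializationSchema

    val schema = new Schema.Parser().parse(
      """{"type":"record","name":"Person","fields":[
        |  {"name":"name","type":"string"},
        |  {"name":"age","type":"int"}
        |]}""".stripMargin)

    // Value-only deserialisation into Avro GenericRecord.
    val source = KafkaSource.builder[GenericRecord]()
      .setBootstrapServers("broker:9092")
      .setTopics("people")
      .setValueOnlyDeserializer(AvroDeserializationSchema.forGeneric(schema))
      .build()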
My project structure is:

    logs
    - data
      - pubs
        - invent.proto
      - common
        - num.proto

NOTE - The .proto files are not under src/main/protobuf
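If the build is sbt with ScalaPB/sbt-protoc (an assumption; the question does not say), the plugin can be pointed at a non-standard proto directory instead of src/main/protobuf:

    // build.sbt - a sketch; adjust the path to wherever the .proto files live.
    Compile / PB.protoSources := Seq(baseDirectory.value / "logs" / "data")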
I am evaluating different load-testing tools. After trying JMeter and hitting two exceptions while running tests and viewing the results, I would like to give Gatling
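For anyone landing here, a minimal Gatling simulation looks roughly like this (Gatling 3 syntax; the endpoint and load profile are placeholders):

    import io.gatling.core.Predef._
    import io.gatling.http.Predef._

    class BasicSimulation extends Simulation {
      val httpProtocol = http.baseUrl("http://localhost:8080")

      // One scenario: each virtual user issues a single GET.
      val scn = scenario("Smoke test").exec(http("home").get("/"))

      setUp(scn.inject(atOnceUsers(10))).protocols(httpProtocol)
    }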
I am trying to create a Scala UDF for Spark, that can be used in Spark SQL. The objective of the function is to accept any column type as input, and put it in a
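Since Spark cannot derive a schema for Any, one common workaround (a sketch, not necessarily what this question ends up needing) is to cast the column to string on the SQL side and keep the UDF itself monomorphic:

    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.{col, udf}

    // The UDF only ever sees a String; callers cast first.
    val asText = udf((s: String) => Option(s).getOrElse("<null>"))
    def textify(c: Column): Column = asText(c.cast("string"))

    // usage: df.select(textify(col("any_typed_column")))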
Spark: 3.0.0, Scala: 2.12.8. My data frame has a column holding a JSON string, and I want to create a new column from it with the StructType. |temp_json_string
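The usual tool for this is from_json with an explicit schema; a sketch, with hypothetical field names since the JSON itself is cut off:

    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.{LongType, StringType, StructType}

    val schema = new StructType()
      .add("id", LongType)
      .add("name", StringType)

    // New struct-typed column parsed from the JSON string column.
    val parsed = df.withColumn("temp_json", from_json(col("temp_json_string"), schema))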
Every day I build another case class and wish I could define a property called type on it, but to do so requires using the highly annoying backtick syntax: dooh
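For illustration, the backtick syntax this question is complaining about: type is a reserved word in Scala, so it must be quoted at the definition and at every use site (the case class here is made up).

    case class Doc(`type`: String, body: String)

    val d = Doc(`type` = "markdown", body = "hello")
    d.`type` // "markdown"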
I have code similar to this in Spark(Scala). I would like to know the number of records this code updated/inserted when execute() is complete. Is there a way?
I'm developing Kafka producer code in Scala with these libraries (I have to use version > 6.x of the Kafka Avro serializer to get TLS communication): <dependency>
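A sketch of producer configuration with TLS plus the Confluent Avro serializer (the broker, registry URL, paths and passwords are all placeholders):

    import java.util.Properties
    import org.apache.avro.generic.GenericRecord
    import org.apache.kafka.clients.producer.KafkaProducer

    val props = new Properties()
    props.put("bootstrap.servers", "broker:9093")
    props.put("security.protocol", "SSL")
    props.put("ssl.truststore.location", "/path/to/truststore.jks")
    props.put("ssl.truststore.password", "changeit")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
    props.put("schema.registry.url", "https://schema-registry:8081")

    val producer = new KafkaProducer[String, GenericRecord](props)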
I'm trying to write a dataframe to AWS Keyspaces, but I'm getting the messages below. Stack: dfExploded.write.cassandraFormat(table = "table", keyspa
I am trying to write a Spark DF to AWS Keyspaces. Randomly, some of the records get updated while others throw this exception: com.datas
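For reference, the Spark Cassandra Connector write path both of these questions are using looks roughly like this (a sketch; keyspace/table names are placeholders, and Keyspaces-specific throttling and retry tuning is a separate concern):

    import org.apache.spark.sql.cassandra._

    dfExploded.write
      .cassandraFormat(table = "my_table", keyspace = "my_keyspace")
      .mode("append")
      .save()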
Is there a way of counting approximately after a group by on an SQL dataset in Spark? Or more generally, what is the fastest way of group-by counting in Spark?
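A sketch of both options: exact per-group counts (already a cheap aggregation), and HyperLogLog-based approx_count_distinct when an approximate distinct count per group is acceptable (column names are illustrative):

    import org.apache.spark.sql.functions.{approx_count_distinct, count}

    // Exact row count per group.
    df.groupBy("key").agg(count("*").as("cnt"))

    // Approximate distinct count per group; 0.05 is the max relative error (rsd).
    df.groupBy("key").agg(approx_count_distinct("value", 0.05).as("approx_distinct"))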
I'm running Spark 2.4.5 on my Mac. When I execute spark-submit --version, it prints the Spark ASCII-art banner