val spark = SparkSession.builder().appName("Spark SQL basic example").config("spark.master", "local").getOrCreate() import spark.implicits._ case class Someth
I have the following code, which is used to SHA-hash columns in a Spark DataFrame: import org.apache.spark.sql.DataFrame import org.apache.spark.sql.functions
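Below is a minimal sketch of column hashing with Spark's built-in `sha2` function. It assumes SHA-256, and it assumes every listed column can be cast to string. The object `ShaColumns`, the helper `hashColumns` and the sample data are hypothetical, and the code needs `spark-sql` on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sha2}

object ShaColumns {
  // Hypothetical helper: replaces each named column with the hex-encoded
  // SHA-256 digest of its value cast to string (null inputs stay null).
  def hashColumns(df: DataFrame, columns: Seq[String]): DataFrame =
    columns.foldLeft(df) { (acc, c) =>
      acc.withColumn(c, sha2(col(c).cast("string"), 256))
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sha-columns")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample data (made up for illustration).
    val df = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")
    hashColumns(df, Seq("name", "age")).show(truncate = false)
    spark.stop()
  }
}
```

Using `foldLeft` over the column names keeps everything in one logical plan, so Spark hashes all columns in a single pass.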
I am using a Jupyter notebook to run Spark. My problem arises when I try to register a UDF from a custom imported JAR. This is how I create the UDF in
In my project, the current workflow is: Kafka message => Spark Streaming/processing => insert/update to HBase and/or Phoenix. Both the Insert and Updat
I have a GenericRecord stream whose values are deserialised using Avro; the schema has name and age fields. KafkaSource<GenericRecord> source = KafkaSource.<GenericRec
My project structure is: logs - data - pubs - invent.proto - common - num.proto NOTE - The .proto files are not under src/main/protobu
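The build fragment below assumes ScalaPB with the sbt-protoc plugin, which the question does not state. It shows one way to point code generation at a custom proto root instead of `src/main/protobuf`. The root `logs/data` is an assumption: it is chosen so that imports such as `common/num.proto` would resolve, and should be adjusted to the real layout.

```scala
// build.sbt (sketch; assumes sbt-protoc and ScalaPB are declared in project/plugins.sbt)
// Assumed proto root: adjust so that import paths inside the .proto files resolve.
Compile / PB.protoSources := Seq(baseDirectory.value / "logs" / "data")

// Generate Scala case classes into the managed sources directory.
Compile / PB.targets := Seq(
  scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
)
```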
I am evaluating different load-testing tools. After trying JMeter and hitting two exceptions while running the test and viewing its results, I would like to give Gatli
I am trying to create a Scala UDF for Spark that can be used in Spark SQL. The function should accept a column of any type as input and put it in a
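Here is a minimal sketch of registering a Scala function so it can be called from Spark SQL. The UDF name `bracket`, its logic, the view `people` and the sample rows are all hypothetical. Casting to `STRING` in the query is only one simple way to let a single `String => String` UDF accept columns of any type. It is not necessarily the approach the question was after.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlUdfExample {
  // Registers a hypothetical UDF and runs a SQL query that uses it.
  def registerAndQuery(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Wraps the input in brackets; null input yields null output.
    spark.udf.register("bracket", (s: String) => if (s == null) null else s"[$s]")

    Seq(("alice", 30), ("bob", 25)).toDF("name", "age").createOrReplaceTempView("people")

    // CAST(... AS STRING) lets the same UDF be applied to a non-string column.
    spark.sql("SELECT bracket(name) AS n, bracket(CAST(age AS STRING)) AS a FROM people")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sql-udf").master("local[*]").getOrCreate()
    registerAndQuery(spark).show()
    spark.stop()
  }
}
```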
Spark: 3.0.0, Scala: 2.12.8. My DataFrame has a column containing a JSON string, and I want to create a new StructType column from it. |temp_json_string
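A minimal sketch using Spark's `from_json` with an explicit `StructType`. The column name `temp_json_string` comes from the question. The schema fields (`name`, `age`), the object `JsonToStruct` and the sample JSON are assumptions for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object JsonToStruct {
  // Assumed shape of the JSON payload; replace with the real fields.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType),
    StructField("age", IntegerType)
  ))

  // Adds a struct column "parsed"; input that does not match the schema produces nulls.
  def parse(df: DataFrame): DataFrame =
    df.withColumn("parsed", from_json(col("temp_json_string"), schema))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("from-json").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq("""{"name":"alice","age":30}""").toDF("temp_json_string")
    parse(df).select("parsed.name", "parsed.age").show()
    spark.stop()
  }
}
```

Nested fields of the new struct column can then be selected with dot notation, as in `parsed.name`.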
Every day I build another case class and wish I could define a property called type on it, but doing so requires the highly annoying backtick syntax: dooh
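This short sketch shows where backticks are needed for a field named `type`, a reserved word in Scala, and two places where they can be avoided. The first is positional pattern binding. The second is an alias method, which is a hypothetical convenience rather than a language feature. The class `Event` and its fields are made up for illustration.

```scala
// Hypothetical model: `type` is a reserved word, so the field needs backticks.
final case class Event(`type`: String, payload: String) {
  // Alias so most call sites can avoid backticks.
  def kind: String = `type`
}

object BacktickExample {
  def main(args: Array[String]): Unit = {
    val e = Event(`type` = "click", payload = "{}")
    println(e.`type`) // backticks required for direct access
    println(e.kind)   // the alias avoids them
    e match {
      // Positional pattern binding needs no backticks either.
      case Event(t, _) => println(s"bound by pattern: $t")
    }
  }
}
```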
I have code similar to this in Spark (Scala). I would like to know how many records this code updated or inserted once execute() completes. Is there a way?
I'm developing Kafka producer code in Scala with these libraries (I have to use version >6.X of the Kafka Avro serializer to use TLS communication): <dependency&g
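The sketch below builds producer settings for TLS to both the brokers and the Schema Registry. Only `java.util.Properties` is used here. The resulting object would then be passed to a `KafkaProducer`, which requires `kafka-clients` and Confluent's `kafka-avro-serializer`. All hosts, paths and passwords are placeholders. The `schema.registry.ssl.*` keys assume a serializer version that supports registry TLS settings with that prefix, so check them against the version in use.

```scala
import java.util.Properties

object AvroTlsProducerConfig {
  // Builds TLS-enabled producer settings; every value below is a placeholder.
  def props(truststorePath: String, truststorePassword: String): Properties = {
    val p = new Properties()
    // Broker connection over TLS.
    p.put("bootstrap.servers", "broker.example.com:9093")
    p.put("security.protocol", "SSL")
    p.put("ssl.truststore.location", truststorePath)
    p.put("ssl.truststore.password", truststorePassword)
    // Serializers: string keys, Avro values via Confluent's serializer.
    p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    p.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
    // Schema Registry over HTTPS, with its own (prefixed) TLS settings.
    p.put("schema.registry.url", "https://schema-registry.example.com:8081")
    p.put("schema.registry.ssl.truststore.location", truststorePath)
    p.put("schema.registry.ssl.truststore.password", truststorePassword)
    p
  }

  def main(args: Array[String]): Unit = {
    val p = props("/path/to/truststore.jks", "changeit")
    println(p.getProperty("security.protocol"))
    println(p.getProperty("schema.registry.url"))
  }
}
```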
I'm trying to write a DataFrame to AWS Keyspaces, but I'm getting the following messages: Stack: dfExploded.write.cassandraFormat(table = "table", keyspa
I am trying to write a Spark DataFrame to AWS Keyspaces. Some records are updated successfully, while others, seemingly at random, throw this exception: com.datas
Is there a way to count approximately after a GROUP BY on a SQL dataset in Spark? More generally, what is the fastest way to do a group-by count in Spark?
I'm running Spark 2.4.5 on my Mac. When I execute spark-submit --version ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/
When using Scala 3's new -Yexplicit-nulls flag, all Java code without explicit non-null annotations is treated as nullable, so every Java met
In a Scala research application, I load a HOCON file using PureConfig's ConfigSource.file() method; this file represents the default configuration for a research ex
I have been trying to run all my performance tests from a Gatling fat JAR built with the assembly plugin; however, when I try to execute my performance t
I am using circe in Scala and have the following requirement: say I have a class like the one below, and I want to keep the password field from being serialised the
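A minimal sketch of one way to do this in circe. It derives the usual object encoder and then removes the sensitive key from the resulting JSON object. The code assumes circe-core and circe-generic 0.12 or later, where `deriveEncoder` returns an `Encoder.AsObject` with `mapJsonObject`. The `User` class is a hypothetical stand-in for the class in the question.

```scala
import io.circe.{Encoder, Json}
import io.circe.generic.semiauto.deriveEncoder
import io.circe.syntax._

// Hypothetical stand-in for the class in the question.
final case class User(name: String, password: String)

object User {
  // Derive the normal encoder, then drop "password" from every encoded object.
  implicit val encoder: Encoder[User] =
    deriveEncoder[User].mapJsonObject(_.remove("password"))
}

object CirceExample {
  def main(args: Array[String]): Unit = {
    val json: Json = User("alice", "s3cret").asJson
    println(json.noSpaces) // {"name":"alice"}
  }
}
```

An alternative with the same effect is `Encoder.forProduct1("name")(u => u.name)`, which lists the fields to keep instead of the field to drop.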