Category "scala"

Kafka Admin Client giving Timeout Error for ListTopic

Hi, I am trying to run this code; it works fine on one EC2 Azkaban instance but gives the error below on another instance. private val adminprop
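A minimal sketch of the call in question, with a placeholder broker address; a timeout on listTopics usually means the bootstrap servers are unreachable from the failing instance (security groups, VPC routing, or advertised listeners) rather than a code problem:

    import java.util.Properties
    import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, ListTopicsOptions}

    object ListTopicsExample {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        // Placeholder address: verify this host is reachable from the instance
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092")
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, "15000")

        val admin = AdminClient.create(props)
        try {
          val names = admin.listTopics(new ListTopicsOptions().timeoutMs(15000)).names().get()
          names.forEach(n => println(n))
        } finally admin.close()
      }
    }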

YAML Environment Variable Interpolation in SnakeYAML (Scala)

Leveraging the best from SnakeYAML & Jackson in Scala, I am using the following method to parse YAML files. This method supports the usage of anchors in YAM
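SnakeYAML does not interpolate environment variables on its own, so one common approach is to substitute them in the text before parsing. A sketch under that assumption (the pattern and helper names are mine):

    import org.yaml.snakeyaml.Yaml

    object EnvYaml {
      // Matches ${VAR} and ${VAR:-default}
      private val EnvPattern = raw"\$$\{([A-Za-z0-9_]+)(?::-([^}]*))?\}".r

      // Replace each placeholder with the environment value (or the default)
      def interpolate(text: String): String =
        EnvPattern.replaceAllIn(text, m =>
          scala.util.matching.Regex.quoteReplacement(
            sys.env.getOrElse(m.group(1), Option(m.group(2)).getOrElse(""))))

      def load(text: String): Object =
        new Yaml().load[Object](interpolate(text))
    }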

How to implement subtype resolution of typeclass in scala

I want to understand how to go about implementing the following use-case using typeclasses in Scala (or find out if it is even possible). Given a sealed trait a
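Typeclass instances are resolved statically, so a value typed as the sealed trait will not pick up a subtype's instance by itself; the usual workaround is a supertype instance that pattern-matches down to the concrete ones. A sketch with illustrative names:

    sealed trait Animal
    final case class Dog(name: String) extends Animal
    final case class Cat(name: String) extends Animal

    trait Show[A] { def show(a: A): String }
    object Show {
      implicit val dogShow: Show[Dog] = d => s"Dog(${d.name})"
      implicit val catShow: Show[Cat] = c => s"Cat(${c.name})"
      // The supertype instance delegates to the per-subtype instances
      implicit val animalShow: Show[Animal] = new Show[Animal] {
        def show(a: Animal): String = a match {
          case d: Dog => dogShow.show(d)
          case c: Cat => catShow.show(c)
        }
      }
    }

    def render[A](a: A)(implicit s: Show[A]): String = s.show(a)
    // render(Dog("Rex")) uses dogShow; render(Dog("Rex"): Animal) uses animalShow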

Return ids after upsert in Slick

I have a query that upserts the data to the database via Slick. I'd like to return the ids of the entities that were inserted. How can I do this using Slick in
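With a profile that supports native upserts (e.g. PostgreSQL), Slick lets you combine returning with insertOrUpdate. A sketch against a hypothetical users table:

    import slick.jdbc.PostgresProfile.api._

    // Hypothetical table; adjust names and types to your schema
    case class User(id: Long, email: String)
    class Users(tag: Tag) extends Table[User](tag, "users") {
      def id    = column[Long]("id", O.PrimaryKey)
      def email = column[String]("email")
      def * = (id, email) <> (User.tupled, User.unapply)
    }
    val users = TableQuery[Users]

    // returning + insertOrUpdate yields the id as an Option; the exact
    // semantics on the update path depend on the profile's upsert support
    def upsertReturningId(u: User): DBIO[Option[Long]] =
      (users returning users.map(_.id)).insertOrUpdate(u)

    // For a batch: DBIO.sequence(rows.map(upsertReturningId)).map(_.flatten)
    // (the .map needs an implicit ExecutionContext)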

Why is the Spark bucket number not equal to the number of files in the partition?

val spark = SparkSession.builder().appName("Spark SQL basic example").config("spark.master", "local").getOrCreate() import spark.implicits._ case class Someth
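For context, each write task can emit one file per bucket it holds data for, so the file count is roughly numWriteTasks x numBuckets rather than numBuckets. Repartitioning on the bucket column first collapses it to one file per bucket; a sketch:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("Bucketing sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = (1 to 1000).map(i => (i, s"v$i")).toDF("id", "value")

    // One shuffle partition per bucket => one file per bucket
    df.repartition(4, $"id")
      .write
      .bucketBy(4, "id")
      .sortBy("id")
      .mode("overwrite")
      .saveAsTable("bucketed_things")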

ScalaTest: how to assert a lengthy exception message securely and cleanly without hardcoding?

I have the following code, which is used to (sha) hash columns in a spark dataframe: import org.apache.spark.sql.DataFrame import org.apache.spark.sql.functions
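One common pattern is to intercept the exception and assert only on the stable fragments of the message rather than the full text. A sketch with a stand-in for the real hashing call:

    import org.scalatest.flatspec.AnyFlatSpec
    import org.scalatest.matchers.should.Matchers

    class HashingSpec extends AnyFlatSpec with Matchers {
      "hashing a missing column" should "fail with a helpful message" in {
        val ex = intercept[IllegalArgumentException] {
          // stand-in for the real call that hashes dataframe columns
          throw new IllegalArgumentException("Column 'foo' not found among [a, b, c]")
        }
        // Assert on stable fragments instead of hardcoding the whole message
        ex.getMessage should include ("Column 'foo'")
        ex.getMessage should include ("not found")
      }
    }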

Import custom UDF from jar to Spark

I am using Jupyter notebook for running Spark. My problem arises when I am trying to register a UDF from my custom imported jar. This is how I create the UDF in
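If the jar ships a Hive-style UDF class and the session has Hive support, it can be registered straight from SQL; the jar path and class name below are placeholders:

    spark.sql("ADD JAR /home/jovyan/jars/my-udfs.jar")
    spark.sql("CREATE TEMPORARY FUNCTION my_func AS 'com.example.udf.MyFunc'")
    spark.sql("SELECT my_func(value) FROM my_table").show()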

Receive a Kafka message through Spark Streaming and delete Phoenix/HBase data

In my project, I have the current workflow: Kafka message => Spark Streaming/processing => Insert/Update to HBase and/or Phoenix Both the Insert and Updat
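Phoenix supports DELETE over JDBC, so one sketch is to issue the deletes per partition inside foreachRDD (connection URL, table, and column names are placeholders):

    import java.sql.DriverManager
    import org.apache.spark.streaming.dstream.DStream

    def deleteKeys(stream: DStream[String]): Unit =
      stream.foreachRDD { rdd =>
        rdd.foreachPartition { keys =>
          val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
          try {
            val ps = conn.prepareStatement("DELETE FROM MY_TABLE WHERE ID = ?")
            keys.foreach { k => ps.setString(1, k); ps.executeUpdate() }
            conn.commit() // Phoenix connections do not autocommit by default
          } finally conn.close()
        }
      }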

Transform Stream

I have a GenericRecord stream with values deserialised using Avro; the schema has name and age. KafkaSource<GenericRecord> source = KafkaSource.<GenericRec
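A sketch of the transform step using the Flink Scala API, assuming `source` is the KafkaSource[GenericRecord] from the question and that age is an Avro int:

    import org.apache.avro.generic.GenericRecord
    import org.apache.flink.api.common.eventtime.WatermarkStrategy
    import org.apache.flink.streaming.api.scala._

    case class Person(name: String, age: Int)

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env
      .fromSource(source, WatermarkStrategy.noWatermarks[GenericRecord](), "kafka-avro")
      // Project the Avro fields into a plain case class stream
      .map(rec => Person(rec.get("name").toString, rec.get("age").asInstanceOf[Int]))
      .print()
    env.execute("transform-stream")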

ScalaPb sbt: Import "data/common/num.proto" was not found or had errors

My project structure is: logs/data/pubs/invent.proto and logs/data/common/num.proto. NOTE - The .proto files are not under src/main/protobu
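One way to make that import resolve is to point sbt-protoc at the non-standard proto root, so that data/common/num.proto is relative to it. A build.sbt sketch under that assumption:

    import scalapb.compiler.Version.scalapbVersion

    // Treat logs/ as the proto root; imports like "data/common/num.proto"
    // are then resolved against it (protoSources are also include paths)
    Compile / PB.protoSources := Seq(baseDirectory.value / "logs")
    Compile / PB.targets := Seq(
      scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
    )
    libraryDependencies +=
      "com.thesamet.scalapb" %% "scalapb-runtime" % scalapbVersion % "protobuf"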

How can one execute Java code using Gatling load testing framework?

I am evaluating different load testing tools. After trying JMeter and having two exceptions when running and viewing the test result, I would like to give Gatli
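Yes: Gatling's exec block accepts a function over the session, so arbitrary JVM (including Java) code can run inside a scenario. A sketch, with MyJavaClient standing in for your own Java code:

    import io.gatling.core.Predef._

    class JavaCodeSimulation extends Simulation {
      // Any JVM code can run here; return the (possibly updated) session
      val scn = scenario("call java")
        .exec { session =>
          val result = new com.example.MyJavaClient().call()
          session.set("result", result)
        }

      setUp(scn.inject(atOnceUsers(1)))
    }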

Scala spark UDF function that takes input and puts it in an Array

I am trying to create a Scala UDF for Spark, that can be used in Spark SQL. The objective of the function is to accept any column type as input, and put it in a
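A UDF cannot be fully type-generic, so you either lean on the built-in array function (which wraps any column type without a UDF) or define one UDF per element type. A sketch, where df stands for your DataFrame:

    import org.apache.spark.sql.functions.{array, col, udf}

    // Built-in: wraps a column of any type into a single-element array
    val wrapped = df.withColumn("arr", array(col("value")))

    // UDF route: Spark needs a concrete element type per UDF
    val wrapString = udf((s: String) => Array(s))
    val wrapInt    = udf((i: Int)    => Array(i))
    val viaUdf = df.withColumn("arr", wrapString(col("value")))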

Spark: how to convert a JSON string to a struct column without a schema

Spark: 3.0.0 Scala: 2.12.8 My data frame has a column with JSON string and I want to create a new column from it with the StructType. |temp_json_string
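Spark can infer the schema from a sample row with schema_of_json and then parse the whole column with from_json, assuming all rows share the sample's shape. A sketch, where df is the frame from the question:

    import org.apache.spark.sql.functions.{col, from_json, lit, schema_of_json}
    import spark.implicits._

    // Take one sample value, infer its schema, parse every row with it
    val sample = df.select(col("temp_json_string")).as[String].head
    val parsed = df.withColumn(
      "temp_json",
      from_json(col("temp_json_string"), schema_of_json(lit(sample)))
    )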

Why can't the Scala compiler disambiguate a property called `type`?

Every day I build another case class and wish I could define a property called `type` on it, but to do so requires using the highly annoying backtick syntax: dooh
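For reference, type is a reserved keyword (type aliases and type members), so the parser cannot treat it as a plain identifier; backticks are the only escape hatch:

    // `type` escapes the keyword everywhere the identifier appears
    case class Document(`type`: String, body: String)

    val d = Document(`type` = "invoice", body = "...")
    println(d.`type`)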

How to find the number of Inserts and Updates of Merge command?

I have code similar to this in Spark (Scala). I would like to know the number of records this code updated/inserted when execute() is complete. Is there a way?
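If the merge is a Delta Lake MERGE (an assumption here), the commit's operationMetrics expose exactly those numbers, e.g. numTargetRowsInserted and numTargetRowsUpdated:

    import io.delta.tables.DeltaTable

    // Inspect the most recent commit on the (placeholder) target table
    val lastOp = DeltaTable.forName(spark, "target_table")
      .history(1)
      .select("operation", "operationMetrics")
    lastOp.show(truncate = false)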

Scala Kafka exception: NoSuchMethodError: org.apache.avro.Schema.toString

I'm developing Kafka producer code in Scala with these libs (I have to use version >6.X of the Kafka Avro serializer for TLS communication): <dependency
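NoSuchMethodError on org.apache.avro.Schema usually means two Avro versions on the classpath (the >6.x serializer pulls a newer Avro than another dependency). A quick Scala check of which jar actually supplies the class:

    // Prints the jar that loaded the conflicting class; then align or
    // exclude dependencies so only one Avro version remains
    val avroJar = classOf[org.apache.avro.Schema]
      .getProtectionDomain.getCodeSource.getLocation
    println(avroJar)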

Error writing dataframe to Cassandra table on Amazon Keyspaces

I'm trying to write a dataframe to AWS Keyspaces, but I'm getting the messages below: Stack: dfExploded.write.cassandraFormat(table = "table", keyspa
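For what it's worth, Keyspaces only accepts LOCAL_QUORUM for writes, which is a common cause of these errors; a sketch with placeholder keyspace/table names:

    import org.apache.spark.sql.cassandra._

    dfExploded.write
      .cassandraFormat(table = "table", keyspace = "ks")
      // Keyspaces rejects other consistency levels on writes
      .option("spark.cassandra.output.consistency.level", "LOCAL_QUORUM")
      .mode("append")
      .save()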

CodecNotFoundException while writing to Amazon Keyspaces

I am trying to write a Spark DF to AWS Keyspaces. Randomly some of the records are getting updated and some of the records are throwing this exception com.datas
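CodecNotFoundException typically means a DataFrame column type does not match the CQL type the Keyspaces table declares; casting the columns explicitly before writing avoids it. Column names and types below are placeholders:

    import org.apache.spark.sql.functions.col

    // Align each DataFrame column with the target table's CQL type
    val aligned = df
      .withColumn("id",    col("id").cast("string"))     // CQL text
      .withColumn("price", col("price").cast("double"))  // CQL double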

Spark/Scala approximate group by

Is there a way of counting approximately after a group by on an SQL dataset in Spark? Or more generally, what is the fastest way of group-by counting in Spark?
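For distinct counts per group there is approx_count_distinct (HyperLogLog++); plain per-group counts have no approximate variant in the DataFrame API, since they are already a single shuffle. A sketch, where df stands for the dataset:

    import org.apache.spark.sql.functions.{approx_count_distinct, count}

    // Approximate distinct count per group; rsd is the max relative
    // standard deviation (accuracy/speed trade-off)
    val approx = df.groupBy("key")
      .agg(approx_count_distinct("value", rsd = 0.05).as("approx_distinct"))

    // Exact count per group: one shuffle, usually hard to beat
    val exact = df.groupBy("key").agg(count("*").as("cnt"))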

How to install a specific version of Spark with a specific version of Scala

I'm running Spark 2.4.5 on my Mac. When I execute spark-submit --version it prints the Spark ASCII banner followed by the version info
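If the goal is matching your build to the distribution, pin the Scala binary version Spark was built with in build.sbt (Spark 2.4.5 is published for Scala 2.11 by default, with 2.12 builds also available). A sketch:

    // build.sbt: scalaVersion must match the Spark distribution's Scala build
    scalaVersion := "2.11.12"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.4.5" % "provided",
      "org.apache.spark" %% "spark-sql"  % "2.4.5" % "provided"
    )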