Category "scala"

Import custom udf from jar to Spark

I am using Jupyter notebook for running Spark. My problem arises when I am trying to register a UDF from my custom imported jar. This is how I create th UDF in

Receive a Kafka message through Spark Streaming and delete Phoenix/HBase data

In my project, I have the current workflow: Kafka message => Spark Streaming/processing => Insert/Update to HBase and/or Phoenix Both the Insert and Updat

Transform Stream

I have a GenericRecord stream with value deserialised using Avro, schema has name and age. KafkaSource<GenericRecord> source = KafkaSource.<GenericRec

ScalaPb sbt: Import "data/common/num.proto" was not found or had errors

My project structure is: logs - data - pubs - invent.proto - common - num.proto NOTE - The .proto files are not under src/main/protobu

How can one execute Java code using Gatling load testing framework?

I am evaluating different load testing tools. After trying JMeter and having two exceptions when running and viewing the test result, I would like to give Gatli

Scala spark UDF function that takes input and puts it in an Array

I am trying to create a Scala UDF for Spark, that can be used in Spark SQL. The objective of the function is to accept any column type as input, and put it in a

spark how to convert a json string to a struct column without schema

Spark: 3.0.0 Scala: 2.12.8 My data frame has a column with JSON string and I want to create a new column from it with the StructType. |temp_json_string

Why can't the Scala compiler disambiguate a property called `type`?

Every day I build another case class and wish I could define a property called type on it, but to do so requires using the highly annoying backtick syntax: dooh

How to find the number of Inserts and Updates of Merge command?

I have code similar to this in Spark(Scala). I would like to know the number of records this code updated/inserted when execute() is complete. Is there a way?

Scala Kafka exception: NoSuchMethodError: org.apache.avro.Schema.toString

I'm developing a kafka producer code in scala with those libs (I have to use version >6.X in kafka avro serializer to use TLS comunication): <dependency&g

Error to write dataframe in Cassandra table on Amazon Keyspaces

I'm trying to write a dataframe on AWS (Keyspace), but I'm getting the following messages below: Stack: dfExploded.write.cassandraFormat(table = "table", keyspa

CodecNotFoundException while writing to Amazon Keyspaces

I am trying to write a Spark DF to AWS Keyspaces. Randomly some of the records are getting updated and some of the records are throwing this exception com.datas

Spark/Scala approximate group by

Is there a way of counting approximately after a group by on an sql dataset in Spark? Or more generally, what is the fastest way of group by counting in Spark?

How to Install specific version of spark using specific version of scala

I'm running spark 2.4.5 in my mac. When I execute spark-submit --version ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/

Scala 3 Explicit Nulls flag makes String operations quite unusable

When using the new Scala 3's flag -Yexplicit-nulls, every Java code which doesn't have explicit non-null annotations is treated as nullable, thus every Java met

standardized method for writing an arbitrary typesafe Config to a hocon file?

in a Scala research application, i load a hocon file using PureConfig's ConfigSource.file() method, which represents the default configuration for a research ex

How I can run all performance tests from fat jar with Gatling?

I have been trying to execute all my performance tests from my gatling fat-jar created with the assemble plugin, however, when I try to execute my performance t

How to ignore a field from serializing when using circe in scala

I am using circe in scala and have a following requirement : Let's say I have some class like below and I want to avoid password field from being serialised the

How did spark RDD map to Cassandra table?

I am new to Spark, and recently I saw a code is saving data in RDD format to Cassandra table. But I am not able to figure it out how it is doing the column mapp

Scala error - Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

I have a requirement where i am reading data from a CSV file and writing data to a Delta table over scala on window OS. My scala code is given below:- import co

Category "scala"

Other Categories